Voxtype is highly customizable and that’s why I choose it. A fancy voice-to-text tool, in my opinion, should at least have the following features:
LLM post-processing. This is critical: LLM polishes the raw transcription, adds punctuation, corrects grammar and makes the text more polite and official. This is the key to mimicking the Typeless experience. A large-enough model like DeepSeek-V3.2 is required, since tiny-size models are not good at instruction following. You won’t want your model to answer the question in your voice note or tell you whom she is, but to polish the text (I met such problem when using ollama to run models like qwen2.5:1.5b locally.)!