changelog

keizo · November 9, 2024

Crazy speed improvement on speech-to-text. Switched the transcription API to Groq running Whisper large-v3. If that call fails, it falls back to the OpenAI API (Whisper 2), and that redundancy should make everything a little more reliable. I also tried whisper-large-v3-turbo; it's faster, but doesn't seem quite as accurate.
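For the curious, the fallback idea can be sketched roughly like this. It is a minimal illustration, not the actual code: `transcribe_with_fallback` and the provider callables are hypothetical names, and the real Groq and OpenAI client calls would be wrapped in functions with this shape.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical shape for a provider: takes raw audio bytes, returns text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    providers: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, text)
    from the first one that does not raise."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription providers failed: " + "; ".join(errors))
```

The fast provider goes first in the list and the slower, more established one goes last, so the fallback only costs time when the primary is actually down.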

Also added voice activity detection (VAD). Basically, it should prevent the random hallucinations or gibberish you get if you submit empty audio. VAD is shockingly tricky to get right, and I'm not absolutely sure this is right yet. But testing live!
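To give a sense of what gating on voice activity means, here is a deliberately crude sketch. It is an energy threshold, which is a stand-in and not the VAD used here, and it assumes 16-bit mono PCM in native byte order. The function name `has_speech` and all the default values are made up for illustration.

```python
import array
import math


def has_speech(
    pcm: bytes,
    sample_rate: int = 16000,
    frame_ms: int = 30,
    rms_threshold: float = 500.0,  # assumed value; needs tuning per microphone
    min_speech_ms: int = 200,      # assumed value
) -> bool:
    """Return True if enough frames are loud enough to plausibly contain speech."""
    samples = array.array("h")  # signed 16-bit samples
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    frame_len = sample_rate * frame_ms // 1000
    voiced_ms = 0
    # Walk over whole frames only; a partial frame at the end is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= rms_threshold:
            voiced_ms += frame_ms
    # Total voiced time, not necessarily consecutive.
    return voiced_ms >= min_speech_ms
```

Audio that fails the check would simply never be sent for transcription. A plain loudness check like this is fooled by background noise and by quiet speakers, which is a big part of why real VAD is tricky.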