Dev.to•Jan 29, 2026, 9:17 AM

Whisper adds voice input to React forms: now users speak their minds while devs battle mic permissions and skyrocketing VPS bills

Developer Ryan Cwynar has integrated real-time voice transcription into a React form using OpenAI's Whisper. The browser captures audio with the MediaRecorder API and streams it over a WebSocket to a Whisper server, which transcribes it in real time. The server runs on a small VPS and handles audio chunk buffering, voice activity detection, and streaming transcription. Cwynar's implementation uses 16 kHz mono audio sent in 500 ms chunks, a size chosen to balance latency against efficiency. The server returns partial and final transcriptions, and the input field updates with them as the user speaks. Voice input is increasingly expected in modern web apps, particularly on mobile devices. Cwynar's implementation shows that a self-hosted Whisper server is feasible, offering better accuracy, privacy, and control than browser-based options such as webkitSpeechRecognition. The full implementation is live on Cwynar's website.
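To make the partial/final update step concrete, here is a minimal TypeScript sketch of how a form might fold the server's messages into the text shown in the input field. The summary does not describe the wire format, so the `TranscriptMessage` shape, the `applyMessage` and `inputValue` helpers, and the `samplesPerChunk` function are all hypothetical names used for illustration; this is not Cwynar's code.

```typescript
// Assumed message shape: the source only says the server sends "partial and
// final transcriptions". The field names below are illustrative guesses.
type TranscriptMessage =
  | { type: "partial"; text: string }
  | { type: "final"; text: string };

interface TranscriptState {
  committed: string; // text from segments the server has finalized
  pending: string;   // latest partial hypothesis; may still be revised
}

const emptyTranscript: TranscriptState = { committed: "", pending: "" };

// Join non-empty pieces with a single space.
function joinNonEmpty(parts: string[]): string {
  return parts.filter((s) => s.length > 0).join(" ");
}

// Fold one server message into the transcript state.
// A partial replaces the pending text; a final is appended to the committed
// text and clears the pending text.
function applyMessage(
  state: TranscriptState,
  msg: TranscriptMessage
): TranscriptState {
  const text = msg.text.trim();
  if (msg.type === "partial") {
    return { committed: state.committed, pending: text };
  }
  return { committed: joinNonEmpty([state.committed, text]), pending: "" };
}

// The string a controlled <input> would display for a given state.
function inputValue(state: TranscriptState): string {
  return joinNonEmpty([state.committed, state.pending]);
}

// Number of PCM samples covered by one chunk at a given sample rate,
// e.g. the 16 kHz / 500 ms figures quoted in the summary.
function samplesPerChunk(sampleRateHz: number, chunkMs: number): number {
  return Math.round((sampleRateHz * chunkMs) / 1000);
}
```

In a React component, something like this could sit inside a WebSocket `onmessage` handler that parses each message and calls `setState((s) => applyMessage(s, msg))`, with the input's `value` bound to `inputValue(state)`. Keeping the committed and pending text separate means a revised partial never overwrites text the server has already finalized.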


