- First-gen Chromecast are now failing for many, 13 years later
- I fixed Android Auto lag with 3 tiny changes
- 3 gripping HBO Max documentaries to watch this weekend (May 22-24)
- Build AI-powered dashboard automation agents with NLP on Amazon Bedrock AgentCore
- Gemini app for Mac adding ‘Spark’ agent, voice control this summer
- I switched back to wired headphones, and I’m not going back
- Motorola Razr Ultra 2026 review: Copy and paste
- 2027 Kia Seltos delivers flagship Audi tech for over $15,000 less
Browsing: speech
IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference
IBM released two new open speech recognition models— Granite Speech 4.1 2B and Granite Speech 4.1 2B-NAR — and they make a compelling case for what…
smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Whisper, Parakeet, Voxtral, Granite Speech, and Audio Flamingo 3
Audio AI has had a breakout year. Automatic speech recognition has gotten dramatically better with models like OpenAI’s Whisper variants, NVIDIA’s Parakeet, and Mistral’s Voxtral. Audio…
What you need to knowGoogle celebrates its 20th anniversary of the Translate app with users by rolling out a new feature: Pronunciation Practice.Located in the Practice…
MiniMax Releases MMX-CLI: A Command-Line Interface That Gives AI Agents Native Access to Image, Video, Speech, Music, Vision, and Search
MiniMax, the AI research company behind the MiniMax omni-modal model stack, has released MMX-CLI — Node.js-based command-line interface that exposes the MiniMax AI platform’s full suite…
Mistral AI Releases Voxtral TTS: A 4B Open-Weight Streaming Speech Model for Low-Latency Multilingual Voice Generation
Mistral AI has released Voxtral TTS, an open-weight text-to-speech model that marks the company’s first major move into audio generation. Following the release of its transcription…
Introducing Amazon Polly Bidirectional Streaming: Real-time speech synthesis for conversational AI
Building natural conversational experiences requires speech synthesis that keeps pace with real-time interactions. Today, we’re excited to announce the new Bidirectional Streaming API for Amazon Polly,…
Cohere AI Releases Cohere Transcribe: A SOTA Automatic Speech Recognition (ASR) Model Powering Enterprise Speech Intelligence
In the landscape of enterprise AI, the bridge between unstructured audio and actionable text has often been a bottleneck of proprietary APIs and complex cascaded pipelines.…
Tencent AI Open Sources Covo-Audio: A 7B Speech Language Model and Inference Pipeline for Real-Time Audio Conversations and Reasoning
Tencent AI Lab has released Covo-Audio, a 7B-parameter end-to-end Large Audio Language Model (LALM). The model is designed to unify speech processing and language intelligence by…
Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models
Speech technology still has a data distribution problem. Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems have improved rapidly for high-resource languages, but many African languages…
IBM AI Releases Granite 4.0 1B Speech as a Compact Multilingual Speech Model for Edge AI and Translation Pipelines
IBM has released Granite 4.0 1B Speech, a compact speech-language model designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The release…
