NHacker Next
Advancing voice intelligence with new models in the API (openai.com)
wild_egg 32 days ago [-]
> $32 / 1M audio input tokens ($0.40 for cached input tokens)

Anyone know how much audio is 1M tokens? I have no way of knowing if this is fine or prohibitively expensive.
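One rough way to frame it, as a sketch only: the quoted price gives dollars per token, so the missing piece is how many tokens a minute of audio consumes. The thread does not state that rate, so `tokens_per_minute` below is a hypothetical parameter, and the value 1000 in the usage lines is a round placeholder, not a documented figure.

```python
# Back-of-the-envelope audio cost estimate.
# ASSUMPTION: tokens_per_minute is not given anywhere in the thread;
# the caller must supply the real rate for the model in question.

def audio_cost_usd(minutes, tokens_per_minute, usd_per_million_tokens=32.0):
    """Cost in USD of `minutes` of audio input at the quoted $32 / 1M tokens."""
    tokens = minutes * tokens_per_minute
    return tokens * usd_per_million_tokens / 1_000_000


def minutes_per_million_tokens(tokens_per_minute):
    """How many minutes of audio fit into 1M tokens at the given rate."""
    return 1_000_000 / tokens_per_minute


if __name__ == "__main__":
    rate = 1000  # placeholder rate, purely illustrative
    print(f"1 hour of audio: ${audio_cost_usd(60, rate):.2f}")
    print(f"Minutes per 1M tokens: {minutes_per_million_tokens(rate):.0f}")
```

Plugging in the actual tokens-per-minute rate for the model would turn the per-token price into a per-hour figure that is easier to judge.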

andrewstuart 32 days ago [-]
“Perfect!”

Says the team at OpenAI whose job it is to ensure you thought that.

tjohnell 32 days ago [-]
I’ve been doing a ragtag version of this with sub-agents, TTS, and STT in Claude Code. Real-time would be pretty awesome, even if it’s just orchestrating other agents. I might have to try this on top of my Claude agents. I don’t think the model doing the talking necessarily needs to do the heavy reasoning - it just needs to have context on your other agents, delegate, and remain present.
jiehong 32 days ago [-]
Looks like the GPT‑Realtime‑Whisper model isn’t open weight like the old Whisper model. Too bad!

However, OpenAI had, and still has, a true lead in voice model interactions. That’s where Chinese AI companies don’t do as well: DeepSeek doesn’t have anything, and models like Kimi can’t speak any language except English or Chinese.

andrewstuart 32 days ago [-]
Fortunately there’s real competition in the voice ai field.

Presumably because it’s genuinely useful - I can easily think of applications to make with a powerful voice ui.
