Voice Function like ChatGPT
in progress
Merlin
in progress
cc: Ethan Cohen
Work is currently underway for the Merlin mobile app only.
The Realtime API has high costs, which hinders implementing it for prolonged use. We're looking to keep it mobile-only FOR NOW, since the voice-talking experience is, by design (the way we envision it), much better suited to handheld devices, and anything at the scale of our desktop apps would be unsustainable.
We're also figuring out a way to let users chat by voice with all the text LLMs we offer on Merlin Chat. So far, the quality of the experience is the bottleneck. Thanks!
Ethan Cohen
Hey team - do we have a release timeline for desktop on this one? It feels like we're getting a lot of likes but haven't seen any movement. endu
Guy
endu Another thing for me: I would really like to be able to talk out emails, and then have it take that audio and turn it into a clearer email.
Or the same thing for notes/ideas: I speak out the note/idea, and it then makes it clearer and organised.
Similar to what Letterly does - https://letterly.app/
Ethan Cohen
endu browser please and thank you! That is definitely the main use case for me, especially chatting with web pages (RAG + voice).
endu
Ethan Cohen okay, but if it's interaction through voice mode, then we will have to wait for the API release of OpenAI's voice mode (the other counterparts for this are not up to the level of OAI's model, which has been trained on voice)! And if it's listening to the response generated by Merlin Chat (basically a read-aloud feature), we can do that now, but it will be a little costly for us and hence will have to be a Pro-only feature. Do let us know which one would help your workflow.
Voice interaction vs. read aloud
Also curious to know how it adds more value on a laptop (for mobile it's easy for me to relate to, so we can skip that!) compared to the present mode of interaction with AI!
cc: Jean Chai; Guy.
Alex has already posted a great response, but still tagging you if you want to add more :)
Guy
endu Personally, I don't care about it reading back to me out loud. I can see that people with accessibility needs might, though.
I just want to be able to talk to it instead of typing.
Guy
endu Are AS LTD accounts "pro"?
Ethan Cohen
endu happy to add my thoughts. Reading aloud would not add value to my use case, but voice mode and conversing with the chatbot would. The reason this would add value is that I typically use Merlin for Q&A rather than summarization and reading. I typically ask it a question, it answers, and then I ask another question. I would also ask it to clarify my thinking, but I want it to specifically use the material on the page I'm working with, because I don't want it to hallucinate and bring in other concepts or context from its base knowledge.
For example: "This is how I'm currently thinking about it; am I right based on the content?" or "Can you role-play using this material to help put me in the situation of making decisions with it?"
Ethan Cohen
Also endu - I only use Merlin for intense research and learning, so I don't do any of that on my phone. It's a work tool more than anything, not really a consumer use case for me. For context, I work on product growth and venture investing, so I'm not using it for light reading.
endu
Guy got it & noted
endu
Guy no, they are not unlimited... they have limits and expire after some time.
endu
Ethan Cohen gotcha, for that we will have to wait for the release of GPT voice model API!
endu
Ethan Cohen this is helpful thanks for sharing :)
Ethan Cohen
endu of course. Looks like the day has finally come? :)
endu
Ethan Cohen yep yep, and we are on it; we will try to ship it as quickly as possible :)
Ethan Cohen
endu thank you as always to you and the team. I continue to be impressed by your ability to balance speed in shipping new features and continued iteration on current features while fixing bugs ASAP. Kudos!
Jean Chai
May I ask when you plan to start?
endu
Jean Chai we are waiting for them to open up the voice model API to the public... :/ and there aren't many alternatives.
Tushar Ghige
planned
endu
under review
endu
We are trying to bring this feature mobile-first, but right now the GPT-4o voice function API is not out yet; we are waiting on that.
But do let us know if the voice function is preferred on mobile only or for laptop/browser use cases.
Alex
endu I trust that it will be easier to use on mobile for now; however, we are excited to have it in the browser as well, so that we get more visuals on the laptop instead of just voice on mobile.
For example, on mobile we can take a photo of a question and have GPT-4o explain, through a voice conversation with us, how to resolve the question step by step.
On the browser, I'm not sure whether this is achievable: uploading a photo or video, chatting with GPT-4o by voice, and having it show visually in the browser, step by step, how to resolve an equation or question.
It would be great if it could be a note-taking GPT-4o just by adding it to a Teams meeting, Google Meet, Zoom, or other browser-accessible meetings.
endu
Got it, Alex. Thanks a ton for sharing your thoughts. We will discuss the note-taking GPT idea internally and let you know.
G
Guy
endu Definitely browser and desktop as well, so I can just talk to my computer and have it carry out operations more quickly.
endu
Guy 🙌
Alex
Like the ChatGPT-4o mobile app, where we can converse with ChatGPT-4o.