Voice Function like ChatGPT
in progress
Marlo
Voice input and output are a must these days on both mobile and desktop. Since ChatGPT and Gemini have it, I tend to use them instead of Merlin, and I am thinking about stopping my Merlin subscription.
It is much better and quicker to interact with AI through voice.
endu
Merged in a post:
Voice and music generation - e.g., ElevenLabs (voice)
Guy
Any chance of getting voice generation and music generation features, like image generation?
The best I know of for voice generation is https://elevenlabs.io/
Merlin
in progress
cc: Ethan Cohen
Work is underway currently for the Merlin mobile app only.
The Realtime API has high costs, which hinders implementation for prolonged use. We're looking to keep it mobile-only FOR NOW, since the voice-talking experience is much better suited by design to hand-held devices (the way we envision it), and anything at the scale of our desktop apps would be unsustainable.
We're also figuring out a way to give users the ability to chat with voice with all text LLMs we offer on Merlin Chat. The quality of the experience is the bottleneck so far. Thanks!
YM
Merlin It makes me wonder if a lightweight open-source local LLM is the answer for such a use case.
Ethan Cohen
Hey team - do we have a timeline to release on this one for desktop? Feel like we're getting a lot of likes but haven't seen any movement. endu
Guy
endu Another thing for me is I would really like to be able to talk out e-mails, and then have it take that audio and turn it into a clearer e-mail.
Or the same thing for notes/ideas - I speak out the note/idea, and it then makes it clearer and more organised.
Similar to what Letterly does - https://letterly.app/
Kira Kenjiro
I think endu is right. ElevenLabs is quite an expensive platform, so it would 100% have to be behind premium or limited to tokens per month.
It's a cool thing to have, 100%. I'd love to have ElevenLabs integration in Merlin, but given where they currently stand I don't think it's the right time for something like this. The best way to do it would be to use an alternative TTS API, like the TTS-1 model by OpenAI, and to use Suno for music generation. Other than that, it's just outside Merlin's ability at the moment.
endu
Kira Kenjiro: hi & thanks for echoing(seconding.. :P) my views :)
endu
no Guy we are not going in that direction as of now; also the ElevenLabs APIs are quite expensive, so it can be a pro-only feature!
Guy
endu So can it be left open for feedback and make it a pro feature?
endu
Guy ok cool that can be done keeping the ticket open then!
if we get in more pro buy ins maybe we can build it as well :P
Guy
endu Great!
Ethan Cohen
endu browser please and thank you! That is definitely the main use case for me, especially chatting with web pages RAG + voice
endu
Ethan Cohen okay, but if it's interaction through Voice mode, then we will have to wait for the API release of OpenAI's voice mode (the other counterparts are not up to OAI's level, which has been trained on voice)! ... And if it's listening to the response generated by Merlin chat (basically a read-aloud feature), we can do that now, but it will be a little costly for us and hence will have to be a pro-only feature.... Do let us know which one would help your workflow...
Voice interaction vs. read aloud
Also curious to know how that adds more value on a laptop (for mobile it's easy for me to relate to, so we can skip that!) compared to the present mode of interaction with AI!
cc: Jean Chai; Guy.
Alex has already posted with a great response, but still tagging you if you want to add more :)
Guy
endu To me, I don't care about it reading back to me out loud. I can see that people with Accessibility needs might, though.
I just want to be able to talk to it instead of typing.
Guy
endu Are AS LTD accounts "pro"?
Ethan Cohen
endu happy to add my thoughts. Reading aloud would not add value to my use case, but voice mode and conversing with the chatbot would. The reason this would add value is that I typically use Merlin for Q&A rather than summarization and reading. I typically ask it a question, it answers, and then I ask another question. I would also then ask it to clarify my thinking, but I want it to specifically use the material on the page I'm working with, because I don't want it to hallucinate and bring in other concepts or context from its base knowledge.
For example -> "This is how I'm currently thinking about it, am I right based on the content?". "Can you role play using this material to help put me in the situation to make decisions with it?"
Ethan Cohen
Also endu - I only use Merlin for intense research and learning, so I don't do any of that on my phone. It's a work tool more than anything, not really a consumer use case for me. For context, I work on product growth and venture investing, so I'm not using it for light reading.
endu
Guy got it & noted
endu
Guy no they are not unlimited ... they have limits and expire after some time ..
endu
Ethan Cohen gotcha, for that we will have to wait for the release of GPT voice model API!
endu
Ethan Cohen this is helpful thanks for sharing :)
Ethan Cohen
endu of course. Looks like the day has finally come? :)
endu
Ethan Cohen yep yep and we are on it; will try to ship it as quick as possible :)
Ethan Cohen
endu thank you as always to you and the team. I continue to be impressed by your ability to balance shipping new features quickly with iterating on current features and fixing bugs ASAP. Kudos
Jean Chai
May I ask when do you plan to start?
endu
Jean Chai we are waiting for them to open up the voice model API to the public... :/ and there are not many alternatives..
Tushar Ghige
planned