Phoneme Awareness #22

This is the next level of this project. 

There are a lot of cool TTS systems in this space:
- https://cartesia.ai/product/python-text-to-speech-api-tts
- https://inworld.ai/tts

I could keep using Piper, since it's built on VITS, and it looks like its phonemizer (https://github.com/rhasspy/piper-phonemize) is now embedded in the Piper Python package.

When I looked into implementing [bleeping](https://en.wikipedia.org/wiki/Bleep_censor), the difficult part seemed to be getting a phoneme mapping of the letters, with timings, like:

"F" (50ms), "U" (100ms), "CK" (80ms)

I keep the first phoneme, mute the middle, and unmute for CK.
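Assuming the per-phoneme durations are already known (Piper or some aligner would have to supply them; that part isn't shown here), the keep/mute/unmute step could be sketched like this. `to_intervals`, `bleep`, and `censor_middle` are hypothetical helpers, the audio is just a plain list of float samples, and the 16 kHz sample rate in the usage lines is an arbitrary pick for the example.

```python
import math
from itertools import accumulate


def to_intervals(segments):
    """Turn (label, duration_ms) pairs into (label, start_ms, end_ms) triples."""
    ends = list(accumulate(d for _, d in segments))
    starts = [0] + ends[:-1]
    return [(label, s, e) for (label, _), s, e in zip(segments, starts, ends)]


def bleep(samples, start_ms, end_ms, sample_rate, tone_hz=None):
    """Return a copy of `samples` with [start_ms, end_ms) silenced,
    or replaced by a sine tone at `tone_hz` if one is given."""
    out = list(samples)
    start = max(0, int(start_ms * sample_rate / 1000))
    end = min(len(out), int(end_ms * sample_rate / 1000))
    for i in range(start, end):
        if tone_hz is None:
            out[i] = 0.0
        else:
            out[i] = 0.5 * math.sin(2 * math.pi * tone_hz * (i - start) / sample_rate)
    return out


def censor_middle(samples, segments, sample_rate, tone_hz=None):
    """Keep the first and last phoneme audible and hide everything in between."""
    intervals = to_intervals(segments)
    if len(intervals) < 3:
        return list(samples)  # nothing "in the middle" to hide
    mute_start = intervals[0][2]  # end of the first phoneme ("F")
    mute_end = intervals[-1][1]   # start of the last phoneme ("CK")
    return bleep(samples, mute_start, mute_end, sample_rate, tone_hz)


segments = [("F", 50), ("U", 100), ("CK", 80)]
print(to_intervals(segments))  # [('F', 0, 50), ('U', 50, 150), ('CK', 150, 230)]

sr = 16000  # arbitrary sample rate for this example
audio = [1.0] * (230 * sr // 1000)  # 230 ms of placeholder "speech"
censored = censor_middle(audio, segments, sr)                # silence the "U"
beeped = censor_middle(audio, segments, sr, tone_hz=1000.0)  # or beep over it
```

Cutting straight to silence can click at the edges, so a few milliseconds of fade around the muted window is a common tweak.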

Luckily I'm only working with English right now, because it looks like Arabic is a lot harder: https://www.reddit.com/r/TextToSpeech/comments/1ooiabf/how_can_i_extract_phoneme_timings_for_lipsync/

I could use ElevenLabs or Azure, but we ball! Also, they provide visemes with exact millisecond start/end times? Kinda crazy.
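For reference, Azure reports each viseme's audio offset in ticks (1 tick = 100 ns), and each event only marks where a viseme *starts*, so the end time has to be inferred from the next event. A minimal sketch, assuming the events have already been collected as `(offset_ticks, viseme_id)` tuples; `viseme_intervals` and `overlapping` are hypothetical helpers, no SDK calls are shown, and the sample offsets and IDs are made up for illustration.

```python
TICKS_PER_MS = 10_000  # 1 tick = 100 ns


def viseme_intervals(events, total_ms):
    """Convert (offset_ticks, viseme_id) start events into
    (viseme_id, start_ms, end_ms), ending each one where the next begins.
    `total_ms` closes the final interval (e.g. the length of the clip)."""
    events = sorted(events)
    starts = [offset / TICKS_PER_MS for offset, _ in events]
    ends = starts[1:] + [float(total_ms)]
    return [(vid, s, e) for (_, vid), s, e in zip(events, starts, ends)]


def overlapping(intervals, start_ms, end_ms):
    """Visemes that overlap the window [start_ms, end_ms) we want to bleep."""
    return [iv for iv in intervals if iv[1] < end_ms and iv[2] > start_ms]


# Made-up events for illustration: (offset in ticks, viseme id)
events = [(0, 18), (500_000, 1), (1_500_000, 20)]
ivs = viseme_intervals(events, total_ms=230)
print(ivs)                        # [(18, 0.0, 50.0), (1, 50.0, 150.0), (20, 150.0, 230.0)]
print(overlapping(ivs, 50, 150))  # [(1, 50.0, 150.0)]
```

Since several phonemes share one viseme, this gives mouth-shape timing rather than a letter-exact mapping.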

<img width="500" height="500" alt="Image" src="https://github.com/user-attachments/assets/06931914-3897-42da-8d4a-0503ce2c7488" />