Phoneme Awareness #22

@ramonasuncion

This is the next level of this project.

There are a lot of cool TTS systems in this space.

I could keep using Piper, since it's built on VITS, and it looks like piper-phonemize (https://github.com/rhasspy/piper-phonemize) is now embedded in Piper Python.

When I looked into implementing beeping, the difficult part seemed to be getting a mapping from letters to phonemes with a duration for each, something like:

"F" (50ms), "U" (100ms), "CK" (80ms)

I keep the first phoneme, mute the middle, and unmute for CK.
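
A minimal sketch of that keep/mute/unmute step, assuming the per-phoneme durations are already available from somewhere (getting them out of the TTS system is the hard part and is not shown here). The function names `phoneme_spans` and `censor_middle` are hypothetical, the 16 kHz sample rate and 1 kHz beep are arbitrary choices, and the audio is a plain list of floats rather than whatever format the synthesizer actually returns.

```python
import math


def phoneme_spans(durations_ms, sample_rate):
    """Turn [(label, duration_ms), ...] into [(label, start_sample, end_sample), ...]."""
    spans = []
    cursor_ms = 0
    for label, duration_ms in durations_ms:
        start = cursor_ms * sample_rate // 1000
        cursor_ms += duration_ms
        end = cursor_ms * sample_rate // 1000
        spans.append((label, start, end))
    return spans


def censor_middle(samples, spans, sample_rate, beep_hz=1000.0, amplitude=0.3):
    """Keep the first and last phoneme; overwrite everything in between with a sine beep.

    Pass amplitude=0.0 to mute instead of beeping.
    """
    if len(spans) < 3:
        # Nothing sits between the first and last phoneme, so leave the audio alone.
        return list(samples)
    out = list(samples)
    mute_start = spans[0][2]                 # end of the first phoneme
    mute_end = min(spans[-1][1], len(out))   # start of the last phoneme
    for i in range(mute_start, mute_end):
        phase = 2 * math.pi * beep_hz * (i - mute_start) / sample_rate
        out[i] = amplitude * math.sin(phase)
    return out
```

With the example timings above, `phoneme_spans([("F", 50), ("U", 100), ("CK", 80)], 16000)` puts "U" at samples 800 to 2400, so only that range is overwritten and the "F" and "CK" audio passes through untouched.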

Luckily I'm only working with English right now, because it looks like this gets harder for Arabic: https://www.reddit.com/r/TextToSpeech/comments/1ooiabf/how_can_i_extract_phoneme_timings_for_lipsync/

I could use ElevenLabs or Azure, but we ball! Also, they provide visemes with exact millisecond start/end times? Kinda crazy.
