re: morshutalk-v2 dev thread

Today's goal is to figure out audio transcoding in Python so I can always have this thing output a smaller file for uploading elsewhere. I don't want to rely on subprocess because that makes things too OS-specific (I at least want to know this works on Windows and Linux) and I don't want to deal with any kind of external calls that I can't guarantee will stay the same between systems or (major) versions.

I at least want to be able to output to one other audio format - and if I can get away with feeding the encoder the bytearray (and thus not writing a wav file), even better. Target bitrate is going to be "good enough for speech" (so Low) and sample rate is going to also be "good enough for speech" (still low).
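
A minimal sketch of the "no wav file on disk" idea, assuming the speech already exists as raw 16-bit mono PCM bytes. `pcm_to_wav_bytes` and its defaults are made-up names for illustration, not anything from the actual project; it only leans on the standard-library `wave` and `io` modules.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=22050, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container without touching the disk.

    Hypothetical helper: the defaults (22050 Hz, mono, 16-bit) are assumptions.
    """
    buffer = io.BytesIO()
    # wave.open accepts any file-like object, so a BytesIO works in place of a path.
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

The same bytes (or bytearray) can then go to whatever encoder ends up being used, and the wav only gets written out if it's actually wanted.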

I did wonder if it'd be possible to use a different bit-depth, but after digging into things yesterday it looks like Python doesn't like that - unless I went from 16-bit to 32-bit, but that's way more than I need and thus a much larger file than warranted.

Anyways. Y'all will find out what ends up happening today.

#MorshuTalk_v2 #Python

re: morshutalk-v2 dev thread

Blessed be, someone has continued opuslib since it was deprecated forever ago, so I'll be working directly with opuslib_next for all of my Opus output needs. Audio will at the very least be output both as the original wav file and as an Ogg Opus file. At least, assuming I can't find a higher-level option that's actively maintained.

Since I'll be working with opuslib more closelier (likely) it'll be More Annoying (TM) but a. it means I get to Learn python more gooder and 2. I'll get to figure that all out.

I think I'll abandon the idea of using anything but a 16-bit depth for the audio output - I can easily get away with a 22050 Hz sample-rate since we're talking voice lines, and Opus or whatever is going to compress things down a fair bit as well. We're also talking audio that's probably not much longer than 15 seconds barring the occasional jank, so an ever so slightly larger file because I can't be bothered to figure out narrower bit-depths is fine.
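
One wrinkle with 22050 Hz: libopus only accepts input at 8000, 12000, 16000, 24000 or 48000 Hz, in frames of 2.5, 5, 10, 20, 40 or 60 ms, so the audio would likely need resampling (24000 is the nearest rate) before encoding. A rough sketch of the framing side, assuming 16-bit PCM. The function names are hypothetical, and no actual encoder call is made here.

```python
OPUS_SAMPLE_RATES = (8000, 12000, 16000, 24000, 48000)
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def nearest_opus_rate(sample_rate):
    """Pick the Opus-supported sample rate closest to the given one."""
    return min(OPUS_SAMPLE_RATES, key=lambda rate: abs(rate - sample_rate))


def split_pcm_frames(pcm, sample_rate, frame_ms=20, channels=1, sample_width=2):
    """Cut raw PCM into fixed-size frames, zero-padding the last one.

    Hypothetical helper; sample_width=2 assumes 16-bit samples.
    """
    if sample_rate not in OPUS_SAMPLE_RATES:
        raise ValueError(f"{sample_rate} Hz is not an Opus input rate")
    if frame_ms not in OPUS_FRAME_MS:
        raise ValueError(f"{frame_ms} ms is not an Opus frame duration")
    samples_per_frame = int(sample_rate * frame_ms / 1000)
    frame_bytes = samples_per_frame * channels * sample_width
    frames = []
    for start in range(0, len(pcm), frame_bytes):
        chunk = bytes(pcm[start:start + frame_bytes])
        # Pad the final short chunk with silence so every frame is full length.
        frames.append(chunk.ljust(frame_bytes, b"\x00"))
    return frames
```

Each frame from `split_pcm_frames` would then be handed to the encoder one at a time, and the encoded packets still need wrapping in an Ogg container to end up as a playable file.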

I'm still borrowing the original phoneme array since I don't want to sift through the original speech again to track down any possible improvements just yet. It honestly might be one of the few bits that are exactly the same as (or extremely close to) the original morshutalk, outside of the Morshu class.

I do want to tweak things to let him say numbers, since g2p doesn't handle that eng -> eng-arpabet conversion at ALL (understandably so). That means I need to figure out separating digits from any other character they're sitting next to, and then convert the individual numerals into text (only covering 0 through 9, sorry, he'll sound weird if you try to make him say 40 or whatever), which will then be handled properly by g2p.
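
A quick sketch of what that digit handling could look like, assuming it runs on the plain text before g2p ever sees it. `spell_out_digits` and `DIGIT_WORDS` are illustrative names, not anything from the actual code, and this only covers ASCII 0 through 9.

```python
import re

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_digits(text):
    """Replace each ASCII digit with its word, spacing it off from neighbours."""

    def replace(match):
        word = DIGIT_WORDS[match.group()]
        source = match.string
        start, end = match.start(), match.end()
        # Space before the word only if a letter is glued to the front of the digit.
        before = " " if start > 0 and source[start - 1].isalpha() else ""
        # Space after the word if a letter or another digit follows directly.
        after = " " if end < len(source) and source[end].isalnum() else ""
        return before + word + after

    return re.sub(r"[0-9]", replace, text)
```

So "lamp oil 4u" comes out as "lamp oil four u", and "40 rupees" becomes "four zero rupees", which is exactly the weird-sounding case mentioned above.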

No significant progress today other than finding opuslib_next, but that does give me a lead to dig into for encoding purposes. Whether I stick to using opuslib_next directly or use a higher-level system that's also cross-platform friendly IDK, but that'll come with time.

#MorshuTalk_v2 #Python
