Comment by Half_Elf_Monk
I appreciate the thinking on this topic, and the comments posted here so far. I also like listening to audio; it fits well into the workflow (or lack thereof) of a given day. The capacity to generate accurate transcriptions is helpful.
turboscribe.ai blew my mind when I found it. It was a way to generate reasonably accurate text transcriptions of audio, relatively easily. I haven't found a way to do this locally, but @requiem mentioned something about Whisper, so maybe that's the answer I need.
I don't know why podcasting services couldn't do this automatically for all the podcasts they host/serve. Or why someone couldn't train a model (using their GPUs) and then distribute it for others to run on less-intensive machines. Maybe I'm not understanding the complexity.
I'd be interested to hear @norayr's thoughts on why that isn't a permacomputing solution. I guess it's not sustainable in a very-long-run sort of way, but if we train the voice models now, wouldn't we be able to use them down the road reasonably well? If all of society goes down in a CME or war or something, I have far more important things to do than worry about whether listeners get a transcript of my podcast.
In any case, if anyone is running a local AI to generate good transcripts, please report in with your experience. That sounds very very useful.
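For anyone who wants to try it locally: a minimal sketch using the open-source `openai-whisper` package (`pip install -U openai-whisper`; ffmpeg must be on your PATH). The filename and model size below are placeholders, not a tested setup.

```python
# Sketch: transcribe an audio file locally with Whisper.
# Assumes: pip install -U openai-whisper, and ffmpeg available on PATH.
# "episode.mp3" is a hypothetical filename.
import whisper

# Smaller models ("tiny", "base", "small") run on modest hardware;
# the model weights are downloaded once and cached locally.
model = whisper.load_model("small")

result = model.transcribe("episode.mp3")
print(result["text"])  # the full transcript as a single string
```

The package also ships a CLI, so something like `whisper episode.mp3 --model small --output_format txt` should write the transcript to a text file without any Python code.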
2024-11-14 · 1 year ago
3 Later Comments
norayr [OP] · 2024-11-16 at 00:17:
well, i think i explained that i wasn't right and asked you to excuse me; i had some feeling at that point in time which made me write that.
but to sum up:
- ai means dependence on big corporations and lots of computational resources. not permacomputing. not possible to grow/train in a home lab.
- to me it is much easier to read than to listen. the radio genre and the video genre are not text; they have other means of expression which make them different. still, when i drive i listen to podcasts or music. i even started to generate my own podcast xml from my anonradio dj set recordings, and gave up soundcloud. but yes, if i can choose, i choose reading texts.
norayr [OP] · 2024-11-16 at 00:39:
- accessibility is very important. we need transcripts, but that is a hard task, and i would like to avoid corporate ai. and we have almost no other ai, only corporate ais. also, for my language there are not even corporate ais that could make a transcript. youtube cannot transcribe armenian either. maybe one day it will be able to. still, i would like to avoid dependence on corporations and saas.
Half_Elf_Monk · 2024-11-16 at 21:42:
Ah, thanks @norayr, I think I understand you now. I also would rather the models were freed... from corporate and state dependence. I'm hopeful that such things will exist in the future. Cheers...
Original Post
post text, not audio. publishing audio is convenient, but how do you find it on the internet? we even agree that images should have alt descriptions. otherwise we have to rely on ai (which is not lowtech) to find us audio or video files that have the information we search for. p. s. that also relates to 'voice messages' in chats. it is easy to send a message, but it is not possible later to find the information in the chat log. again, ai may help, but do we want it to help? also, while it is easy...