Project – document my AWS experience trying to create a simple voice-to-text service

I want to have a simple web service that can take files from my digital voice recorder (hereafter dvr) and transcribe them to text, send the transcription (after my approval?) to an LLM like ChatGPT or Gemini or whatevs, and return a summary, a second summary half as long/twice as condensed, optionally the soundtrack files, and optionally versions in other formats/styles. Results come back by email, or ideally go direct to my preferred local storage (or why not just cloud? or why not both?).

ChatGPT to summarize; look at the filename or other metadata for instructions. For instance, if the filename or metadata has “letter-to-editor”, then the results are returned as a letter to the editor, or “letter to mom”. Maybe even have it guess by looking at content and past content, learning each recipient's style of conversation.
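
A rough sketch of the filename-as-instruction idea, in Python. Everything named here is made up for illustration: the `STYLE_HINTS` table, the `style_for` and `build_prompt` helpers, and the prompt wording. It only covers the keyword lookup; guessing the style from content is not attempted.

```python
from pathlib import Path

# Hypothetical keyword -> instruction table; extend with your own styles.
STYLE_HINTS = {
    "letter-to-editor": "Rewrite the transcript as a letter to the editor.",
    "letter-to-mom": "Rewrite the transcript as a warm, informal letter to mom.",
}
# Fallback when no keyword matches: the two summaries described above.
DEFAULT_STYLE = "Summarize the transcript, then give a summary half as long."


def style_for(filename: str) -> str:
    """Pick an LLM instruction based on keywords found in the filename."""
    # Normalize so "Letter to Mom.mp3" and "letter_to_mom.mp3" both match.
    stem = Path(filename).stem.lower().replace(" ", "-").replace("_", "-")
    for keyword, instruction in STYLE_HINTS.items():
        if keyword in stem:
            return instruction
    return DEFAULT_STYLE


def build_prompt(filename: str, transcript: str) -> str:
    """Combine the chosen instruction with the transcript text."""
    return f"{style_for(filename)}\n\nTranscript:\n{transcript}"


if __name__ == "__main__":
    print(build_prompt("2024-04-09 letter to mom.mp3", "Hi mom, quick update..."))
```

The resulting prompt string could then be sent to whichever LLM ends up being used; that call is left out because no provider has been chosen yet.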

Currently I do the following manually:

  1. Copy file from my dvr to my PC
  2. upload audio files to AWS bucket
  3. on AWS, create a transcription job, manually filling in the details;
    1. I can do this at a rate of 10 ramblings in 15 minutes if not distracted, but it's tedious
    2. maybe name the files by the first 10 words?
      1. the summarizer steps below via LLM probs do a better job
      2. maybe have it pause with a suggestion? When the transcribe job is done, make it wait until confirmation of the name; this is important because so much can be done just with the filename system, nothing external, e.g. maybe better use of file properties like OS/2 metadata
  4. on completion (no more than a minute so far), download the 3 files
    1. AWS bucket explorer doesn’t let multiselected files be downloaded, so one at a time ffs
    2. AWS returns the transcription in .json format, so need to extract the text
    3. should be trivial with Python, or AWS tools?
  5. send the transcripts to me by email
    1. email easy
    2. ideally files would be stored locally or in Google Drive or other cloud service
    3. better: make a post to blog or Notion or whatevs?
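
Steps 2 to 4 above could probably be scripted with boto3 (the AWS SDK for Python) instead of clicking through the console. The following is a minimal sketch, not something that has been run against this account: the bucket names, the `dvr` folder, the `en-US` language code, and the helper names are all assumptions. The calls used (`upload_file`, `start_transcription_job`, the `list_objects_v2` paginator, `download_file`) are standard boto3 client methods.

```python
import re
from pathlib import Path


def job_name_for(filename: str) -> str:
    """Make a Transcribe-safe job name (letters, digits, '.', '_', '-')."""
    stem = Path(filename).stem
    return re.sub(r"[^0-9a-zA-Z._-]", "-", stem)[:200]


def upload_and_transcribe(s3, transcribe, path, in_bucket, out_bucket):
    """Upload one audio file and start a transcription job for it."""
    key = Path(path).name
    s3.upload_file(str(path), in_bucket, key)
    name = job_name_for(key)
    transcribe.start_transcription_job(
        TranscriptionJobName=name,
        Media={"MediaFileUri": f"s3://{in_bucket}/{key}"},
        LanguageCode="en-US",  # assumption: recordings are in US English
        OutputBucketName=out_bucket,
    )
    return name


def download_all(s3, bucket, dest_dir):
    """Download every object in a bucket, avoiding one-at-a-time clicks."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip folder placeholder keys
            target = dest / Path(obj["Key"]).name
            s3.download_file(bucket, obj["Key"], str(target))
            saved.append(target)
    return saved


if __name__ == "__main__":
    import boto3  # third-party: pip install boto3; needs AWS credentials

    s3 = boto3.client("s3")
    transcribe = boto3.client("transcribe")
    # Hypothetical local folder and bucket names.
    for audio in sorted(Path("dvr").glob("*.mp3")):
        print(upload_and_transcribe(s3, transcribe, audio, "my-audio-in", "my-transcripts-out"))
```

Job names have to be unique within the account, so re-running this on the same file would need a suffix (a date, say) added to the name. The download step would be run separately once the jobs have finished.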

2024-04-09 – did 20 transcriptions manually, 2 batches of 10 at ~15 minutes each

2024-04-10 – did 10 more manual transcription jobs:

with 79 files in my output bucket:

2024-04-11 – further work on documenting how I do this manually using AWS Free Tier

extracting text of the transcription from a .json file is trivial; Gemini+ did it:

as can a .json file viewer:
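
For the record, the extraction can also be sketched in a few lines of Python. This assumes the standard Amazon Transcribe output layout, where the plain text sits under `results` → `transcripts` → first entry → `transcript`. The helper names and the `transcripts` folder are made up for illustration, and `name_from_first_words` is just one possible take on the "name the files by the first 10 words" idea from the list above.

```python
import json
import re
from pathlib import Path


def transcript_text(json_path) -> str:
    """Return the full transcript string from a Transcribe result file."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # The plain text lives under results -> transcripts -> [0] -> transcript.
    return data["results"]["transcripts"][0]["transcript"]


def name_from_first_words(text: str, n: int = 10) -> str:
    """Build a filename-friendly slug from the first n words of the text."""
    words = [w.replace("'", "") for w in re.findall(r"[A-Za-z0-9']+", text.lower())]
    words = [w for w in words if w][:n]
    return "-".join(words) or "untitled"


if __name__ == "__main__":
    # Hypothetical folder holding the downloaded .json results.
    for path in sorted(Path("transcripts").glob("*.json")):
        text = transcript_text(path)
        out = path.with_name(name_from_first_words(text) + ".txt")
        out.write_text(text, encoding="utf-8")
        print(f"{path.name} -> {out.name}")
```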

2024-04-11 – I’ve bumped into my limit for transcriptions: 3600 seconds (60 mins), which might be the total audio playing time of the ~40 I did the last few days?
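
One way to check that guess: 40 recordings averaging 90 seconds each would come to exactly 40 × 90 = 3600 seconds. A rough sketch for adding up the real figure from the downloaded results follows. It assumes each Transcribe .json file lists word items with an `end_time` string in seconds, and it takes the last word's end time as the audio length, so trailing silence is not counted. The `transcripts` folder and the function names are hypothetical.

```python
import json
from pathlib import Path

FREE_TIER_SECONDS = 3600  # the 60-minute limit mentioned above


def audio_seconds(result: dict) -> float:
    """Approximate audio length as the latest word end_time in one result."""
    items = result["results"]["items"]
    # Punctuation items carry no timestamps, so only timed items are used.
    ends = [float(item["end_time"]) for item in items if "end_time" in item]
    return max(ends, default=0.0)


def total_seconds(folder) -> float:
    """Sum the approximate audio length over every .json file in a folder."""
    total = 0.0
    for path in Path(folder).glob("*.json"):
        total += audio_seconds(json.loads(path.read_text(encoding="utf-8")))
    return total


if __name__ == "__main__":
    used = total_seconds("transcripts")  # hypothetical download folder
    print(f"~{used:.0f} s used of {FREE_TIER_SECONDS} s ({used / FREE_TIER_SECONDS:.0%})")
```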
