{"id":33141,"date":"2024-04-11T21:49:30","date_gmt":"2024-04-12T01:49:30","guid":{"rendered":"https:\/\/ckms.ca\/?p=33141"},"modified":"2024-04-11T21:49:30","modified_gmt":"2024-04-12T01:49:30","slug":"project-document-my-aws-experience-trying-to-create-a-simple-voice-to-text-service","status":"publish","type":"post","link":"https:\/\/blog.ckms.ca\/?p=33141","title":{"rendered":"project &#8211; document my aws experience trying to create a simple voice-to-text service"},"content":{"rendered":"\n<p>I want to have a simple web service that can take files from my digital voice recorder (hereafter dvr) and transcribe them to text, send the transcription (after my approval?) to an llm like chatgpt or gemini or whatever, returning a summary, a summary half as long\/twice as condensed, and optionally the soundtrack files, and optionally in other formats\/styles, returned by email, or ideally direct to my preferred local storage (or why not just cloud? or why not both?)<\/p>\n\n\n\n<p>chatgpt to summarize; look at the filename or other metadata for instructions. for instance, if the filename or metadata has &#8220;letter-to-editor&#8221;, then the results are returned as a letter to the editor, or &#8220;letter to mom&#8221;, even having it guess by looking at content and past content, learning each recipient&#8217;s style of conversation.<\/p>\n\n\n\n<p>currently I do the following manually:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Copy file from my dvr to my PC<\/li><li>upload audio files to AWS bucket<\/li><li>on aws, create a transcription job, manually filling in the details;<ol><li>I can do this at a rate of 10 ramblings in 15 minutes if not distracted; but tedious<\/li><li>maybe name the files by the first 10 words?<ol><li>summarizer steps below via llm would probably do a better job<\/li><li>maybe have it pause with a suggestion? once the transcribe job is done, make it wait until confirmation of the naming; this is important because so much can be done just with the filename system, nothing external, e.g. 
maybe better use of properties like OS\/2 metadata<\/li><\/ol><\/li><\/ol><\/li><li>on completion (no more than a minute so far), download the 3 files<ol><li>aws bucket explorer doesn&#8217;t let multi-selected files be downloaded, so one at a time ffs<\/li><li>aws returns the transcription in .json format, so need to extract the text<\/li><li>should be trivial with python, or aws tools?<\/li><\/ol><\/li><li>send the transcripts to me by email<ol><li>email easy<\/li><li>ideally files would be stored locally or in google drive or other cloud service<\/li><li>better: make a post to blog or notion or whatever?<\/li><\/ol><\/li><\/ol>\n\n\n\n<p>2024-04-09 &#8211; did 20 transcriptions manually, 2 batches of 10 at ~15 minutes each<\/p>\n\n\n\n<p>2024-04-10 &#8211; did 10 more manual transcription jobs:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"262\" src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-1024x262.jpg\" alt=\"\" class=\"wp-image-33147\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-1024x262.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-300x77.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-768x196.jpg 768w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-1536x392.jpg 1536w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-750x192.jpg 750w, 
https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-1226x313.jpg 1226w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01-675x172.jpg 675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-45-01.jpg 1734w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>with 79 files in my output bucket:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-1024x576.jpg\" alt=\"\" class=\"wp-image-33148\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-1024x576.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-300x169.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-768x432.jpg 768w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-1536x864.jpg 1536w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-750x422.jpg 750w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-480x270.jpg 480w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-1226x689.jpg 1226w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37-675x380.jpg 
675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11_21-46-37.jpg 1709w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>2024-04-11 &#8211; further work on documenting how I do this manually using AWS Free Tier<\/p>\n\n\n\n<p>extracting text of the transcription from a .json file is trivial; Gemini+ did it:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"671\" src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-1024x671.jpg\" alt=\"\" class=\"wp-image-33145\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-1024x671.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-300x197.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-768x503.jpg 768w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-750x492.jpg 750w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07-675x442.jpg 675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-07.jpg 1184w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>as can a .json file viewer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"310\" 
src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-1024x310.jpg\" alt=\"\" class=\"wp-image-33146\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-1024x310.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-300x91.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-768x232.jpg 768w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-750x227.jpg 750w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-1226x371.jpg 1226w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21-675x204.jpg 675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-31-21.jpg 1333w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>2024-04-11 &#8211; I&#8217;ve bumped into my limit for transcriptions: 3600 seconds (60 mins), which might be the total audio playing time of the ~40 I did over the last few days?<\/p>\n\n\n\n<figure class=\"wp-block-gallery columns-1 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-1024x468.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"468\" src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-1024x468.jpg\" alt=\"\" data-id=\"33142\" data-link=\"https:\/\/blog.ckms.ca\/?attachment_id=33142\" 
class=\"wp-image-33142\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-1024x468.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-300x137.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-768x351.jpg 768w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-750x343.jpg 750w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13-675x309.jpg 675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-02-13.jpg 1063w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"819\" src=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-1024x819.jpg\" alt=\"\" data-id=\"33144\" data-full-url=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40.jpg\" data-link=\"https:\/\/blog.ckms.ca\/?attachment_id=33144\" class=\"wp-image-33144\" srcset=\"https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-1024x819.jpg 1024w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-300x240.jpg 300w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-768x614.jpg 768w, 
https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-1536x1229.jpg 1536w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-750x600.jpg 750w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-1226x981.jpg 1226w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40-675x540.jpg 675w, https:\/\/blog.ckms.ca\/wp-content\/uploads\/2024\/04\/voice-ramblings-to-summarized-text-aws-2024-04-11-20-10-40.jpg 1806w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure><\/li><\/ul><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I want to have a simple web service that can take my&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[282],"tags":[],"class_list":["post-33141","post","type-post","status-publish","format-standard","hentry","category-article","wpcat-282-id"],"_links":{"self":[{"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/posts\/33141","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=33141"}],"version-history":[{"count":1,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/posts\/33141\/revisions"}],"predecessor-version":[{"id":33149,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=\/wp\/v2\/posts\/33141\/revisions\/33149"}],"wp:attachment":[{"href":"https:
\/\/blog.ckms.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=33141"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=33141"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ckms.ca\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=33141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}