Skip to content

thorbjoernl/podwhisper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Podcast whisper

A script for running each episode in a podcast rss through Whisper to transcribe the podcast.

Configure it using the .env file. WHISPER_MODEL can be any of the options described here.

Installation

Poetry is required for installation.

  • Clone the repository.
  • poetry install to install dependencies.
  • poetry shell to activate the virtual environments.

Running the program

  • python podwhisper.py [URL]

[URL] is a url to a podcast rss feed. It will produce a *.txt, *.json and a *.md file per episode.

  • The text file is a dump of the generated text without timestamps.
  • The json file is the segment data produced by Whisper formatted as a json.
  • The md file is a simple formatted Markdown file, with timestamps.

Example Output

Here is an example snippet of *.md for an episode of 99 Percent Invisible:

# 99% Invisible - 438- The Real Book [rebroadcast]
**00:00:00** This is 99% Invisible. I'm Roman Mars.

**00:00:06** Since the mid-1970s, almost every jazz musician has owned a copy of the same book. It has

**00:00:12** a peach-colored cover, a chunky 70s-style logo, and black plastic binding. It is delightfully

**00:00:19** homemade-looking, like it was printed by a bunch of teenagers at Kinko's. And inside

**00:00:24** is the sheet music for hundreds of common jazz tunes, also known as jazz standards,

**00:00:31** all meticulously notated by hand. It's called The Real Book.

**00:00:36** When I started playing jazz, I remember the first thing my guitar teacher said was, well,

**00:00:40** you gotta buy a real book.

**00:00:42** That's producer Michael McAvanot.

**00:00:44** Everybody had one. It just felt like something you were expected to own if you were a serious

**00:00:49** musician. My high school jazz teacher, Mr. Leonard, had stacks of real books on his desk.

**00:00:53** And he told me that he actually got his first real book at the place where they were originally

**00:00:58** published, Berkeley College of Music in Boston. He had just arrived for his freshman year.

About

Transcribe podcast rss feeds with OpenAI Whisper.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages