One-click YouTube video workflow: Download, transcribe, summarize and upload to Notion

Interested in a video but short on time to watch the whole thing? No worries, this I’ve get you covered: in this article, I’ll show a way…

One-click YouTube video workflow: Download, transcribe, summarize and upload to Notion

Interested in a video but short on time to watch the whole thing? No worries, this I’ve get you covered: in this article, I’ll show a way to download a YouTube video and summarize it within Notion with just a few clicks.

You’ll need Zapier with Google Drive and ChatGPT integrations. Go to https://zapier.com/app/connections and add the following integrations:

To integrate with Zapier, follow these steps.
To integrate with Zapier, follow the steps here.

Notion is optional. The TXT files can be stored as files on your Drive. The process basically looks like this:

1. Download a YouTube video, 2. Upload the MP3 to Google Drive, 3. Send to the file to ChatGPT, 4. Summarize, 5. Send the summary to Notion

1: Download a YouTube video you want to transcribe and summarize

First, download ytdlp-interface, a Windows graphical interface for yt-dlp, a simple YouTube downloader. Since v1.2, the interface also accepts non-YouTube URLs, so feel free to experiment.

Note: there are dozens of online tools for downloading YouTube videos, but they are riddled with ads and don’t always work as expected.

To use ytdlp-interface, unpack the archive in a new folder at a location of your choice, and run ytdlp-interface.exe.

Download link for the latest version (64 bit): https://github.com/ErrorFlynn/ytdlp-interface/releases/download/v2.14.1/ytdlp-interface.7z

yt-dlp interface showing two video downloads in progress,

Once you run ytdlp-interface, adjust a few options:

  1. Convert audio to MP3 : check this box. For transcription, we won’t need the video, only audio. This will save some time and tokens.
  2. Download folder: Choose a folder on Google Drive that is synced with the cloud. In my case, it’s G:\My Drive\YouTube Downloads. Whenever I download a video, the MP3 file is saved in the folder and sent to Google Drive.
  3. In the Settings menu, check the Automatically remove completed items setting.

How to use ytdlp-interface:

  1. Open the ytdlp-interface program
  2. Paste the link to the YouTube video you want to have transcribed
  3. Press Download
  4. Wait until it completes the task (downloads the video, extracts the MP3 and uploads the MP3 to Google Drive). It usually takes around 10–20 seconds.

2: Set up you Zapier automation

Zapier workflow setup with steps for automating file processing. Step 1 is a Google Drive trigger for ‘New File in Folder,’ with options to select the app, trigger event, and account. Additional steps involve ChatGPT for transcription, summarization, conversation, formatting, and creating a text file in Google Drive.

Step 1: New file in folder (Google Drive)

This step triggers the entire Zapier workflow whenever a new file (our MP3 audio file) is uploaded to a specific folder in Google Drive.

App: Google Drive is used as the app to monitor the folder for new file uploads.

Trigger Event: “New File in Folder,” which means Zapier will start the workflow each time it detects a file has been added to the designated folder.

Account: To integrate Google Drive with Zapier, follow these steps.

Now, adjust settings in the “Configure” section:

Drive: This field specifies which Google Drive account to use. In this case, “My Google Drive” is selected.

Folder: This field designates the specific folder within the Google Drive to monitor. Here, it’s set to “YouTube Downloads,”

Include Deleted Files? Here, it’s set to “Only return non-deleted files,” meaning the workflow will ignore any files that have been deleted in the folder and will only trigger for newly uploaded, active files.

Test the step and preceed to the next step of the workflow by pressing “+”

Step 2: ChatGPT transcription

In Step 2, “Create Transcription,” the fields are configured as follows:

  1. App: This field selects the application to be used for the action. Here, “ChatGPT” is chosen.
  2. Action event: This field specifies what action ChatGPT will perform. In this case, the action is set to “Create Transcription,” indicating that ChatGPT’s Whisper feature will transcribe the content of the newly detected file (MP3 audio file) into text.
  3. Account: Your connected ChatGPT account used to perform this action.

Now, in the “Configure” section:

File: This field specifies the file that will be transcribed. Here, it is connected to the file from Step 1 (“New File in Folder” in Google Drive), ensuring that the newly uploaded file in the designated folder will be used for transcription.

Prompt: This field allows the user to input specific instructions or a prompt for ChatGPT to follow during transcription. Leaving it blank will usually result in a standard transcription, while adding text can guide ChatGPT to, for example, focus on certain aspects or styles of transcription (like summarizing key points as it transcribes).

Response Format: This field specifies the format of the transcription output. It’s set to “Text,” which means the output will be in plain text format. Other formats (if available) might include structured data types, but “Text” is commonly chosen for straightforward transcriptions.

Language of the Audio: Here, the user can specify the language of the audio file to assist ChatGPT in correctly processing the transcription. This field can be left blank if the language is detected automatically, or filled in with the language code (e.g., “en” for English) to enhance accuracy.

Note: Please note that the ChatGPT Whisper model is designed primarily for audio-to-text transcription. However, a key limitation of Whisper is that it does not differentiate between speakers. This means that, in a conversation or multi-speaker audio file, Whisper will transcribe all spoken content into one continuous block of text without attributing specific lines to individual speakers. This is not a proplem for single-speaker videos, but may not produce best results when there are multiple speakers involved (especially when they’re disagreeing on a topic).

Step 2: ChatGPT summary

This step produces a TL;DW summary of the video. It’s a relatively short and rarely exceeds one paragraph.

Step 4: ChatGPT conversation

This is a step I added to get a more nuanced summary of the video. Set it up as follows:

The “Configure” section has the following settings:

  1. User Message: This field contains the prompt or message given to ChatGPT to guide its response. Here, the user prompt is "Write an extended summary of," followed by the output from Step 2 (the transcription content). This message directs ChatGPT to produce a more detailed and elaborative version of the summarized content from Step 3. Feel free to experiment with this prompt to control what ChatGPT does with the transcript.
  2. Model: This specifies the language model used for the task. The selected model here is "gpt-4-mini," indicating a smaller, resource-efficient version of GPT-4, tailored for quicker responses or lighter tasks.
  3. Memory Key: This field is optional and allows the user to set a unique memory key. This can be used to retain contextual information across multiple interactions within the workflow, providing ChatGPT with continuity in responses if needed.
  4. Image: leave blank, as no image is needed for the conversation.
  5. User Name: This field allows for the customization of the conversation style or tone by personalizing it with a user name. It could be filled in to make responses more tailored.

Step 5: Formatter by Zapier

This is a step I added to format the title of the summary that will be used in the resulting Notion card later on.

It basically takes the .mp3 out of the name. Set up the Configure section like this:

Step 6: Google Drive: Create file from text (optional step!)

This step is optional and only needed if you’d rather save the .txt files on Google Drive (and not in Notion).

Set this step up as follows:

Folder: This field designates where the new file will be saved in Google Drive. It’s set to YouTube Downloads, meaning the file will be stored in this specific folder.

File Name: This field sets the name of the file to be created. Here, it’s dynamically generated as Summary of [Output from Step 5]… .txt, where the content of Step 5 (Formatter output) is used to title the file without the .mp3 in the name.

File Content: This field specifies the content that will go into the file. It’s organized as follows:

This structured format ensures the file contains all versions of the processed text — summary, extended summary, and full transcription — in an organized manner.

Convert to Document?: This toggle allows the user to choose whether to convert the file to a Google Docs format. It is set to “False,” meaning the file will be saved as a plain text (.txt) file rather than a Google Docs document.

Step 7: Create a Notion page

In this step, the output the Transcript + Summary are saved as a new structured page within Notion.

In the Configure section:

In Step 7, “Create Page” in Notion, the configuration fields are set up as follows:

Parent Page: This field designates the location in Notion where the new page will be created. Here, “Personal” is selected as the parent page, meaning the new content will be added under this existing section in Notion.

Title: This field specifies the title of the new Notion page. It’s dynamically set to “Summary of [Output from Step 5],” with content drawn from the Formatter step, providing a descriptive and relevant title for the page based on the processed file’s topic.

Content: This field is where the main content for the Notion page is structured. It includes the following sections:

You can use basic Markdown syntax (##, ### etc.) for formatting H1, H2 etc. See guide here. So, the content section could be structured like this:

Icon: This optional field allows an icon to be added to the Notion page, helping visually categorize or personalize it. In this case, no icon has been specified.

Now, make sure the zap runs and forget about it!

How it works

So, once set up, I can copy-paste a YouTube link into ytdlp, and the remaining steps are all automated — a new structured Notion page is generated after the conversion.

So, for the example video:

I get this in Notion:

That’s it

If you liked the automation, buy me a coffee to keep me going, thanks!

Related

The art of the Almost-Made-in-America label

The art of the Almost-Made-in-America label

Major brands across tech, fashion, and home goods are employing creative labeling strategies to downplay the "Made in China" origin of their products, instead emphasizing design and brand heritage from Western countries.