Token Cost Calculator for YouTube Transcripts: Estimate Cost Before You Summarize

February 18, 2026 · Tokens

YouTube transcripts are one of the richest sources of free, high-quality text for AI workflows. Researchers summarise lectures, creators repurpose interviews, and developers build RAG pipelines from video content. But before you send a transcript to an OpenAI model, you should know how much it will cost. This guide walks you through estimating the token count and API cost for any YouTube video, step by step.

Step-by-Step Cost Estimation

Estimating the cost of processing a YouTube transcript with OpenAI takes three quick steps:

  1. Download the subtitles. Go to SubtitlesYT, paste the YouTube URL, choose TXT format (to avoid unnecessary timestamp tokens), and download the file.
  2. Count the tokens. Open the SubtitlesYT Token Counter, paste the transcript text, and select the encoding that matches your model (o200k_base for GPT-4o and newer, cl100k_base for GPT-4 and GPT-3.5-turbo). The tool displays the exact token count instantly.
  3. Calculate the cost. Multiply the transcript's token count by your model's per-token input price, then add the expected response length multiplied by the per-token output price. Use the tables below for quick reference.
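If you would rather script steps 2 and 3 than use the web tool, here is a minimal sketch. It assumes the third-party tiktoken package is installed (pip install tiktoken). The function names are illustrative and are not part of SubtitlesYT.

```python
def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Step 2: count tokens locally with the third-party tiktoken package."""
    import tiktoken  # imported lazily so cost_usd works without it installed

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() makes special-token strings count as ordinary text
    return len(encoding.encode(text, disallowed_special=()))


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Step 3: token count multiplied by the per-token price."""
    return tokens * price_per_million / 1_000_000
```

Read your downloaded TXT file into a string, pass it to count_tokens, and feed the result to cost_usd together with the per-million price of your model. For example, cost_usd(9000, 2.50) corresponds to one hour of transcript at the GPT-4o input price used later in this guide.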

Video Length to Token Count

How many tokens does a YouTube transcript contain? It depends on the speaker's pace and vocabulary, but here are practical estimates based on an average speaking rate of about 150 words per minute and roughly one token per word:

| Video Length | Approx. Words | Approx. Tokens (o200k_base) |
|---|---|---|
| 5 minutes | 750 | ~800 |
| 10 minutes | 1,500 | ~1,500 |
| 30 minutes | 4,500 | ~4,500 |
| 1 hour | 9,000 | ~9,000 |
| 2 hours | 18,000 | ~18,000 |

These estimates assume TXT format (no timestamps). If you download SRT or VTT format, add roughly 30-40% more tokens for the timing metadata.
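The same estimate can be scripted for any video length. This is a rough sketch: the one-token-per-word ratio is an assumption read off the table above, and the 35% timestamp overhead is simply the midpoint of the 30-40% range, so expect real transcripts to differ.

```python
WORDS_PER_MINUTE = 150      # average speaking rate used in this guide
TOKENS_PER_WORD = 1.0       # assumption: ratio implied by the table above
TIMESTAMP_OVERHEAD = 0.35   # assumption: midpoint of the 30-40% range


def estimate_tokens(minutes: float, with_timestamps: bool = False) -> int:
    """Estimate transcript tokens from video length in minutes."""
    tokens = minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD
    if with_timestamps:
        # SRT and VTT files carry timing metadata on top of the spoken text
        tokens *= 1 + TIMESTAMP_OVERHEAD
    return round(tokens)


for minutes in (5, 10, 30, 60, 120):
    txt = estimate_tokens(minutes)
    srt = estimate_tokens(minutes, with_timestamps=True)
    print(f"{minutes:>3} min  TXT ~{txt:,}  SRT/VTT ~{srt:,}")
```

Treat the output as a planning figure only. Count the actual transcript before you commit to a large batch.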

Cost by Model

Here is what it costs to process a one-hour YouTube transcript (~9,000 input tokens) with popular OpenAI models. The output cost assumes a 500-token summary response:

| Model | Input Cost (9K tokens) | Output Cost (500 tokens) | Total |
|---|---|---|---|
| GPT-4o | $0.0225 | $0.0050 | $0.0275 |
| GPT-4 Turbo | $0.0900 | $0.0150 | $0.1050 |
| GPT-3.5-turbo | $0.0045 | $0.0008 | $0.0053 |
| o1 | $0.1350 | $0.0300 | $0.1650 |
| o3-mini | $0.0099 | $0.0022 | $0.0121 |

Key takeaway: processing a one-hour transcript costs less than three cents with GPT-4o and about half a cent with GPT-3.5-turbo. Even a two-hour video stays well under a dollar with any model in the table: about $0.30 with o1, the most expensive of the five.
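To rerun the comparison for a different video length, you could use a sketch like the one below. The per-million prices are back-calculated from the table above, not pulled from a live price list, so check them against current OpenAI pricing before relying on the output.

```python
# (input USD per 1M tokens, output USD per 1M tokens), derived from the table above
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "GPT-3.5-turbo": (0.50, 1.50),
    "o1": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}


def total_cost(model: str, input_tokens: int, output_tokens: int = 500) -> float:
    """Input cost plus output cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model in PRICES:
    one_hour = total_cost(model, 9_000)
    two_hours = total_cost(model, 18_000)
    print(f"{model:<14} 1 h: ${one_hour:.4f}   2 h: ${two_hours:.4f}")
```

Keeping the prices in one dictionary means a price change is a one-line edit, which matters because API pricing is revised fairly often.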

Cost-Saving Tips

Here are practical ways to keep your API bill low when working with YouTube transcripts:

  • Use cheaper models for simple tasks. Summarising a transcript does not require the most powerful model. GPT-3.5-turbo or o3-mini can summarise, extract key points, and answer questions about a transcript at a fraction of the cost of GPT-4o or o1.
  • Download TXT format. Timestamps in SRT and VTT files add 30-40% more tokens without contributing useful information for most AI tasks. Always choose TXT when your goal is summarisation, translation, or analysis.
  • Remove filler words. Auto-generated subtitles often include "um", "uh", "you know", and false starts. A quick find-and-replace pass can trim 10-20% of the token count.
  • Chunk long transcripts. For videos over two hours, split the transcript into 30-minute or 60-minute segments and process each chunk separately. This avoids context-window limits and lets you use smaller, cheaper models. Then ask the model to combine partial summaries into a final overview.
  • Use prompt budgeting. Write your system prompt and instructions first, count those tokens, then calculate how many tokens remain for the transcript. This prevents context-window overflows and wasted API calls.
  • Cache results. If you repeatedly process the same video (e.g., during development), store the model's response locally so you do not pay for the same input twice.
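Three of the tips above (removing filler words, prompt budgeting, and chunking) are easy to script. The helpers below are a minimal sketch with hypothetical names. The filler list is deliberately short and will also strip a legitimate "you know", so review the cleaned text before you send it.

```python
import re

# Assumption: a small illustrative filler list; extend it for your own transcripts.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+|you know)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove common filler words and collapse leftover whitespace."""
    cleaned = FILLER_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def transcript_budget(context_window: int, prompt_tokens: int, reserved_output: int) -> int:
    """Tokens left for the transcript once the prompt and the reply are reserved."""
    return max(context_window - prompt_tokens - reserved_output, 0)


def chunk_words(text: str, words_per_chunk: int) -> list[str]:
    """Split a transcript into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
```

At 150 words per minute, a 30-minute segment is about 4,500 words, so chunk_words(transcript, 4500) approximates the 30-minute split suggested above. Chunking by words rather than by tokens is a simplification. Count each chunk's tokens afterwards if you are close to the context limit.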

Real-World Example

Let's walk through a concrete scenario. You want to summarise a 45-minute YouTube lecture on machine learning.

Step 1: Download the transcript

Go to SubtitlesYT, paste the video URL, select TXT format, and click Get Subtitles. The download gives you a clean text file with no timestamps and no formatting noise.

Step 2: Count the tokens

Open the Token Counter and paste the transcript. A 45-minute lecture at average speaking speed produces roughly 6,750 words, which tokenises to approximately 6,800 tokens with o200k_base.

Step 3: Calculate the cost

Using GPT-4o at $2.50 per million input tokens and $10.00 per million output tokens, with a 500-token summary:

Input cost:  6,800 tokens x ($2.50 / 1,000,000) = $0.017
Output cost: 500 tokens   x ($10.00 / 1,000,000) = $0.005
Total:       $0.022

That is just over two cents for a detailed summary of a 45-minute lecture. A follow-up question roughly doubles that, because the transcript is sent again as input along with the new question, but the total still stays under five cents.
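The follow-up figure is worth checking, because a second question usually means a second request that carries the transcript again. The sketch below assumes exactly that, plus a 50-token question and a 500-token answer. Both numbers are illustrative.

```python
INPUT_PRICE = 2.50    # USD per 1M input tokens (GPT-4o, as in the example above)
OUTPUT_PRICE = 10.00  # USD per 1M output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


TRANSCRIPT = 6_800
SUMMARY = 500
QUESTION = 50   # assumption: a short follow-up question
ANSWER = 500

first_call = call_cost(TRANSCRIPT, SUMMARY)
# The follow-up resends the transcript and the first summary with the new question.
follow_up = call_cost(TRANSCRIPT + SUMMARY + QUESTION, ANSWER)
total = first_call + follow_up

print(f"first: ${first_call:.4f}  follow-up: ${follow_up:.4f}  total: ${total:.4f}")
```

Under these assumptions the two requests together come to roughly four and a half cents. Most of that is the transcript being billed as input twice.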

What if I use a cheaper model?

With GPT-3.5-turbo the same task costs:

Input cost:  6,800 tokens x ($0.50 / 1,000,000) = $0.0034
Output cost: 500 tokens   x ($1.50 / 1,000,000) = $0.00075
Total:       $0.0042

Less than half a cent. For a straightforward summary, GPT-3.5-turbo produces perfectly usable results at about a fifth of the cost of GPT-4o.

Ready to estimate your own costs? Start by downloading your YouTube subtitles, then count tokens with the Token Counter. For a deeper dive into how tokens work, read our OpenAI Token Counter Guide.
