Hello DannyHamilton!
I’m sharing the instructions on how to do it.
You can skip using Whisper if you prefer to upload the audio files manually to Fireflies.
I can try to set this up for you on EC2, but I’ll admit - I haven’t worked much with EC2. If the free tier allows it, I can give it a shot.
I focused on YouTube - this script won’t work with X, but if you’d like, I can try to write something that would work with X.
It won’t work with X because yt-dlp doesn’t currently support livestreams from X, and the platform doesn’t provide direct access to live video streams in a downloadable format.
::How to Transcribe::
1. Requirements
-> Linux/macOS or Windows with WSL installed
-> Terminal access
-> Internet connection
2. Install required tools
On Linux (Ubuntu/Debian) or WSL terminal:
sudo apt update
sudo apt install -y ffmpeg python3-pip
pip3 install yt-dlp
pip3 install openai-whisper
On macOS (with Homebrew):
brew install ffmpeg
pip3 install yt-dlp openai-whisper
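Before going further, it can help to confirm the tools actually landed on your PATH. Below is a small sketch of a helper for that; `need` is just an illustrative name I made up, not part of any tool above. The demo call uses `sh` (always present) so the snippet runs anywhere:

```shell
# Helper: fail with a message if a required command is not on PATH.
# ("need" is an illustrative name, not a standard utility.)
need() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

# Typical usage before running the recording script:
#   need ffmpeg && need yt-dlp && need whisper
# Demonstrated here with a command that is always present:
need sh && echo "sh found"
```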
3. Create the recording & transcription script
Create a file named (for example) record_and_transcribe.sh with this content:
#!/bin/bash
# Check if URL argument is given
if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi
URL=$1
DURATION=${2:-1800} # default 1800 seconds (30 minutes)
OUTPUT="live_audio_$(date +%Y%m%d_%H%M%S).wav"
TRANSCRIPT="transcript_$(date +%Y%m%d_%H%M%S).txt"
# Of course, the names of the OUTPUT and TRANSCRIPT files can be changed - we also can modify the script to accept them as arguments.
echo "Recording audio from: $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT"
echo "Recording finished: $OUTPUT"
echo "Starting transcription with Whisper..."
whisper "$OUTPUT" --model tiny --output_format txt --output_dir .
# Whisper's tiny model is fast and light.
# For better accuracy use larger models (base, small, medium, or large) but they require more resources.
# Rename transcript to consistent filename
mv "${OUTPUT%.*}.txt" "$TRANSCRIPT"
echo "Transcription complete."
echo "Audio file: $OUTPUT"
echo "Transcript file: $TRANSCRIPT"
Make it executable:
chmod +x record_and_transcribe.sh
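A quick note on the `mv "${OUTPUT%.*}.txt"` line in the script: `${OUTPUT%.*}` is standard shell parameter expansion that strips the shortest trailing `.suffix`, which is how the script predicts the filename Whisper will write. A minimal illustration (with a made-up filename):

```shell
# "${VAR%.*}" removes the shortest match of ".*" from the end of VAR,
# i.e. the file extension. Whisper writes "<basename>.txt", so this is
# how the script knows which file to rename.
OUTPUT="live_audio_20240101_120000.wav"   # hypothetical example name
echo "${OUTPUT%.*}.txt"                   # prints live_audio_20240101_120000.txt
```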
4. Run the script
For example, to record 10 minutes:
./record_and_transcribe.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600
5. Result
-> A .wav audio file with recorded live stream audio
-> A .txt file with the transcript of the audio
6. Optional notes
Adjust the duration_in_seconds parameter (the second argument you provide when running the script) to change the recording length.
Whisper’s "tiny" model is fast and light; for better accuracy use larger models (base, small, medium, or large) but they require more resources.
For Windows users, use WSL Ubuntu or Git Bash with Linux tools installed.
::How to Summarize the Transcription::
If this works for you, feel free to leave a tip.
1. Open your terminal.
2. Clone the repository:
git clone https://github.com/ggerganov/llama.cpp
3. Change directory:
cd llama.cpp
4. Build the program:
make
5. Download the Mistral-7B-Instruct model in .gguf format and place it inside llama.cpp/models/mistral/.
For example, you can download it from here: https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
This model and version are lightweight and fast, making them perfect for local use without heavy hardware. It is instruction-tuned, so it handles commands like “Summarize this text” very well. The Q4_K_M quantization reduces the model size and speeds up inference with minimal quality loss. The GGUF format is optimized for llama.cpp, making it easy and efficient to run locally.
It strikes a good balance between speed, quality, and usability on typical personal computers.
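The link above is the browser (blob) page for the file. If you prefer to fetch it from the terminal, the usual Hugging Face convention (an assumption on my part, not something specific to this model) is to swap /blob/ for /resolve/ to get a direct-download URL:

```shell
# Build the direct-download URL from the repo and file name.
# (Assumption: the standard huggingface.co "resolve/main" download path.)
REPO="itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF"
FILE="mistral-7b-instruct-v0.1-q4_k_m.gguf"
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}"
echo "$URL"

# Then download into the expected folder, e.g.:
#   mkdir -p models/mistral
#   wget -O models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf "$URL"
```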
6. Copy your transcript file (for example, the one generated earlier) into the llama.cpp folder and name it (for example) transcript.txt.
7. Run the summary command and save the output to a file:
./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat transcript.txt)" > summary.txt
(Of course, the output file name can be different.)
8. Open summary.txt to read the summary.
9. Notes & Tips
The script assumes you run it inside the llama.cpp folder where ./main and models/mistral/ reside.
Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.
If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.
On Windows, use WSL for Linux compatibility.
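The transcript splitting mentioned above can be sketched with coreutils split. This is only a rough byte-based split (it can cut mid-word; a word-aware splitter would be nicer), and the summarize loop is commented out because it needs the built ./main binary and the model:

```shell
# Create a sample "long transcript" (5000 bytes) for demonstration.
yes "transcript line" | head -c 5000 > transcript.txt

# Split into 2000-byte chunks named chunk_aa, chunk_ab, chunk_ac.
split -b 2000 transcript.txt chunk_

ls chunk_*

# Then summarize each chunk with the same command as step 7, e.g.:
#   for f in chunk_*; do
#     ./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf \
#            -p "Summarize this text: $(cat "$f")" >> summary.txt
#   done
```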
::Complete Guide: Transcribe & Summarize YouTube Live Stream in One Script::
1. Prerequisites
Operating System: Linux/macOS or Windows with WSL
Terminal with internet access
Installed tools:
ffmpeg
yt-dlp
python3-pip
Python packages: openai-whisper
llama.cpp repository built with make
Mistral-7B-Instruct model downloaded in .gguf format and placed in llama.cpp/models/mistral/
2. Install Required Tools
Linux (Ubuntu/Debian) or WSL:
sudo apt update
sudo apt install -y ffmpeg python3-pip make git build-essential
pip3 install yt-dlp openai-whisper
macOS (with Homebrew):
brew install ffmpeg
pip3 install yt-dlp openai-whisper
3. Download and Prepare llama.cpp and Mistral Model
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
Download the Mistral-7B-Instruct model in .gguf format and place it inside:
llama.cpp/models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf
You can download it from:
https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
4. Create the Combined Script
Create a bash script file called record_transcribe_summarize.sh with the following content:
#!/bin/bash
if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi
URL=$1
DURATION=${2:-1800} # default 30 minutes
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"
echo "Recording audio from $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT_AUDIO"
echo "Audio recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."
whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .
mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"
echo "Transcription complete: $TRANSCRIPT"
echo "Starting summary generation with llama.cpp and Mistral-7B..."
./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"
echo "Summary saved to: $SUMMARY"
Make it executable:
chmod +x record_transcribe_summarize.sh
5. Run the Script
Run the script providing the YouTube Live URL and optionally duration in seconds:
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600
This will record 10 minutes of audio from the live stream, transcribe it, then generate a summary.
Output files:
Audio: live_audio_TIMESTAMP.wav
Transcript: transcript_TIMESTAMP.txt
Summary: summary_TIMESTAMP.txt
6. Notes & Tips
The script assumes you run it inside the llama.cpp folder where ./main and models/mistral/ reside.
Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.
If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.
On Windows, use WSL for Linux compatibility.
::How to Record Without Specifying Duration - Auto Stop When Live Ends::
To record until the YouTube live stream ends automatically without specifying duration, modify the recording command in your script to remove the duration limit.
Replace this line in the script:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t $DURATION -c:a pcm_s16le "$OUTPUT_AUDIO"
with this:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"
What this does:
yt-dlp streams the audio continuously until the live stream ends
ffmpeg records all audio until yt-dlp stops
Recording automatically finishes when the live stream ends (no manual duration needed)
Here is the full updated script:
#!/bin/bash
if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL>"
  exit 1
fi
URL=$1
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"
echo "Recording audio from $URL until live ends..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"
echo "Recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."
whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .
mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"
echo "Transcription complete: $TRANSCRIPT"
echo "Starting summary generation with llama.cpp and Mistral-7B..."
./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"
echo "Summary saved to: $SUMMARY"
Make sure it’s executable:
chmod +x record_transcribe_summarize.sh
Run it like this (no duration argument needed):
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK
::Cons of not specifying recording duration (auto-stop)::
-> Large files may quickly use up disk space.
-> Recording may stop early if the stream disconnects or buffers.
-> No control over how long you record.
-> Very long files take more time and resources to transcribe and summarize.
-> Whisper and llama.cpp may struggle with very large inputs.
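One way to soften these downsides is to keep auto-stop but add a hard upper bound with GNU coreutils timeout on the yt-dlp side of the pipe (on macOS it is gtimeout from Homebrew's coreutils): when timeout kills the producer, or the stream ends first, ffmpeg sees end-of-input and closes the file normally. The real pipeline is shown as a comment; the runnable demo below uses a stand-in endless producer so the idea is visible without a live stream:

```shell
# Real pipeline with a safety cap (e.g. 4 hours) - a sketch, not tested live:
#   timeout 14400 yt-dlp -f bestaudio -o - "$URL" \
#     | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"

# Demo: an endless producer capped at 1 second; output stops at the cap.
# (timeout exits non-zero when it fires, so we ignore its status here.)
timeout 1 sh -c 'while :; do echo "audio chunk"; sleep 0.2; done' > capped.txt || true
echo "lines captured: $(wc -l < capped.txt)"
```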

If this works for you, a tip would make me do a happy dance:
bc1q955fz4agkyt9fy53gznlx99w30xyvl46e9ynnd
If you want me to set this up on EC2, I can give it a try.
I can customize or extend the script to handle long transcription chunking or automate model downloads if needed.