Video Summaries & Timestamps Guide


Learn how to split videos into parts, add timestamps, and create short summaries for each section

This step-by-step guide shows you how to break a video into segments, add timestamps, and generate a concise summary for each section, making it easier to learn from, organize, and quickly navigate long videos.

Tools & Tech Stack
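
As the dependency list in Step 1 shows, the project is built in Python around OpenCV and Pillow for frame handling, Hugging Face Transformers (the BLIP model) for image captioning, whisper-timestamped for speech-to-text with timestamps, the OpenAI API for generating the section summaries, and ffmpeg for audio decoding.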

Folder Structure
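
Based on the paths used throughout this guide, the project is laid out roughly as follows (the JSON files are generated on the first run):

video analyzer/
├── engine_openai.py
├── requirements.txt
├── src/
│   └── utils.py
├── input/
│   └── 2_Modular_Code_for_Training.mp4
├── filtered_transcription.json
└── chaptered_transcription.json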

Step 1: Install required libraries

This command installs all the dependencies listed in the requirements.txt file:

pip install -r requirements.txt

Code Explanation:

The requirements.txt file pins the following libraries for this project. You can also install them individually:

pip install opencv-python==4.10.0.84
pip install pillow==11.1.0
pip install transformers==4.48.0
pip install whisper-timestamped==1.15.8
pip install openai==1.59.7
pip install ffmpeg==1.4
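
Note: whisper-timestamped decodes audio via the ffmpeg command-line tool, so you also need an ffmpeg binary installed and available on your PATH; the pip packages above do not provide it.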

Step 2: Generate transcription if not available

from src.utils import main_transcription
import os

def main(input_video):
    # Main function to process video and transcription
    transcription_json = "filtered_transcription.json"  # produced by main_transcription (Step 4)
    final_output_path = "chaptered_transcription.json"  # final chaptered output (not used in this snippet)
    # Generate Transcription if Not Present
    if not os.path.exists(transcription_json):
        print(f"Transcription file '{transcription_json}' not found. Generating...")
        main_transcription(input_video, transcription_json)
        # The main_transcription function is located in src/utils.py and will be explained in Step 4
    else:
        print(f"Transcription file '{transcription_json}' already exists. Skipping transcription generation.")

if __name__ == "__main__":
    # Specify input video path
    input_video = "input/2_Modular_Code_for_Training.mp4"
    # Run the main process
    main(input_video)

Output:

(.venv) PS E:\video analyzer> python engine_openai.py
Transcription file 'filtered_transcription.json' not found. Generating...
100%|████████████████████████████████████████████████████████████████| 90506/90506 [01:59<00:00, 755.27frames/s]
Filtered transcription completed and saved to filtered_transcription.json
(.venv) PS E:\video analyzer>

Step 3: Import required libraries and functions

import json
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import whisper_timestamped as whisper
import ssl
from openai import OpenAI
import logging
import time
import os

Code Explanation:

json reads and writes the transcription files; cv2 (OpenCV) and PIL handle video frames and image conversion; BlipProcessor and BlipForConditionalGeneration load the BLIP image-captioning model from Hugging Face Transformers; whisper_timestamped produces the timestamped transcription; OpenAI is the client used to generate the section summaries; and ssl, logging, time, and os are standard-library modules for certificate handling, logging, timing, and file-system checks.

Step 4: main_transcription function

def main_transcription(input_video_path, output_json_path):
    # Generates filtered transcription from a video
    audio = load_audio(input_video_path)
    model = whisper.load_model("tiny", "cpu")
    transcription = transcribe_audio(model, audio)
    filtered_result = filter_transcription(transcription)
    save_transcription(output_json_path, filtered_result)
    print(f"Filtered transcription completed and saved to {output_json_path}")

Output:

Filtered transcription completed and saved to filtered_transcription.json

Code Explanation:

main_transcription drives the whole transcription step: it loads the audio track from the input video, loads the Whisper "tiny" model on the CPU, transcribes the audio with timestamps, filters the result down to the essential segment data, and saves the filtered transcription as JSON at the requested path.
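
The four helper functions it calls live in src/utils.py and are not shown in this guide. Here is a minimal sketch of what they might look like, using the whisper_timestamped API; the filtering and saving logic (which segment fields are kept, the JSON layout) is an assumption, and the real implementations may differ:

import json
import whisper_timestamped as whisper

def load_audio(input_video_path):
    # whisper's loader uses ffmpeg to decode the video's audio track
    # into a 16 kHz waveform
    return whisper.load_audio(input_video_path)

def transcribe_audio(model, audio):
    # whisper_timestamped adds word-level timestamps to the transcription
    return whisper.transcribe(model, audio)

def filter_transcription(transcription):
    # Assumed filtering: keep only start, end, and text for each segment
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in transcription["segments"]
    ]

def save_transcription(output_json_path, filtered_result):
    # Write the filtered segments to disk as JSON
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(filtered_result, f, indent=2, ensure_ascii=False)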
