This step-by-step guide shows you how to break down videos into segments, add timestamps, and generate concise summaries for each section. It simplifies learning, organization, and quick access to key video content.
This command installs all of the dependencies listed in the requirements.txt file:
pip install -r requirements.txt
Code Explanation:
pip is Python's package installer (Package Installer for Python).
install is the pip command that installs packages.
-r tells pip to install from a requirements file rather than a single package.
requirements.txt is a text file containing the list of packages to install, with their versions.
The requirements.txt file contains the following libraries and dependencies for this project; you can also install them individually. The complete file is reproduced after the list below.
pip install opencv-python==4.10.0.84
OpenCV-Python 4.10.0.84
is a computer vision library for video processing, image analysis, and object detection. It enables tasks like frame extraction, facial recognition, and real-time video analysis.
pip install pillow==11.1.0
Pillow 11.1.0
Python Imaging Library (Fork) for image processing (open, manipulate, save various formats).
pip install transformers==4.48.0
transformers 4.48.0
Hugging Face's NLP library providing state-of-the-art models (BERT, GPT, Whisper) for text, audio, and vision tasks.
pip install whisper-timestamped==1.15.8
whisper-timestamped 1.15.8
Python library for speech recognition with word-level timestamps using OpenAI's Whisper model. Enables precise audio transcription and alignment.
pip install openai==1.59.7
OpenAI 1.59.7
Python client library for accessing OpenAI's AI models (GPT, DALL·E, Whisper). Enables text generation, embeddings, and AI-powered content analysis.
pip install ffmpeg==1.4
FFmpeg 1.4
Multimedia framework for video/audio processing, conversion, and streaming; supports all major formats. Note that the FFmpeg command-line tool must also be installed on your system, since Whisper relies on it to extract audio from the video.
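Based on the versions listed above, the requirements.txt file for this project would look like this:
opencv-python==4.10.0.84
pillow==11.1.0
transformers==4.48.0
whisper-timestamped==1.15.8
openai==1.59.7
ffmpeg==1.4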
from src.utils import VideoFileClip, main_transcription
import os

def main(input_video):
    # Main function to process the video and its transcription
    transcription_json = "filtered_transcription.json"
    final_output_path = "chaptered_transcription.json"  # path for the final chaptered output

    # Generate the transcription if it is not already present
    if not os.path.exists(transcription_json):
        print(f"Transcription file '{transcription_json}' not found. Generating...")
        transcript = main_transcription(input_video, transcription_json)
        # The main_transcription function is located in src/utils.py and will be explained in Step 4
    else:
        print(f"Transcription file '{transcription_json}' already exists. Skipping transcription generation.")

if __name__ == "__main__":
    # Specify the input video path
    input_video = "input/2_Modular_Code_for_Training.mp4"
    # Run the main process
    main(input_video)
Output:
(.venv) PS E:\video analyzer> python engine_openai.py
Transcription file 'filtered_transcription.json' not found. Generating...
100%|████████████████████████████████████████████████████████████████| 90506/90506 [01:59<00:00, 755.27frames/s]
Filtered transcription completed and saved to filtered_transcription.json
(.venv) PS E:\video analyzer>
import json
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import whisper_timestamped as whisper
import ssl
from openai import OpenAI
import logging
import time
import os
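Before getting to the transcription function, note what the vision-related imports are for: cv2 grabs frames from the video, PIL converts them to images, and the BLIP classes from transformers caption them. The snippet below is only an illustrative sketch of how these pieces typically fit together, not the project's actual code; the model name and timestamp are assumptions made for the example.
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Illustrative sketch: caption a single video frame with BLIP (not the project's actual code)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

cap = cv2.VideoCapture("input/2_Modular_Code_for_Training.mp4")  # the tutorial's input video
cap.set(cv2.CAP_PROP_POS_MSEC, 10_000)  # seek to the 10-second mark (arbitrary example timestamp)
ok, frame = cap.read()
cap.release()

if ok:
    # OpenCV returns frames in BGR order; convert to RGB before handing to PIL
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(out[0], skip_special_tokens=True))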
Code Explanation:
def main_transcription(input_video_path, output_json_path):
    # Generates a filtered transcription from a video
    audio = load_audio(input_video_path)                    # extract the audio track from the video
    model = whisper.load_model("tiny", "cpu")               # load the Whisper "tiny" model on the CPU
    transcription = transcribe_audio(model, audio)          # run speech recognition with timestamps
    filtered_result = filter_transcription(transcription)   # keep only the fields needed downstream
    save_transcription(output_json_path, filtered_result)   # write the result to JSON
    print(f"Filtered transcription completed and saved to {output_json_path}")
Output:
Filtered transcription completed and saved to filtered_transcription.json
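The helper functions called above (load_audio, transcribe_audio, filter_transcription, save_transcription) live in src/utils.py and are covered in Step 4. The sketch below is only a hypothetical illustration of their general shape, assuming they are thin wrappers around whisper_timestamped and Python's json module; the actual implementations may differ.
# Illustrative sketch only: the real helpers live in src/utils.py (see Step 4)
import json
import whisper_timestamped as whisper

def load_audio(input_video_path):
    # whisper_timestamped can decode the audio track directly from a video file
    # (it uses ffmpeg under the hood)
    return whisper.load_audio(input_video_path)

def transcribe_audio(model, audio):
    # Returns a dict with the full "text" and a list of timestamped "segments"
    return whisper.transcribe(model, audio)

def filter_transcription(transcription):
    # Keep only the fields needed for chaptering: start, end, and text of each segment
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in transcription["segments"]
    ]

def save_transcription(output_json_path, filtered_result):
    # Write the filtered segments to disk as JSON
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(filtered_result, f, indent=2)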
Code Explanation: