Automated Quiz Generation from Videos

Automated Generation of Interactive Quizzes from Video Lectures Using OpenAI’s Whisper and GPT Models

This guide walks you through converting video lectures into interactive quizzes with OpenAI's Whisper and GPT models, with instructions for each step.

Tools & Tech Stack

Python 3.8+
Models: OpenAI Whisper and GPT-4o
Google Colab (optional for deployment)
Google Drive (optional for deployment)
Libraries: langchain, langchain-openai, openai, flask, streamlit, ffmpeg, moviepy, pydub

Folder Structure

cs_agent/
- main.py
- videoplayback.mp4
- requirements.txt

Step 1: Install Required Libraries

This command installs all the dependencies listed in the requirements.txt file:

pip install -r requirements.txt

Code Explanation:

pip: Python's package installer.
install: the pip subcommand that installs packages.
-r flag: tells pip to read the package list from a requirements file instead of the command line.
requirements.txt: a text file listing the packages to install, optionally with pinned versions.
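After installation, a quick check can confirm which packages the current environment can actually see. The sketch below uses only the standard library's importlib.metadata. The package list mirrors the Libraries line above (FFmpeg is installed as a system package in Step 3, so it is left out), and the helper name installed_versions is hypothetical.

```python
from importlib import metadata

# Package names taken from the tech stack listed above.
REQUIRED = ["langchain", "langchain-openai", "openai", "moviepy", "pydub"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be added to requirements.txt or installed directly with pip.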

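Step 2: Install the MoviePy Library for Video Editing in Python

To edit videos in Python, install the MoviePy library by running the following command. The code in this guide imports from moviepy.editor, which is available in MoviePy 1.x; MoviePy 2.x removed that module, so if the import in Step 4 fails, install a 1.x release instead (for example, pip install "moviepy<2").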

pip install moviepy

Code Explanation:

pip: Python's package installer.
install: the pip subcommand that installs packages.
moviepy: the name of the package to install.

Step 3: Install and Configure FFmpeg

To use MoviePy for video and audio editing in Python, you must install FFmpeg - a powerful multimedia framework that handles encoding, decoding, and processing of media files.

!apt install -y ffmpeg

Code Explanation:

!: tells Jupyter/IPython to run the command in the system shell (terminal) instead of Python.
apt: the Debian/Ubuntu package manager, used to install, update, and remove software.
install: the apt subcommand that installs packages.
-y: automatically answers "yes" to all prompts (non-interactive mode).
ffmpeg: the package name for FFmpeg, the multimedia framework for video/audio processing.
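Before moving on, it can help to confirm that FFmpeg is reachable from Python. The following is a minimal sketch using only the standard library; the helper name ffmpeg_version is hypothetical.

```python
import shutil
import subprocess

def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not found."""
    path = shutil.which("ffmpeg")  # looks for the binary on PATH
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

print(ffmpeg_version() or "FFmpeg not found; install it before running MoviePy")
```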

Step 4: Extract Audio from a Video File

This Python code extracts the audio track from a given video file and saves it as a separate audio file.

from moviepy.editor import VideoFileClip
import os

def convert_video_to_audio(video_path, output_ext="mp3"):
    # Get the filename without extension
    filename, _ = os.path.splitext(video_path)
    filename = os.path.basename(filename)

    # Load the video file
    video = VideoFileClip(video_path)

    # Extract audio
    audio = video.audio

    # Save audio file
    output_path = f"/content/drive/MyDrive/QuizFromVideo/{filename}.{output_ext}"
    audio.write_audiofile(output_path)

    # Close the video and audio objects
    audio.close()
    video.close()
    print(f"Successfully converted {video_path} to {output_path}")

# Example usage
convert_video_to_audio("/content/drive/MyDrive/QuizFromVideo/Vector-Database.mp4", "mp3")

Output:

MoviePy - Writing audio in /content/drive/MyDrive/QuizFromVideo/Vector-Database.mp3
MoviePy - Done.
Successfully converted /content/drive/MyDrive/QuizFromVideo/Vector-Database.mp4 to /content/drive/MyDrive/QuizFromVideo/Vector-Database.mp3
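The function above writes to a fixed Google Drive folder. If the audio should instead land next to the video (or in a folder of your choice), the output path can be derived from the input path. This is a small sketch with pathlib; the helper name audio_output_path and its output_dir parameter are hypothetical and not part of the original code.

```python
from pathlib import Path

def audio_output_path(video_path, output_dir=None, output_ext="mp3"):
    """Build the audio file path for a video, defaulting to the video's own folder."""
    video = Path(video_path)
    target_dir = Path(output_dir) if output_dir is not None else video.parent
    # video.stem is the file name without its extension
    return target_dir / f"{video.stem}.{output_ext}"

print(audio_output_path("/content/drive/MyDrive/QuizFromVideo/Vector-Database.mp4"))
```

The returned path could be passed to audio.write_audiofile in place of the hard-coded f-string.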

Step 5: Import Python Libraries and Set the API Key Environment Variable

This code snippet imports several Python libraries and sets up an environment variable for OpenAI's API key.

import os
import json
import pandas as pd
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from openai import OpenAI
from moviepy.editor import VideoFileClip
from pydub import AudioSegment

os.environ['OPENAI_API_KEY'] = 'your-openai-api-key'  # replace with your own key

Code Explanation:

import os - Used for interacting with the operating system (e.g., file paths, environment variables).
import pandas as pd - Pandas (pd) is used for data manipulation (e.g., working with CSV/Excel files).
import json - Used for parsing and generating JSON data.
from langchain_openai import ChatOpenAI - Imports the ChatOpenAI class from LangChain, which allows interaction with OpenAI's chat models (e.g., GPT-3.5, GPT-4).
from langchain_core.prompts import PromptTemplate - Used to create reusable prompt templates for LangChain (e.g., structuring queries for LLMs).
from openai import OpenAI - Official OpenAI Python client for direct API calls (e.g., completions, embeddings).
from moviepy.editor import VideoFileClip - Used for video editing (e.g., extracting audio from videos).
from pydub import AudioSegment - pydub is used for audio manipulation (e.g., converting formats, trimming clips).
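Writing the key directly into a notebook makes it easy to leak when the notebook is shared. One alternative is to set OPENAI_API_KEY outside the code and fail early with a clear message when it is missing. The helper below is a sketch under that assumption; the name require_api_key is hypothetical.

```python
import os

def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; define it before creating the client.")
    return key

# Typical use, once the variable has been set in the shell or notebook session:
#   api_key = require_api_key()
```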

Step 6: Open an audio file in binary read mode

This line opens the audio file in binary read mode ("rb") so that its raw bytes can be uploaded to the transcription API in the next step.

audio_file = open("/content/drive/MyDrive/QuizFromVideo/Vector-Database.mp3", "rb")
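Long lectures can produce large audio files, and the transcription endpoint rejects uploads above a size limit. The sketch below checks the size before opening the file. The 25 MB figure is an assumption based on the limit OpenAI has documented for audio uploads, so confirm it against the current API documentation; the helper name open_for_upload is hypothetical.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed upload limit; verify in the API docs

def open_for_upload(path, limit=MAX_UPLOAD_BYTES):
    """Open `path` in binary read mode after checking that it fits the size limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, above the {limit}-byte limit")
    return open(path, "rb")

# The returned file object works as a context manager, so it is closed automatically:
#   with open_for_upload("/content/drive/MyDrive/QuizFromVideo/Vector-Database.mp3") as audio_file:
#       ...
```

Files above the limit would need to be compressed or split into shorter segments first, which is one possible use for the pydub import in Step 5.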

Step 7: Transcribe an Audio File into Text Using OpenAI Whisper

In this step, we use OpenAI's Whisper-1 model to convert an audio file into text. This speech recognition tool works with many languages and handles background noise well. It automatically detects the language and gives you clear text you can use for subtitles, notes, or other applications.

client = OpenAI()
transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
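Steps 6 and 7 can also be wrapped in one function that opens the file, sends it, and returns only the text. The sketch below takes the client as a parameter so that it can be exercised without network access; the function name transcribe_audio is hypothetical, while the client.audio.transcriptions.create call is the same one used above.

```python
def transcribe_audio(client, audio_path, model="whisper-1"):
    """Send the audio file to the transcription endpoint and return the transcript text."""
    with open(audio_path, "rb") as audio_file:  # closed as soon as the upload finishes
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text

# Usage with the real client (needs OPENAI_API_KEY and network access):
#   from openai import OpenAI
#   text = transcribe_audio(OpenAI(), "/content/drive/MyDrive/QuizFromVideo/Vector-Database.mp3")
```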

Step 8: Verify transcription output

Display the transcribed text to validate the speech recognition results.

print(transcript.text)

Output:

What is a vector database? Well they say a picture is worth a thousand words, so let's start with one. Now in case you can't tell, this is a picture of a sunset on a mountain vista. Beautiful. Now let's say this is a digital image and we want to store it. We want to put it in to a database and we're going to use a traditional database here called a relational database. Now what can we store in that relational database of this picture? Well we can put the actual picture binary data into our database to start with, so this is the actual image file, but we can also store some other information as well, like some basic metadata about the picture, so that would be things like the file format and the date that it was created, stuff like that, and we can also add some manually added tags to this as well, so we could say let's have tags for sunset and landscape and orange, and that sort of gives us a basic way to be able to retrieve this image, but it it kind of largely misses the image's overall semantic context, like how would you query for images with similar color palettes for example using this information, or images with landscapes of mountains in the background for example. Those concepts aren't really represented very well in these structured fields, and that disconnect between how computers store data and how humans understand it has a name. It's called the semantic gap. Now traditional database queries like select star where color equals orange, it kind of falls short because it doesn't really capture the nuanced multi-dimensional nature of unstructured data. Well that's where vector databases come in by representing data as mathematical vector embeddings, and what vector embeddings are is essentially an array of numbers. 
Now these vectors, they capture the semantic essence of the data where similar items are positioned close together in vector space, and dissimilar items are positioned far apart, and with vector databases we can perform similarity searches as mathematical operations, looking for vector embeddings that are close to each other, and that kind of translates to finding semantically similar content. Now we can represent all sorts of unstructured data in a vector database. What could we put in here? Well image files of course, like our mountain sunset. We could put in a text file as well, or we could even store audio files as well in here. All this is unstructured data, and these complex objects, they are actually transformed into vector embeddings, and those vector embeddings are then stored in the vector database. So what do these vector embeddings look like? Well I said they're arrays of numbers, and they're arrays of numbers where each position represents some kind of learned feature. So let's take a simplified example. So remember our mountain picture here? Yep, we can represent that as a vector embedding. Now let's say that the vector embedding for the mountain has a first dimension of say 0.91, then let's say the next one is 0.15, and then there's a third dimension of 0.83, and kind of so forth. What does all that mean? Well the 0.91 in the first dimension, that indicates significant elevation changes, because hey this is the mountains. Then 0.15, the second dimension here, that shows few urban elements. Don't see many buildings here, so that's why that score is quite low. 0.83 in the third dimension, that represents strong warm colors like a sunset, and so on. All sorts of other dimensions can be added as well. Now we can compare that to a different picture. What about this one, which is a sunset at the beach? So let's have a look at the vector embeddings for the beach example. So this would also have a series of dimensions. 
Let's say the first one is 0.12, then we have a 0.08, and then finally we have a 0.89, and then more dimensions to follow. Now notice how there are some similarities here. The third dimension, 0.83 and 0.89, pretty similar. That's because they both have warm colors. They're both pictures of sunsets. But the first dimension, that differs quite a lot here, because a beach has minimal elevation changes compared to the mountains. Now this is a very simplified example. In real machine learning systems, vector embeddings typically contain hundreds or even thousands of dimensions. And I should also say that individual dimensions like this, they rarely correspond to such clearly interpretable features, but you get the idea. And this all brings up the question of how are these vector embeddings actually created? Well the answer is through embedding models that have been trained on massive data sets. So each type of data has its own specialized type of embedding model that we can use. So I'm going to give you some examples of those. For example, clip. You might use clip for images. If you're working with text, you might use GloVe. And if you're working with audio, you might use WAV2VEC. These processes are all kind of pretty similar. Basically you have data that passes through multiple layers. And as it goes through the layers of the embedding model, each layer is extracting progressively more abstract features. So for images, the early layers might detect some pretty basic stuff, like let's say edges. And then as we get to deeper layers, we would recognize more complex stuff, like maybe entire objects. Or perhaps for text, these early layers would figure out the words that we're looking at, individual words. But then later, deeper layers would be able to figure out context and meaning. And how this essentially works is we take the high dimensional vectors from this deeper layer here. 
And those high dimensional vectors often have hundreds or maybe even thousands of dimensions that capture the essential characteristics of the input. Now we have vector embeddings created, we can perform all sorts of powerful operations that just weren't possible with those traditional relational databases. Things like similarity search, where we can find items that are similar to a query item by finding the closest vectors in the space. But when you have millions of vectors in your database, and those vectors are made up of hundreds or maybe even thousands of dimensions, you can't effectively and efficiently compare your query vector to every single vector in the database. It would just be too slow. So there is a process to do that, and it's called vector indexing. Now this is where vector indexing uses something called approximate nearest neighbor or ANN algorithms. And instead of finding the exact closest match, these algorithms quickly find vectors that are very likely to be among the closest matches. Now there are a bunch of approaches for this. For example, HNSW, that is hierarchical navigable small world, that creates multi-layered graphs connecting similar vectors. And there's also IVF, that's inverted file index, which divides the vector space into clusters and only searches the most relevant of those clusters. These indexing methods, they basically are trading a small amount of accuracy for pretty big improvements in search speed. Now vector databases are a core feature of something called RAG, retrieval augmented generation, where vector databases store chunks of documents and articles and knowledge bases as embeddings. And then when a user asks a question, the system finds the relevant text chunks by comparing vector similarity and feeds those to a large language model to generate responses using the retrieved information. So that's vector databases. They are both a place to store unstructured data and a place to retrieve it quickly and semantically.

Step 9: Create a ChatOpenAI Instance

Create a ChatOpenAI instance backed by OpenAI's GPT-4o model. Setting temperature to 0 keeps the output as deterministic as possible.

llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

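Step 10: Define the Summarization Instructions

Define the instructions that tell the model how to summarize the transcript. The {transcript} placeholder is filled in when the chain runs.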

summarization_template = """
You are a helpful data science assistant. Summarize the following transcript and extract key concepts.
Strictly discuss the concept that is being discussed in the transcript and not the transcript itself.

Transcript: {transcript}
"""

Step 11: Creating a Reusable Prompt Template for AI Summarization

Create a structured prompt template using LangChain's PromptTemplate class:

summarization_prompt = PromptTemplate(
    template=summarization_template,
    input_variables=["transcript"]
)
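To see what the model will actually receive, the template can be filled in by hand. The sketch below uses plain str.format as a stand-in for the template's own formatting, on the assumption that PromptTemplate is left at its default f-string style; the sample transcript sentence is taken from the output in Step 8.

```python
summarization_template = """
You are a helpful data science assistant. Summarize the following transcript and extract key concepts.
Strictly discuss the concept that is being discussed in the transcript and not the transcript itself.

Transcript: {transcript}
"""

# str.format replaces the {transcript} placeholder with the supplied text.
rendered = summarization_template.format(transcript="What is a vector database?")
print(rendered)
```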

Step 12: Transcript Summarization Using Prompt Templates and LLMs

Pipe the prompt into the model, run the chain on the transcript text, and print the result.

chain = summarization_prompt | llm
output = chain.invoke({"transcript": transcript.text})
print(output)

Code Explanation:

Pipeline creation: Combines a predefined prompt template (summarization_prompt) with a language model (llm) using LangChain's pipe operator (|) to form an executable chain.
Execution: The chain fills the template with the transcript (passed via the "transcript" key) and sends the resulting prompt to the LLM.
Output: Prints the full response object returned by the model. The concept-focused summary is in its content field, followed by metadata such as token usage.
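The pipe operator described above can look like magic at first. The toy class below illustrates the general idea of chaining steps with |; it is a simplified illustration, not LangChain's actual implementation, and the names Step, fill_prompt, and fake_llm are hypothetical.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

fill_prompt = Step(lambda inputs: f"Summarize: {inputs['transcript']}")
fake_llm = Step(lambda prompt: prompt.upper())  # stand-in for a real model call

chain = fill_prompt | fake_llm
print(chain.invoke({"transcript": "vector databases"}))
# prints: SUMMARIZE: VECTOR DATABASES
```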

Output:

content='The transcript discusses the concept of vector databases, which are designed to handle unstructured data by representing it as mathematical vector embeddings. Key concepts include:\n\n1. **Semantic Gap**: Traditional databases struggle to capture the nuanced, multi-dimensional nature of unstructured data, leading to a disconnect between how computers store data and how humans understand it.\n\n2. **Vector Embeddings**: These are arrays of numbers that capture the semantic essence of data. Similar items are positioned close together in vector space, while dissimilar items are far apart. This allows for similarity searches based on semantic content.\n\n3. **Unstructured Data**: Vector databases can store various types of unstructured data, such as images, text, and audio, by transforming them into vector embeddings.\n\n4. **Embedding Models**: These models, trained on large datasets, create vector embeddings. Different models are used for different data types, such as CLIP for images, GloVe for text, and WAV2VEC for audio.\n\n5. **Similarity Search**: Vector databases enable powerful operations like similarity search, where items similar to a query are found by identifying the closest vectors in the space.\n\n6. **Vector Indexing**: To efficiently search through millions of vectors, vector indexing uses approximate nearest neighbor (ANN) algorithms, such as HNSW and IVF, which trade a small amount of accuracy for significant improvements in search speed.\n\n7. **Retrieval Augmented Generation (RAG)**: Vector databases are integral to RAG, where they store document chunks as embeddings. When a user queries, the system retrieves relevant text chunks based on vector similarity and uses them to generate responses with a large language model.\n\nOverall, vector databases provide a way to store and retrieve unstructured data quickly and semantically, overcoming limitations of traditional databases.' 
additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 363, 'prompt_tokens': 1565, 'total_tokens': 1928, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_ff25b2783a', 'id': 'chatcmpl-C1w2DZEjh5xN6KYtn9R73jrlw9SJk', 'finish_reason': 'stop', 'logprobs': None} id='run--9f0d9194-3ffd-41a0-aa8f-fdb1819b0727-0' usage_metadata={'input_tokens': 1565, 'output_tokens': 363, 'total_tokens': 1928, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}

Step 13: Structuring LLM Quiz Outputs with Pydantic and JSON Parsing

from langchain_core.output_parsers import JsonOutputParser
# Note: newer langchain-core releases deprecate langchain_core.pydantic_v1;
# if this import fails or warns, import BaseModel and Field from pydantic instead.
from langchain_core.pydantic_v1 import BaseModel, Field

# Define your desired schema for quiz questions
class QuizQuestion(BaseModel):
    question: str = Field(description="The quiz question.")
    options: dict = Field(description="Answer options as a dictionary.")
    correct_answer: str = Field(description="The correct answer (e.g., 'a', 'b').")
    explanation: str = Field(description="Explanation for why the answer is correct.")

# Set up the parser with the schema
parser = JsonOutputParser(pydantic_object=QuizQuestion)

Code Explanation:

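Schema Definition:
Creates a Pydantic BaseModel that describes the format of a quiz question.
Defines 4 required fields with descriptions:
1. question: The quiz question text
2. options: Multiple-choice answers as a dictionary (e.g., {'a':'Paris','b':'London'})
3. correct_answer: Key of the right option (e.g., 'a')
4. explanation: Reasoning behind the correct answer
JSON Parser Setup:
Initializes a JsonOutputParser configured with the QuizQuestion schema.
The parser will
1. Use the schema to generate format instructions for the prompt
2. Parse the model's JSON response into Python dictionaries and lists
3. Raise an error if the response is not valid JSON
Note that JsonOutputParser does not validate the parsed data against the schema. If strict validation is needed, use a validating parser such as LangChain's PydanticOutputParser or check the parsed result separately.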
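Well-formed JSON is not the same as a well-formed quiz question, so a lightweight check on each parsed item can catch problems such as a missing field or a correct_answer that is not among the options. The sketch below works on plain dictionaries with the four fields listed above; the names REQUIRED_FIELDS and question_problems are hypothetical.

```python
REQUIRED_FIELDS = ("question", "options", "correct_answer", "explanation")

def question_problems(item):
    """Return a list of problems found in one parsed quiz question (empty if none)."""
    if not isinstance(item, dict):
        return ["item is not a dictionary"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in item]
    options = item.get("options")
    if "options" in item and not isinstance(options, dict):
        problems.append("options is not a dictionary")
    elif isinstance(options, dict) and "correct_answer" in item:
        answer = item["correct_answer"]
        # The answer must be a string key that exists in the options dictionary.
        if not isinstance(answer, str) or answer not in options:
            problems.append("correct_answer is not one of the option keys")
    return problems
```

A question for which question_problems returns an empty list has all four fields and an answer key that matches one of its options.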

Step 14: Generating Format Instructions for Structured LLM Outputs

Returns a text prompt explaining exactly how the LLM should format its output. Step 15 injects this text into the quiz prompt.

parser.get_format_instructions()

Step 15: Automated Quiz Generation Pipeline with Structured LLM Outputs

This code creates an automated quiz generation pipeline using LangChain.

prompt = PromptTemplate(
    template="You are a helpful data science and engineering expert tasked with creating tough quizzes on the following topic:\n{summary}\n, create a set of 10 quiz questions in the following format:\n{format_instructions}\n",
    input_variables=["summary"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Chain the prompt, model, and parser together
chain = prompt | llm | parser
questions = chain.invoke({"summary": output.content})
print(questions)

Code Explanation:

This code creates an AI quiz generator that:
1. Takes a topic summary and generates 10 tough quiz questions,
2. Ensures proper formatting using provided rules,
3. Outputs ready-to-use questions with answers and explanations.

Output:

[{'question': 'What is the primary challenge that vector databases address compared to traditional databases?', 'options': {'a': 'Handling large volumes of structured data', 'b': 'Capturing the nuanced, multi-dimensional nature of unstructured data', 'c': 'Improving transaction processing speed', 'd': 'Reducing storage costs'}, 'correct_answer': 'b', 'explanation': 'Vector databases are designed to handle unstructured data by representing it as vector embeddings, which capture the semantic essence of the data, addressing the semantic gap that traditional databases struggle with.'}, {'question': 'What are vector embeddings?', 'options': {'a': 'Arrays of numbers that capture the semantic essence of data', 'b': 'A type of database index', 'c': 'A method for compressing data', 'd': 'A way to encrypt data'}, 'correct_answer': 'a', 'explanation': 'Vector embeddings are arrays of numbers that represent the semantic content of data, allowing similar items to be positioned close together in vector space.'}, {'question': 'Which of the following is NOT an example of an embedding model?', 'options': {'a': 'CLIP', 'b': 'GloVe', 'c': 'WAV2VEC', 'd': 'SQL'}, 'correct_answer': 'd', 'explanation': 'SQL is a language for managing and querying structured data in databases, not an embedding model. 
CLIP, GloVe, and WAV2VEC are examples of embedding models for images, text, and audio, respectively.'}, {'question': 'How do vector databases perform similarity searches?', 'options': {'a': 'By using SQL queries', 'b': 'By comparing the size of data files', 'c': 'By locating the closest vectors in space', 'd': 'By checking data redundancy'}, 'correct_answer': 'c', 'explanation': 'Vector databases perform similarity searches by finding items similar to a query through locating the closest vectors in the vector space.'}, {'question': 'What is the role of vector indexing in vector databases?', 'options': {'a': 'To compress data for storage', 'b': 'To encrypt data for security', 'c': 'To efficiently search large databases using ANN algorithms', 'd': 'To convert structured data into unstructured data'}, 'correct_answer': 'c', 'explanation': 'Vector indexing uses approximate nearest neighbor (ANN) algorithms to efficiently search large databases and quickly find vectors likely to be among the closest matches.'}, {'question': 'What is Retrieval Augmented Generation (RAG) in the context of vector databases?', 'options': {'a': 'A method for compressing vector data', 'b': 'A technique for generating random data', 'c': 'A process where document chunks are stored as embeddings and retrieved based on vector similarity', 'd': 'A way to encrypt vector data'}, 'correct_answer': 'c', 'explanation': 'RAG involves storing document chunks as embeddings in vector databases. 
When a user queries, the system retrieves relevant text chunks based on vector similarity and uses them to generate responses with a large language model.'}, {'question': 'Which of the following types of data can be stored in vector databases?', 'options': {'a': 'Only text data', 'b': 'Only image data', 'c': 'Only audio data', 'd': 'Various types of unstructured data, such as images, text, and audio'}, 'correct_answer': 'd', 'explanation': 'Vector databases can store various types of unstructured data, including images, text, and audio, by transforming them into vector embeddings.'}, {'question': 'What is the semantic gap in the context of databases?', 'options': {'a': 'The difference between data storage costs and retrieval costs', 'b': 'The disconnect between how computers store data and how humans understand it', 'c': 'The gap between structured and unstructured data', 'd': 'The difference in data processing speeds'}, 'correct_answer': 'b', 'explanation': 'The semantic gap refers to the disconnect between how computers store data and how humans understand it, particularly in capturing the nuanced, multi-dimensional nature of unstructured data.'}, {'question': 'Which algorithm is commonly used for vector indexing in vector databases?', 'options': {'a': 'Bubble Sort', 'b': 'HNSW', 'c': "Dijkstra's Algorithm", 'd': 'Quick Sort'}, 'correct_answer': 'b', 'explanation': 'HNSW (Hierarchical Navigable Small World) is a commonly used algorithm for vector indexing in vector databases, enabling efficient approximate nearest neighbor searches.'}, {'question': 'Why are vector databases considered powerful for handling unstructured data?', 'options': {'a': 'They use traditional indexing methods', 'b': 'They store data in a compressed format', 'c': 'They represent data as vector embeddings, capturing semantic content', 'd': 'They are cheaper to maintain than traditional databases'}, 'correct_answer': 'c', 'explanation': 'Vector databases are powerful for handling unstructured 
data because they represent data as vector embeddings, which capture the semantic content and allow for efficient similarity searches.'}]
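The parsed output is a list of dictionaries, which is enough to drive a simple interactive quiz. The sketch below renders a question as text and scores a list of answers. The two sample questions are copied from the output above with the options of the first one trimmed; the function names format_question and grade are hypothetical.

```python
def format_question(number, item):
    """Render one question and its options as plain text."""
    lines = [f"{number}. {item['question']}"]
    lines += [f"   {key}) {text}" for key, text in sorted(item["options"].items())]
    return "\n".join(lines)

def grade(questions, answers):
    """Count correct answers; `answers` holds one option key per question."""
    return sum(
        1
        for item, given in zip(questions, answers)
        if given.strip().lower() == item["correct_answer"].strip().lower()
    )

sample = [
    {"question": "What are vector embeddings?",
     "options": {"a": "Arrays of numbers that capture the semantic essence of data",
                 "b": "A type of database index"},
     "correct_answer": "a"},
    {"question": "Which algorithm is commonly used for vector indexing in vector databases?",
     "options": {"a": "Bubble Sort", "b": "HNSW", "c": "Dijkstra's Algorithm", "d": "Quick Sort"},
     "correct_answer": "b"},
]

for number, item in enumerate(sample, start=1):
    print(format_question(number, item))
print("Score:", grade(sample, ["a", "c"]), "out of", len(sample))
```

In a notebook, the answers list could be collected with input(); in a Streamlit or Flask app (both listed in the tech stack), the same two functions could sit behind a form.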
