AI-Powered Tutor: Detailed Answers in Text & Audio.
This AI Tutor Chatbot acts as your personal learning assistant, instantly generating detailed answers to your questions. It then uses lifelike text-to-speech technology to read those explanations aloud. This multi-sensory approach caters to different learning styles and enhances understanding.
Open the Integrated Terminal from the top menu: View > Terminal
Create a virtual environment by running this command:
python -m venv venv
Activate the virtual environment by running this command:
.\venv\Scripts\activate
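Note: the activation command above is for Windows. If you are working on macOS or Linux, activate the environment with:
source venv/bin/activate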
Step 2: Install Required Libraries
Run the following command to install the required dependencies:
pip install flask
pip install gtts
pip install google-generativeai
pip install IPython
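If you prefer, the same dependencies can be installed with a single command:
pip install flask gtts google-generativeai IPython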
Step 3: Core AI and Text-to-Speech Utilities (helper.py)
This file contains two core utility functions that provide essential services for an AI chatbot application:
1. gtts_tts()
This function is the "voice" of the application. It takes any text (ideally the AI's response) and converts it into an audio file (MP3) using Google's Text-to-Speech (gTTS) service. Its purpose is to create a spoken version of the text, enabling the audio playback feature in the application.
from gtts import gTTS

def gtts_tts(text, lang='en', output_file="gTTS_output.mp3"):
    if not text:
        raise ValueError("Text for gTTS cannot be empty.")
    # Convert the text to speech and save it as an MP3 file
    tts = gTTS(text=text, lang=lang, slow=False)
    tts.save(output_file)
    print(f"Audio saved as {output_file}")
    return output_file
Arguments:
text (str): The text to convert.
lang (str): The language code. Defaults to 'en' (English).
output_file (str): The filename for the audio output. Defaults to "gTTS_output.mp3".
Returns: The path to the generated audio file.
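As a quick usage sketch (the text and filename below are purely illustrative), the function can be called directly once gTTS is installed:

answer_text = "Photosynthesis converts light energy into chemical energy stored in glucose."
audio_path = gtts_tts(answer_text, output_file="photosynthesis.mp3")
print(audio_path)  # prints "photosynthesis.mp3", the file saved in the working directory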
2. ask_gemini()
This function is the "brain" of the operation. It takes a user's question, sends it to Google's Gemini AI model with a specific prompt requesting a concise answer, and returns the generated text response. Its purpose is to intelligently process queries and provide informative, brief answers.
def ask_gemini(user_question):
    import google.generativeai as genai
    # Configure the Gemini API
    genai.configure(api_key="Paste your key here")
    # Initialize the Gemini model
    model = genai.GenerativeModel("gemini-1.5-flash")
    if not user_question:
        print("No question entered. Please try again.")
        return None
    prompt = f"Give me a quick, clear answer (3-4 lines) to this question: {user_question}. Be informative yet brief."
    try:
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        print(f"Error generating response with Gemini: {e}")
        return None
Note: You must obtain your own Gemini API key from Google AI Studio. The key in the code ("Paste your key here") is a placeholder and will not work. You should replace it with your valid key and never expose it publicly.
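As a safer alternative to hard-coding the key, you can read it from an environment variable. This is a minimal sketch; the variable name GEMINI_API_KEY is just an example, not something the project mandates:

import os
import google.generativeai as genai

# Keep the key out of source control by reading it from the environment
genai.configure(api_key=os.environ["GEMINI_API_KEY"])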
Arguments:
user_question: The question posed by the user.
Process: The function first constructs a detailed prompt from the user's question and then passes it to the model.generate_content() method to retrieve the response from the Gemini model.
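Putting the two helpers together, a minimal sketch of the intended flow (assuming helper.py is importable and a valid API key is configured) looks like this:

from helper import ask_gemini, gtts_tts

answer = ask_gemini("What causes the tides on Earth?")
if answer:
    print(answer)                               # short text answer from Gemini
    gtts_tts(answer, output_file="answer.mp3")  # spoken version of the same answer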
Step 4: Frontend User Interface (index.html)
The index.html file serves as the front-end user interface for this web-based "Study Assistant Chatbot." It provides an interactive chat environment where users can ask questions and receive AI-generated answers in both text and audio form.
Key Components and Their Purpose:
1. Structure & Styling (HTML & CSS):
It creates a familiar chat application layout with a header, a message history area, and an input box at the bottom.
The CSS is designed to be responsive, meaning it will look good on both desktop and mobile devices (as seen in the @media query).
Visual distinctions are made between user messages (aligned right, greenish background) and bot messages (aligned left, light grey background).
2. Core Functionality (JavaScript):
Question Handling: It captures user questions from the input field, sends them to the backend (via a POST request to /ask), and displays the server's response in the chat; the request/response exchange is sketched after this list.
Audio Features: This is a standout feature. The code can receive an audio file URL from the server, create a custom audio player with play, pause, and replay buttons, and display a progress bar for the spoken answer.
Question Limit: It enforces a limit (15 questions) to manage usage, disabling the input field once the limit is reached.
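The request/response exchange behind this JavaScript can also be exercised directly from Python. The sketch below is illustrative only: it assumes the Flask server is running locally on port 5000 and that the JSON keys are named "question", "answer", and "audio_url", which may differ from the actual implementation.

import requests  # third-party client library: pip install requests

# Send a question the same way the front-end does: a JSON POST to /ask
resp = requests.post(
    "http://127.0.0.1:5000/ask",
    json={"question": "Explain Newton's second law in one sentence."},
)
resp.raise_for_status()
data = resp.json()
print(data.get("answer"))     # text shown in the chat window
print(data.get("audio_url"))  # location of the generated MP3 used by the audio player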
Step 5: Server-Side Application Logic (app.py)
The app.py file is the backend server for the "Study Assistant Chatbot," built using the Flask framework. It acts as the brain of the operation, handling all the logic that the front-end interface (index.html) cannot; a condensed sketch of the file appears at the end of this section.
Key Components and Their Purpose:
API Endpoint (/ask):
It receives questions from the front-end, processes them using an AI model (via helper.ask_gemini()), and returns a structured JSON response containing the answer text and a generated audio file URL.
Text-to-Speech (TTS) Conversion: It uses the gTTS (Google Text-to-Speech) library to convert every AI-generated text response into an audio file (.mp3) for playback on the front-end.
User Session & Rate Limiting: It tracks users (by IP address) and enforces a question limit (15 questions per hour) to prevent abuse and manage server resources.
Resource Management: A key feature is its robust system for cleaning up generated audio files. It automatically deletes files on a schedule (every 5 minutes), when a user's session resets, when the front-end requests it, and when the server shuts down. This prevents the server from running out of disk space.
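To make these pieces concrete, here is a heavily condensed sketch of what such an app.py can look like. It is not the project's actual implementation: the /ask route and the helper functions come from the description above, while the JSON key names, the in-memory per-IP counter, and the audio file naming are simplifying assumptions, and the scheduled cleanup logic is omitted.

import os
import time
import uuid

from flask import Flask, jsonify, render_template, request

import helper  # provides ask_gemini() and gtts_tts()

app = Flask(__name__)

AUDIO_DIR = os.path.join("static", "audio")
os.makedirs(AUDIO_DIR, exist_ok=True)

# Naive per-IP usage tracking: {ip: (window_start_timestamp, question_count)}
QUESTION_LIMIT = 15
WINDOW_SECONDS = 3600
usage = {}

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/ask", methods=["POST"])
def ask():
    # Rate limiting: allow at most QUESTION_LIMIT questions per IP per hour
    ip = request.remote_addr
    window_start, count = usage.get(ip, (time.time(), 0))
    if time.time() - window_start > WINDOW_SECONDS:
        window_start, count = time.time(), 0  # start a fresh hourly window
    if count >= QUESTION_LIMIT:
        return jsonify({"error": "Question limit reached. Please try again later."}), 429
    usage[ip] = (window_start, count + 1)

    # Ask Gemini for a short answer to the submitted question
    question = (request.get_json(silent=True) or {}).get("question", "")
    answer = helper.ask_gemini(question)
    if not answer:
        return jsonify({"error": "Could not generate an answer."}), 500

    # Convert the answer to speech; a unique filename keeps users' files separate
    audio_name = f"{uuid.uuid4().hex}.mp3"
    helper.gtts_tts(answer, output_file=os.path.join(AUDIO_DIR, audio_name))

    return jsonify({"answer": answer, "audio_url": f"/static/audio/{audio_name}"})

if __name__ == "__main__":
    app.run(debug=True)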