Build an AI Picture Book Generator with Stable Diffusion Models
AIM of the Project
The aim of this project is to build an AI-powered children's picture book generation system that can:
Create captivating and moral-driven children's stories that align with various Themes, Settings, Characters and Narrative Tones using advanced natural language generation models.
Generate visually consistent and high-quality illustrations for each part of the story using Stable Diffusion models.
By leveraging advanced AI models like Google Gemini and image-generation models such as Stable Diffusion, this system automates story generation and illustration design to produce interactive and imaginative picture books. The goal is to empower publishers, educators, and parents to create high-quality, customizable content for children that is both immersive and impactful.
What is AI-Powered Picture Book Generation?
AI-powered picture book generation uses artificial intelligence to create engaging, interactive children's stories along with visual prompts for illustrations. By integrating AI with text-to-image models like Stable Diffusion, this technology can generate captivating narratives and corresponding images, making it easier to produce personalized picture books that children can enjoy.
In this project, the Gemini model is utilized to craft unique stories with themes like friendship, adventure, kindness, and bravery. These stories are divided into parts with each having a visual prompt for image generation, transforming the narrative into a visual experience. The generated images are based on the protagonist's description and key moments in the story, giving each part a vibrant, illustrated look.
Setup for IPYNB
For the best experience, stay connected to the internet while executing this project.
Click the "File" menu in the new notebook and choose "Upload notebook."
Select the IPYNB file you want to upload.
Once the file is uploaded, click on the "Runtime" menu and choose "Run all" to execute all the cells in the notebook.
At the time of writing, the default Python version in Colab is 3.10.
Important Libraries
torch: The core library of PyTorch, a deep learning framework used for high-performance tensor computation and model execution. In this project, it enables efficient computation for the Stable Diffusion pipelines. Refer to the documentation for more information.
diffusers: Provides tools for implementing and fine-tuning diffusion models like Stable Diffusion. It is used here for generating visually captivating images for the story based on scene prompts and protagonist descriptions. Refer to the documentation for more information.
matplotlib: A versatile library for data visualization in Python. In this project, it is used to display generated images during the execution of the code. Refer to the documentation for more information.
PIL (Python Imaging Library): Used for image processing tasks like saving and displaying generated images. This ensures the generated illustrations are preserved in high-quality formats for use in picture books. Refer to the documentation for more information.
google.generativeai: Used to interact with Google's Generative AI models for creative storytelling. It helps generate detailed stories, protagonist descriptions, and scene prompts with imaginative and child-friendly content. Refer to the documentation for more information.
Obtain API keys for any external services (e.g., Google Generative AI).
Implement Gemini Functionalities: Run the generate_story() function, which uses Gemini to generate a story from a randomly selected theme and setting and to produce a prompt for each scene.
Implement the Stable Diffusion Functionalities: Run the generate_images_from_gemini_output() function, which generates images of the protagonist and the scenes from the prompts that Gemini produced in the previous step.
Execute the entire pipeline: Run the final main block, which combines generate_story() and generate_images_from_gemini_output() and displays the final story, the scene prompts, and the generated picture book.
IMPLEMENTATION
Integrating Google Gemini for Story Generation
The Google Gemini model is a state-of-the-art generative AI tool designed to handle tasks like natural language generation and storytelling with remarkable creativity and coherence. In this project, Gemini is used to generate engaging narratives for picture books, including character descriptions, plot outlines, and scene prompts. By leveraging its advanced capabilities, we can ensure high-quality, imaginative content tailored for children's stories.
Next, click on the Create an API Key Icon and give it a name and copy the generated API key safely for future use.
Gemini API Setup
import google.generativeai as genai
import random

# Configure the Gemini API
api_key = "Your Gemini API Key"
genai.configure(api_key=api_key)

# Initialize the Gemini model
model = genai.GenerativeModel("gemini-2.5-pro")
Code Breakdown
API Key: The api_key used in this function should be a valid API key associated with your Google Cloud project. Ensure that you have set up the key for using Google Gemini.
Model Version: The code above uses "gemini-2.5-pro", a specific version of the Gemini model. You can swap in any other model version that your API key has access to and compare the results.
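Hard-coding the key is fine for a quick test, but a shared notebook can leak it. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name GEMINI_API_KEY and the helper load_api_key are illustrative choices, not part of the google.generativeai library.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name and this helper are hypothetical; adapt them
    to however your environment stores secrets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

With this helper in place, the configuration line would read genai.configure(api_key=load_api_key()).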
Generate Your Picture Book Story and Visual Prompts
Introduction
Welcome to the "Generate Story" function!
This block of code brings creativity to life by crafting a full-fledged children's story, a detailed protagonist description, and visually engaging prompts for each scene. By leveraging Google Gemini, this function ensures that the generated story and visuals are rich in imagination and perfectly tailored for picture books.
This function is designed to:
Randomly select a theme and setting for the story.
Generate a story with a clear moral or lesson, divided into 5 logical parts.
Provide a protagonist description to maintain character consistency across scenes.
Create scene prompts for each story part, compatible with Stable Diffusion for generating high-quality illustrations.
How It Works
Random Theme and Setting: A random selection is made from pre-defined lists of themes (e.g., friendship, bravery) and settings (e.g., forest, ocean).
Google Gemini Prompting: A detailed prompt is sent to the Gemini API to generate the story content.
Story Structure:
A beginning, middle, and end with a clear narrative.
A protagonist with distinct physical and personality traits.
Logical divisions into 5 parts, each tied to a specific scene.
Scene Prompts: Highly descriptive visual prompts for use with Stable Diffusion to create illustrations.
Key Parameters
Theme: The central idea or moral of the story, chosen randomly. Examples: kindness, adventure, honesty.
Setting: The backdrop for the story, chosen randomly. Examples: castle, jungle, village.
Protagonist Description: A standalone description of the main character, including physical appearance and personality traits.
Story Structure: The story is divided into 5 parts, each highlighting a key moment.
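Because the model's output is not guaranteed to follow the requested structure, it can help to check the returned dictionary before passing it to image generation. The sketch below assumes the dictionary keys described above; validate_story_output is a hypothetical helper, not part of the project code.

```python
def validate_story_output(output: dict, expected_parts: int = 5) -> list[str]:
    """Return a list of problems found in a generate_story()-style dictionary.

    An empty list means the dictionary has the expected shape.
    """
    problems = []
    if "error" in output:
        problems.append(f"generation failed: {output['error']}")
        return problems

    # Every text field must be a non-empty string.
    for key in ("theme", "setting", "protagonist_description", "story_text"):
        value = output.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")

    # There must be exactly one prompt per story part.
    prompts = output.get("scene_prompts")
    if not isinstance(prompts, list) or len(prompts) != expected_parts:
        problems.append(f"expected {expected_parts} scene prompts")
    elif not all(isinstance(p, str) and p.strip() for p in prompts):
        problems.append("scene prompts must be non-empty strings")
    return problems
```

An empty result means it is safe to continue; otherwise the simplest recovery is usually to call generate_story() again.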
Generate Story Function
import json  # Import json for parsing


def generate_story():
    """
    Generates a children's story, protagonist description, and scene prompts using Gemini.

    Returns:
        dict: A dictionary containing the story, protagonist description, and scene prompts.
    """
    # Randomly select a theme and setting
    theme = random.choice(["friendship", "adventure", "kindness", "bravery", "overcoming fears", "honesty"])
    setting = random.choice(["forest", "castle", "ocean", "mountain", "desert", "jungle", "village", "busy city"])

    gemini_prompt = f"""You are a creative storyteller tasked with generating simple, engaging children's stories that can be illustrated as picture books using Stable Diffusion for image generation. Each story must resemble timeless classics like "The Tortoise and the Hare," with a strong narrative and visually clear prompts. Follow these guidelines carefully:

1. **Story Generation:**
   - Create a captivating children's story based on the theme "{theme}" and set in "{setting}".
   - Use simple, kid-friendly language and imaginative yet relatable scenarios.
   - Ensure the story has:
     - One main protagonist with consistent traits throughout.
     - A clear challenge, conflict, or adventure the protagonist embarks on and resolves.
     - A moral or lesson that children can learn from.
   - Keep the story concise and divide it into 5 logical parts.

2. **Protagonist Description:**
   - Write a detailed, standalone description of the protagonist that remains consistent across all scenes:
     - Physical traits: size, age, clothing, and distinctive features.
     - Personality traits: brave, kind, curious, or determined.
     - Any unique items or accessories they carry that appear in every scene.

3. **Story Division into 5 Parts:**
   - Divide the story into exactly 5 sequential moments:
     - Each part should represent a distinct event in the narrative, progressing logically toward the resolution.
     - Clearly establish the setting, protagonist's actions, and emotions in each part.

4. **Scene Prompt Generation for Stable Diffusion:**
   - For each of the 5 parts, write a **highly descriptive visual prompt** that Stable Diffusion can use to generate images.
   - Ensure prompts are:
     - Clear, specific, and unambiguous, avoiding abstract or overly complex descriptions.
     - Focused on describing the protagonist's appearance, actions, surroundings, and emotions.
     - Consistent with the protagonist's traits and accessories described earlier.
   - Example of a well-crafted prompt:
     - "A small tortoise with a green shell and a bright red scarf stands proudly on a dirt path in a sunny forest, with trees and flowers around, while a confident hare smirks nearby."

5. **Output the following as a JSON object.** The JSON object should have the following keys:
   - 'theme': The chosen theme.
   - 'setting': The chosen setting.
   - 'protagonist_description': A single string containing the detailed protagonist description.
   - 'story_text': A single string containing the complete story divided into parts, with each part clearly separated (e.g., using newlines).
   - 'scene_prompts': A list of 5 strings, where each string is a clear visual prompt for a scene.

Example JSON output:
```json
{{
  "theme": "friendship",
  "setting": "forest",
  "protagonist_description": "A curious squirrel named Sammy, with a bushy tail and a small red scarf, who is adventurous and helpful.",
  "story_text": "Part 1: Once upon a time...\\nPart 2: Sammy met a shy rabbit...\\nPart 3: They went on an adventure...\\nPart 4: They faced a challenge...\\nPart 5: They learned about friendship...",
  "scene_prompts": [
    "A small squirrel with a red scarf explores a sunny forest, surrounded by towering trees and colorful flowers.",
    "The squirrel meets a shy rabbit near a sparkling stream, with sunlight reflecting off the water.",
    "Sammy and the rabbit discover a hidden cave, illuminated by glowing crystals.",
    "They work together to cross a treacherous river, building a bridge from fallen branches.",
    "Sammy and the rabbit share a celebratory nut, their bond of friendship stronger than ever, under a starlit sky."
  ]
}}
```
"""

    # Generate the response using the Gemini model
    response = model.generate_content(gemini_prompt)

    if not response or not response.text.strip():
        return {"error": "Gemini returned an empty or invalid response.", "theme": theme, "setting": setting}

    # Initialize gemini_output with theme and setting
    gemini_output = {"theme": theme, "setting": setting}

    try:
        # Extract pure JSON string from the response text
        json_start = response.text.find('```json')
        json_end = response.text.rfind('```')
        # Require a closing fence after the opening one; otherwise parse the raw text
        if json_start != -1 and json_end > json_start:
            json_string = response.text[json_start + 7:json_end].strip()
        else:
            json_string = response.text.strip()

        # Attempt to parse the extracted JSON string
        parsed_output = json.loads(json_string)
        gemini_output["protagonist_description"] = parsed_output.get("protagonist_description")
        gemini_output["story_text"] = parsed_output.get("story_text")
        gemini_output["scene_prompts"] = parsed_output.get("scene_prompts", [])
    except json.JSONDecodeError as e:
        return {"error": f"Failed to parse Gemini response as JSON: {e} - Response: {response.text}", "theme": theme, "setting": setting}

    return gemini_output
Before Running the Code....
Dependencies: Ensure that google.generativeai is installed and properly configured. The random and json modules ship with Python and need no installation.
API Key: Provide the necessary API key to authenticate with Google Gemini.
Example Output
{
  "theme": "friendship",
  "setting": "forest",
  "protagonist_description": "A curious squirrel named Sammy, with a bushy tail and a small red scarf, who is adventurous and helpful.",
  "story_text": "Once upon a time...",
  "scene_prompts": [
    "A small squirrel with a red scarf explores a sunny forest, surrounded by towering trees and colorful flowers.",
    "The squirrel meets a shy rabbit near a sparkling stream, with sunlight reflecting off the water.",
    ...
  ]
}
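A response shaped like the example above often arrives wrapped in a json code fence, sometimes with extra text around it. As an alternative to the string slicing in generate_story(), a regular expression can pull out the fenced payload. This is a minimal sketch; extract_json_block is a hypothetical helper, and it assumes the response contains at most one fenced block.

```python
import json
import re

# Matches text between two triple-backtick fences, with an optional "json" tag.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_block(text: str) -> dict:
    """Parse the first fenced JSON block in text, or the whole text if unfenced.

    Raises json.JSONDecodeError when the payload is not valid JSON.
    """
    match = _FENCE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())
```

The caller still needs to catch json.JSONDecodeError, exactly as generate_story() does.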
Creating Stunning Illustrations from Story Prompts with Stable Diffusion
Introduction
This Python function, generate_images_from_gemini_output, uses a pre-trained Stable Diffusion model to generate a sequence of images from the protagonist description and scene prompts in gemini_output.
Purpose of the code
The function generates:
Protagonist Image: Based on the description of a protagonist.
Scene Images: Based on the scene prompts, starting with the protagonist's image and progressively transforming it.
It uses Stable Diffusion, a popular deep learning model for text-to-image and image-to-image generation.
Code Components
Function: generate_images_from_gemini_output
This function is responsible for generating images based on prompts.
Parameters
gemini_output (dict): A dictionary containing:
protagonist_description: Text describing the main character.
scene_prompts: List of textual descriptions for each scene.
model_name (str, optional): Name of the pre-trained model. Defaults to "sd-legacy/stable-diffusion-v1-5".
device (str, optional): Device to run the model, e.g., "cuda" for GPUs or "cpu" for CPUs. Defaults to "cuda".
strength (float, optional): Controls the degree of transformation for image-to-image generation. Ranges from 0.0 (no transformation) to 1.0 (maximum transformation). Defaults to 0.8.
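To get a feel for what strength does, it helps to see how it maps to denoising work. The sketch below assumes the rule used by recent diffusers image-to-image pipelines, where roughly int(num_inference_steps * strength) denoising steps are run on the noised input image. The helper name effective_img2img_steps is hypothetical, and the exact behavior should be checked against the installed diffusers version.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an img2img call actually runs.

    Assumption: the pipeline skips the first part of the schedule and runs
    about int(num_inference_steps * strength) steps, capped at the total.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With a 50-step schedule, the default strength of 0.8 runs 40 steps.
print(effective_img2img_steps(50, 0.8))  # 40
```

Under this assumption, a lower strength keeps more of the protagonist image and costs fewer steps, while a value near 1.0 behaves almost like plain text-to-image generation.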
Returns
A list of generated images (list of PIL.Image.Image objects).
Helper Function:
display_and_save_image: Displays the generated image using matplotlib and saves it to a file.
Workflow
Import Necessary Libraries
diffusers: Provides Stable Diffusion pipelines for text-to-image and image-to-image generation.
torch: For managing model weights and running computations on the specified device.
gc: For explicit garbage collection to optimize memory usage.
Initialize Pipelines
Text-to-Image Pipeline (StableDiffusionPipeline): Converts text descriptions into images.
Image-to-Image Pipeline (StableDiffusionImg2ImgPipeline): Modifies images based on new text prompts.
Both pipelines are loaded with the specified model_name and cast to float16 for efficient memory usage.
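Half precision is mainly a GPU optimization; on a CPU, float16 operations may be unsupported or slow. A minimal sketch of choosing the device and precision together follows. The helper choose_runtime is hypothetical and returns the dtype as a name, so the sketch runs without torch installed.

```python
def choose_runtime(cuda_available: bool) -> tuple[str, str]:
    """Pick a device string and the name of a matching torch dtype.

    float16 roughly halves GPU memory use; float32 is the safer
    choice when the model has to run on a CPU.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


print(choose_runtime(True))   # ('cuda', 'float16')
print(choose_runtime(False))  # ('cpu', 'float32')
```

In the notebook this could be wired up as device, dtype_name = choose_runtime(torch.cuda.is_available()), then passing torch_dtype=getattr(torch, dtype_name) to from_pretrained.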
Input Validation
Checks if protagonist_description and scene_prompts are present in gemini_output. Raises errors if missing.
Generate Images
Protagonist Image:
The protagonist description is fed into the text-to-image pipeline.
The result is stored and displayed.
Scene Images:
Iterates through scene_prompts, applying the image-to-image pipeline to the protagonist's image using the specified strength.
Each result is stored, displayed, and saved.
Memory Management
Explicitly clears GPU memory using torch.cuda.empty_cache() and Python garbage collection (gc.collect()).
Save and Display Images
Uses the helper function display_and_save_image to visualize and store images in .png format.
Customization Options
Hyperparameters
strength: Adjust transformation intensity for image-to-image generation.
guidance_scale: Tune the balance between following the text prompt strictly or being creative (default values: 8 for protagonist image, 9 for scene images).
Pipeline and Device
model_name: Replace with other pre-trained models for different styles or quality.
device: Use "cpu" if a GPU is unavailable, though it will be slower.
Prompts
Modify the protagonist_description or scene_prompts in gemini_output for different narratives or themes.
File Saving
Adjust file paths and formats in display_and_save_image.
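One way to adjust the file paths is to build them in a single place and write every image into a dedicated folder. The sketch below keeps the file names used in this project (protagonist_image and scene_image_N); the helper build_image_paths and the idea of an output directory are illustrative additions.

```python
from pathlib import Path


def build_image_paths(output_dir: str, num_scenes: int, ext: str = "png") -> list[Path]:
    """Return the protagonist path followed by one path per scene.

    Creates output_dir if it does not exist yet.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"protagonist_image.{ext}"]
    paths += [out / f"scene_image_{i}.{ext}" for i in range(1, num_scenes + 1)]
    return paths
```

Each returned path can be passed as the filename argument of display_and_save_image, since PIL's Image.save accepts path objects as well as strings.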
Key Considerations
Memory Management: Stable Diffusion models require significant memory, especially on GPUs. Clearing memory explicitly between generations (torch.cuda.empty_cache() together with gc.collect()) helps avoid out-of-memory errors.
Input Validation: Ensures required fields are provided, avoiding runtime errors.
Dependencies:
diffusers: Install using pip install diffusers.
torch: Install using pip install torch.
Illustrations Generation using stable-diffusion-v1-5
def generate_images_from_gemini_output(gemini_output, model_name="sd-legacy/stable-diffusion-v1-5", device="cuda", strength=0.8):
    """
    Generates a sequence of images based on the output from the Gemini function.

    Args:
        gemini_output (dict): Dictionary containing the theme, setting, protagonist description, and scene prompts.
        model_name (str): Pretrained model name.
        device (str): Device to run the model on (e.g., "cuda" or "cpu").
        strength (float): Strength of transformation for image-to-image generation (0.0 to 1.0).

    Returns:
        list: List of generated images.
    """
    from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
    import torch
    import gc

    # Initialize text-to-image pipeline
    pipe = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to(device)

    # Initialize image-to-image pipeline
    img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to(device)

    protagonist_description = gemini_output.get("protagonist_description", "")
    scene_prompts = gemini_output.get("scene_prompts", [])

    if not protagonist_description:
        raise ValueError("Protagonist description is missing in the Gemini output.")
    if not scene_prompts:
        raise ValueError("Scene prompts are missing in the Gemini output.")

    generated_images = []

    # Step 1: Generate protagonist image
    print(f"Generating protagonist image with prompt: {protagonist_description}")
    protagonist_image = pipe(prompt=protagonist_description, guidance_scale=8).images[0]
    generated_images.append(protagonist_image)
    display_and_save_image(protagonist_image, "protagonist_image.png")

    # Free memory
    torch.cuda.empty_cache()
    gc.collect()

    # Step 2: Generate scene images
    for i, prompt in enumerate(scene_prompts):
        print(f"Generating scene image {i + 1} with prompt: {prompt}")
        current_image = img2img_pipe(prompt=prompt, image=protagonist_image, strength=strength, guidance_scale=9).images[0]
        generated_images.append(current_image)
        display_and_save_image(current_image, f"scene_image_{i + 1}.png")

        # Free memory
        torch.cuda.empty_cache()
        gc.collect()

    return generated_images


def display_and_save_image(image, filename):
    """
    Displays the generated image and saves it to a file.

    Args:
        image (PIL.Image.Image): Image to be displayed and saved.
        filename (str): Path to save the image.
    """
    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 6))
    plt.imshow(image)
    plt.axis("off")  # Turn off axis
    plt.show()
    image.save(filename)
    print(f"Image saved as '{filename}'")
Main Functionality for Story Generation and Image Creation
Purpose
This block of code generates a story and corresponding images based on its content. The generated story includes a randomly selected theme and setting, as well as details about the protagonist, story text, and scene prompts.
Code Components
Entry Point Check
if __name__ == "__main__":
This condition ensures that the following block of code is executed only when the script is run directly (not when it's imported as a module in another script).
Story Generation
gemini_output = generate_story()
generate_story() Function: Generates a story from a randomly selected theme and setting and returns a dictionary with the Theme, Setting, Story Text, Protagonist Description, and Scene Prompts.
Story Details Printing
Outputs the generated story details to the console.
Key Components:
gemini_output['theme']: Displays the story's theme.
gemini_output['setting']: Displays the story's setting.
gemini_output['story_text']: Prints the main story content.
gemini_output['protagonist_description']: Prints details about the protagonist.
gemini_output['scene_prompts']: Iterates over scene prompts and prints each with a corresponding part number.
Image Generation
Purpose: Calls generate_images_from_gemini_output to create visual representations (images) based on the details in gemini_output.
Input: The story dictionary (gemini_output) containing the theme, setting, protagonist description, and scene prompts.
Output: A list of generated images: the protagonist image first, followed by one image per scene prompt.
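The printing step described above can be sketched as a small formatting function. This is an illustrative reconstruction rather than the notebook's actual main block: format_story_report is a hypothetical helper, and it assumes the dictionary keys returned by generate_story(), including the "error" key used on failure.

```python
def format_story_report(gemini_output: dict) -> str:
    """Build the console report for a generate_story()-style dictionary."""
    # generate_story() signals failure with an "error" key.
    if "error" in gemini_output:
        return f"Story generation failed: {gemini_output['error']}"

    lines = [
        f"Theme: {gemini_output['theme']}",
        f"Setting: {gemini_output['setting']}",
        "",
        "Story:",
        gemini_output["story_text"],
        "",
        "Protagonist:",
        gemini_output["protagonist_description"],
        "",
        "Scene prompts:",
    ]
    # Number the prompts from 1 so they line up with the story parts.
    for i, prompt in enumerate(gemini_output["scene_prompts"], start=1):
        lines.append(f"Part {i}: {prompt}")
    return "\n".join(lines)
```

In a main block, the report would be printed first, and generate_images_from_gemini_output(gemini_output) would be called only when no "error" key is present.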
Expected Generated Images
Below are the images generated by our AI Picture Book Generator using Stable Diffusion models. Each image was created based on specific prompts that describe scenes from the story about Pip, a brave mountain goat kid on an adventure.
protagonist_image.png
Prompt:
Pip is a small, fluffy white mountain goat kid with big, curious brown eyes and tiny, curved horns just beginning to sprout. He is determined and brave, always ready for an adventure. Around his neck, he always wears a bright blue, hand-knitted bell that jingles softly with every step he takes.
scene_image_1.png
Prompt:
A small, fluffy white mountain goat kid with a bright blue bell around its neck stands at the base of a huge, rocky mountain, looking up with determination at the sunlit peak. The scene is a cheerful, sunny day with green grass and wildflowers at the bottom.
scene_image_2.png
Prompt:
The small, fluffy white mountain goat kid with a blue bell carefully climbs a steep, rocky path on the side of a mountain. He is focused and determined, using his small size to navigate the difficult terrain. The morning light is soft and golden.
scene_image_3.png
Prompt:
The small, fluffy white mountain goat kid with a blue bell stands at the edge of a wide, windy chasm on a mountain path, looking thoughtfully at a fallen log that spans the gap. He looks slightly worried but hopeful. The sky is a clear blue.
scene_image_4.png
Prompt:
The small, fluffy white mountain goat kid with a blue bell is bravely and carefully walking across a narrow log bridge over a chasm high on a mountain. He is halfway across, focused and balanced, with a proud expression. The wind gently ruffles his white fur.
scene_image_5.png
Prompt:
The small, fluffy white mountain goat kid with a blue bell stands triumphantly on the very top of a mountain peak, watching a beautiful, vibrant sunrise. The sky is filled with orange, pink, and gold clouds. He looks happy and proud, his little bell shining in the morning sun.