Aim of the Project

The aim of this project is to build an AI-powered system that generates children's picture books.

By leveraging advanced AI models like Google Gemini and image-generation models such as Stable Diffusion, this system automates story generation and illustration design to produce interactive and imaginative picture books. The goal is to empower publishers, educators, and parents to create high-quality, customizable content for children that is both immersive and impactful.

What is AI-Powered Picture Book Generation?

AI-powered picture book generation uses artificial intelligence to create engaging, interactive children's stories along with visual prompts for illustrations. By integrating AI with text-to-image models like Stable Diffusion, this technology can generate captivating narratives and corresponding images, making it easier to produce personalized picture books that children can enjoy.

In this project, the Gemini model writes unique stories on themes such as friendship, adventure, kindness, and bravery. Each story is divided into parts, and each part comes with a visual prompt for image generation. The images are generated from the protagonist's description and the key moments of the story, so every part gets its own illustration.

Setup for IPYNB

For the best experience, stay connected to the internet while running this project.

Running an IPYNB on Google Colab:

Important Libraries

Execution Instructions

  1. Setup:
    • Install necessary libraries:
      pip install diffusers transformers torch pandas Pillow matplotlib google-generativeai
    • Obtain API keys for any external services (e.g., Google Generative AI).
  2. Implement the Gemini functionality: Run the generate_story() function, which randomly selects a theme and a setting, asks Gemini to write the story, and returns a visual prompt for each scene.
  3. Implement the Stable Diffusion functionality: Run the generate_images_from_gemini_output() function, which generates images of the protagonist and of each scene from the prompts Gemini produced in the previous step.
  4. Execute the entire pipeline: Run the main block, which calls generate_story() and generate_images_from_gemini_output() in sequence and displays the final story, the scene prompts, and the generated picture book.

IMPLEMENTATION

Integrating Google Gemini for Story Generation

The Google Gemini model is a state-of-the-art generative AI tool designed to handle tasks like natural language generation and storytelling with remarkable creativity and coherence. In this project, Gemini is used to generate engaging narratives for picture books, including character descriptions, plot outlines, and scene prompts. By leveraging its advanced capabilities, we can ensure high-quality, imaginative content tailored for children's stories.

Steps to Create a Gemini API Key

  1. Open https://aistudio.google.com/apikey
  2. Click the Get API Key icon.
  3. Click the Create an API Key icon, give the key a name, and copy the generated API key to a safe place for future use.

Gemini API Setup


import google.generativeai as genai
import random
# Configure the Gemini API
api_key = "Your Gemini API Key"
genai.configure(api_key=api_key)

# Initialize the Gemini model
model = genai.GenerativeModel("gemini-2.5-pro")
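
Hard-coding the key in a notebook makes it easy to leak when the file is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name GEMINI_API_KEY and the helper load_gemini_api_key are assumptions made for illustration; they are not part of the original project.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in `env_var` (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key


# Intended use with the setup above (commented out so the sketch runs offline):
# genai.configure(api_key=load_gemini_api_key())
```

In Google Colab the variable would have to be set in the session before this helper is called; how that is done is left open here.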

Code Breakdown

Generate Your Picture Book Story and Visual Prompts

Introduction

Welcome to the "Generate Story" function!

This block of code brings creativity to life by crafting a full-fledged children's story, a detailed protagonist description, and visually engaging prompts for each scene. By leveraging Google Gemini, this function ensures that the generated story and visuals are rich in imagination and perfectly tailored for picture books.

This function is designed to:

How It Works?

Key Parameters

Generate Story Function

import json # Import json for parsing

def generate_story():
    """
    Generates a children's story, protagonist description, and scene prompts using Gemini.

    Returns:
        dict: A dictionary containing the story, protagonist description, and scene prompts.
    """
    # Randomly select a theme and setting
    theme = random.choice(["friendship", "adventure", "kindness", "bravery", "overcoming fears", "honesty"])
    setting = random.choice(["forest", "castle", "ocean", "mountain", "desert", "jungle", "village", "busy city"])

    gemini_prompt = f"""
You are a creative storyteller tasked with generating simple, engaging children's stories that can be illustrated as picture books using Stable Diffusion for image generation. Each story must resemble timeless classics like "The Tortoise and the Hare," with a strong narrative and visually clear prompts. Follow these guidelines carefully:

1. **Story Generation:**
   - Create a captivating children's story based on the theme "{theme}" and set in "{setting}".
   - Use simple, kid-friendly language and imaginative yet relatable scenarios.
   - Ensure the story has:
     - One main protagonist with consistent traits throughout.
     - A clear challenge, conflict, or adventure the protagonist embarks on and resolves.
     - A moral or lesson that children can learn from.
   - Keep the story concise and divide it into 5 logical parts.

2. **Protagonist Description:**
   - Write a detailed, standalone description of the protagonist that remains consistent across all scenes:
     - Physical traits: size, age, clothing, and distinctive features.
     - Personality traits: brave, kind, curious, or determined.
     - Any unique items or accessories they carry that appear in every scene.

3. **Story Division into 5 Parts:**
   - Divide the story into exactly 5 sequential moments:
     - Each part should represent a distinct event in the narrative, progressing logically toward the resolution.
     - Clearly establish the setting, protagonist's actions, and emotions in each part.

4. **Scene Prompt Generation for Stable Diffusion:**
   - For each of the 5 parts, write a **highly descriptive visual prompt** that Stable Diffusion can use to generate images.
   - Ensure prompts are:
     - Clear, specific, and unambiguous, avoiding abstract or overly complex descriptions.
     - Focused on describing the protagonist's appearance, actions, surroundings, and emotions.
     - Consistent with the protagonist's traits and accessories described earlier.
     - Example of a well-crafted prompt:
       - "A small tortoise with a green shell and a bright red scarf stands proudly on a dirt path in a sunny forest, with trees and flowers around, while a confident hare smirks nearby."

5. **Output the following as a JSON object.** The JSON object should have the following keys:
   - 'theme': The chosen theme.
   - 'setting': The chosen setting.
   - 'protagonist_description': A single string containing the detailed protagonist description.
   - 'story_text': A single string containing the complete story divided into parts, with each part clearly separated (e.g., using newlines).
   - 'scene_prompts': A list of 5 strings, where each string is a clear visual prompt for a scene.
Example JSON output:
```json
{{
  "theme": "friendship",
  "setting": "forest",
  "protagonist_description": "A curious squirrel named Sammy, with a bushy tail and a small red scarf, who is adventurous and helpful.",
  "story_text": "Part 1: Once upon a time...\\nPart 2: Sammy met a shy rabbit...\\nPart 3: They went on an adventure...\\nPart 4: They faced a challenge...\\nPart 5: They learned about friendship...",
  "scene_prompts": [
    "A small squirrel with a red scarf explores a sunny forest, surrounded by towering trees and colorful flowers.",
    "The squirrel meets a shy rabbit near a sparkling stream, with sunlight reflecting off the water.",
    "Sammy and the rabbit discover a hidden cave, illuminated by glowing crystals.",
    "They work together to cross a treacherous river, building a bridge from fallen branches.",
    "Sammy and the rabbit share a celebratory nut, their bond of friendship stronger than ever, under a starlit sky."
  ]
}}
```
"""

    # Generate the response using the Gemini model
    response = model.generate_content(gemini_prompt)
    if not response or not response.text.strip():
        return {"error": "Gemini returned an empty or invalid response.", "theme": theme, "setting": setting}

    # Initialize gemini_output with theme and setting
    gemini_output = {"theme": theme, "setting": setting}

    try:
        # Extract the JSON payload, stripping a ```json fence if Gemini added one
        text = response.text.strip()
        json_start = text.find('```json')
        json_end = text.rfind('```')
        # Require a closing fence after the opening one; otherwise parse the raw text
        if json_start != -1 and json_end > json_start:
            json_string = text[json_start + 7:json_end].strip()
        else:
            json_string = text

        # Attempt to parse the extracted JSON string
        parsed_output = json.loads(json_string)
        gemini_output["protagonist_description"] = parsed_output.get("protagonist_description")
        gemini_output["story_text"] = parsed_output.get("story_text")
        gemini_output["scene_prompts"] = parsed_output.get("scene_prompts", [])
    except json.JSONDecodeError as e:
        return {"error": f"Failed to parse Gemini response as JSON: {e} - Response: {response.text}", "theme": theme, "setting": setting}

    return gemini_output
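
Because generate_story() can return either a story dictionary or a dictionary with an "error" key, it may help to check the result before passing it on to image generation. The following is a sketch only: validate_gemini_output is a hypothetical helper, and the expected count of 5 scene prompts is taken from the prompt text above.

```python
def validate_gemini_output(output: dict, expected_scenes: int = 5) -> list:
    """Return a list of problems found in the dictionary; an empty list means usable."""
    problems = []
    # generate_story() signals failure with an "error" key
    if "error" in output:
        problems.append(f"generation failed: {output['error']}")
        return problems
    # Both text fields must be non-empty strings
    for key in ("protagonist_description", "story_text"):
        value = output.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' is missing or empty")
    # The prompt asks for exactly five scene prompts
    prompts = output.get("scene_prompts")
    if not isinstance(prompts, list):
        problems.append("'scene_prompts' is not a list")
    else:
        if len(prompts) != expected_scenes:
            problems.append(
                f"expected {expected_scenes} scene prompts, got {len(prompts)}"
            )
        if any(not isinstance(p, str) or not p.strip() for p in prompts):
            problems.append("every scene prompt must be a non-empty string")
    return problems
```

An empty return value means the dictionary has the shape the image-generation step relies on; anything else can be printed or used to trigger a retry.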

Before Running the Code....

Example Output

{
    "theme": "friendship",
    "setting": "forest",
    "protagonist_description": "A curious squirrel named Sammy, with a bushy tail and a small red scarf, who is adventurous and helpful.",
    "story_text": "Once upon a time...",
    "scene_prompts": [
        "A small squirrel with a red scarf explores a sunny forest, surrounded by towering trees and colorful flowers.",
        "The squirrel meets a shy rabbit near a sparkling stream, with sunlight reflecting off the water.",
        ...
    ]
}
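
To lay out a picture book, each part of story_text has to be matched with its scene prompt. The sketch below assumes the parts are labelled "Part 1:", "Part 2:", and so on, as in the example JSON inside the prompt; Gemini is not guaranteed to follow that format, so the code falls back to splitting on lines. Both helper names are hypothetical.

```python
import re


def split_story_parts(story_text: str) -> list:
    """Split the story into parts at each 'Part <n>:' label."""
    # Zero-width lookahead keeps the label attached to the part it introduces
    pieces = re.split(r"(?=Part\s+\d+\s*:)", story_text)
    parts = [p.strip() for p in pieces if p.strip()]
    # No labels found: fall back to one part per non-empty line
    if len(parts) <= 1:
        parts = [line.strip() for line in story_text.splitlines() if line.strip()]
    return parts


def pair_parts_with_prompts(story_text: str, scene_prompts: list) -> list:
    """Pair each story part with its scene prompt; extra items on either side are dropped."""
    return list(zip(split_story_parts(story_text), scene_prompts))
```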
        

Image Generation Pipeline Setup

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
import torch
from PIL import Image
import matplotlib.pyplot as plt
import time
import gc

Creating Stunning Illustrations from Story Prompts with Stable Diffusion

Introduction

This Python function, generate_images_from_gemini_output, uses a pre-trained Stable Diffusion model to generate a sequence of images from the protagonist description and scene prompts in gemini_output.

Purpose of the code

The function generates:

  1. Protagonist Image: Based on the description of a protagonist.
  2. Scene Images: Based on the scene prompts. Each scene is produced by applying the image-to-image pipeline to the protagonist's image, so every scene starts from the same base image.

It uses Stable Diffusion, a popular deep learning model for text-to-image and image-to-image generation.

Code Components

Function: generate_images_from_gemini_output

This function is responsible for generating images based on prompts.

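Parameters

  • gemini_output (dict): Dictionary containing the theme, setting, protagonist description, and scene prompts.
  • model_name (str): Name of the pretrained model. Defaults to "sd-legacy/stable-diffusion-v1-5".
  • device (str): Device to run the model on, for example "cuda" or "cpu". Defaults to "cuda".
  • strength (float): Strength of the transformation for image-to-image generation, from 0.0 to 1.0. Defaults to 0.8.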

Returns

A list of generated images (list of PIL.Image.Image objects).

Helper Function:

display_and_save_image: Displays the generated image using matplotlib and saves it to a file.

Workflow

  1. Import Necessary Libraries
    • diffusers: Provides Stable Diffusion pipelines for text-to-image and image-to-image generation.
    • torch: For managing model weights and running computations on the specified device.
    • gc: For explicit garbage collection to optimize memory usage.
  2. Initialize Pipelines
    • Text-to-Image Pipeline (StableDiffusionPipeline): Converts text descriptions into images.
    • Image-to-Image Pipeline (StableDiffusionImg2ImgPipeline): Modifies images based on new text prompts.

    Both pipelines are loaded with the specified model_name and cast to float16 for efficient memory usage.

  3. Input Validation

    Checks if protagonist_description and scene_prompts are present in gemini_output. Raises errors if missing.

  4. Generate Images
    1. Protagonist Image:

      The protagonist description is fed into the text-to-image pipeline.

      The result is stored, displayed, and saved.

    2. Scene Images:

      Iterates through scene_prompts, applying the image-to-image pipeline to the protagonist's image using the specified strength.

      Each result is stored, displayed, and saved.

  5. Memory Management

    Explicitly clears GPU memory using torch.cuda.empty_cache() and Python garbage collection (gc.collect()).

  6. Save and Display Images

    Uses the helper function display_and_save_image to visualize and store images in .png format.
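
Step 2 of the workflow casts both pipelines to float16, which is generally intended for GPU execution. A minimal sketch of choosing the device and precision together is shown below, assuming that full precision is the safer choice on CPU. The helper choose_device_and_dtype is hypothetical, and it returns the dtype as a name so that the sketch runs without torch installed.

```python
def choose_device_and_dtype(cuda_available: bool) -> tuple:
    """Return (device, dtype_name): half precision on GPU, full precision on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


# Intended use with the pipelines described above (commented out so the
# sketch runs without torch or diffusers installed):
# import torch
# device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
# pipe = StableDiffusionPipeline.from_pretrained(
#     model_name, torch_dtype=getattr(torch, dtype_name)
# ).to(device)
```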

Customization Options

Hyperparameters

Pipeline and Device

Prompts

Modify the protagonist_description or scene_prompts in gemini_output for different narratives or themes.

File Saving

Adjust file paths and formats in display_and_save_image.
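
When tuning strength, it can help to see how it relates to the number of denoising steps. To the best of our understanding of the diffusers image-to-image pipelines, roughly int(num_inference_steps * strength) of the scheduled steps are actually run, so a lower strength keeps more of the protagonist image. The helper below is a hypothetical illustration of that arithmetic, and the step count of 50 is an assumed default; check the diffusers documentation for the version you use.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.5, 0.8, 1.0):
    print(f"strength={s}: about {effective_img2img_steps(50, s)} of 50 steps")
```

With the function's default strength of 0.8, most of the schedule is run, which favors following the scene prompt over preserving the protagonist image.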

Key Considerations

Illustration Generation Using stable-diffusion-v1-5

def generate_images_from_gemini_output(gemini_output, model_name="sd-legacy/stable-diffusion-v1-5", device="cuda", strength=0.8):
    """
    Generates a sequence of images based on the output from the Gemini function.

    Args:
        gemini_output (dict): Dictionary containing the theme, setting, protagonist description, and scene prompts.
        model_name (str): Pretrained model name.
        device (str): Device to run the model on (e.g., "cuda" or "cpu").
        strength (float): Strength of transformation for image-to-image generation (0.0 to 1.0).

    Returns:
        list: List of generated images.
    """
    from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
    import torch
    import gc

    # Initialize text-to-image pipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        model_name,
        torch_dtype=torch.float16
    ).to(device)

    # Initialize image-to-image pipeline
    img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_name,
        torch_dtype=torch.float16
    ).to(device)

    protagonist_description = gemini_output.get("protagonist_description", "")
    scene_prompts = gemini_output.get("scene_prompts", [])

    if not protagonist_description:
        raise ValueError("Protagonist description is missing in the Gemini output.")
    if not scene_prompts:
        raise ValueError("Scene prompts are missing in the Gemini output.")

    generated_images = []

    # Step 1: Generate protagonist image
    print(f"Generating protagonist image with prompt: {protagonist_description}")
    protagonist_image = pipe(prompt=protagonist_description, guidance_scale=8).images[0]
    generated_images.append(protagonist_image)
    display_and_save_image(protagonist_image, "protagonist_image.png")

    # Free memory
    torch.cuda.empty_cache()
    gc.collect()

    # Step 2: Generate scene images
    for i, prompt in enumerate(scene_prompts):
        print(f"Generating scene image {i + 1} with prompt: {prompt}")
        current_image = img2img_pipe(
            prompt=prompt,
            image=protagonist_image,
            strength=strength,
            guidance_scale=9
        ).images[0]
        generated_images.append(current_image)
        display_and_save_image(current_image, f"scene_image_{i + 1}.png")

        # Free memory
        torch.cuda.empty_cache()
        gc.collect()

    return generated_images

def display_and_save_image(image, filename):
    """
    Displays the generated image and saves it to a file.

    Args:
        image (PIL.Image.Image): Image to be displayed and saved.
        filename (str): Path to save the image.
    """
    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 6))
    plt.imshow(image)
    plt.axis("off")  # Turn off axis
    plt.show()

    image.save(filename)
    print(f"Image saved as '{filename}'")
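
display_and_save_image writes each file to the current working directory. As a sketch of one way to keep a book's images together, the paths could be built inside a dedicated output folder. The helper build_image_path and the folder layout are assumptions for illustration; the file names mirror the ones used in the function above.

```python
from pathlib import Path


def build_image_path(output_dir: str, index: int, extension: str = "png") -> Path:
    """Return the save path for an image; index 0 is the protagonist, 1..n are scenes."""
    folder = Path(output_dir)
    # Create the output folder (and any parents) if it does not exist yet
    folder.mkdir(parents=True, exist_ok=True)
    name = "protagonist_image" if index == 0 else f"scene_image_{index}"
    return folder / f"{name}.{extension}"


# Intended use (commented out because it needs a generated image):
# display_and_save_image(current_image, str(build_image_path("picture_book", i + 1)))
```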

Main Functionality for Story Generation and Image Creation

Purpose

This block of code generates a story and corresponding images based on its content. The generated story includes a randomly selected theme and setting, as well as details about the protagonist, story text, and scene prompts.

Code Components

  1. Entry Point Check
    if __name__ == "__main__":

    This condition ensures that the following block of code is executed only when the script is run directly (not when it's imported as a module in another script).

  2. Story Generation
    gemini_output = generate_story()

    generate_story() Function: Generates a story for a randomly chosen theme and setting and returns a dictionary with the theme, setting, story text, protagonist description, and scene prompts.

  3. Story Details Printing

    Outputs the generated story details to the console.

    Key Components:

    • gemini_output['theme']: Displays the story's theme.
    • gemini_output['setting']: Displays the story's setting.
    • gemini_output['story_text']: Prints the main story content.
    • gemini_output['protagonist_description']: Prints details about the protagonist.
    • gemini_output['scene_prompts']: Iterates over scene prompts and prints each with a corresponding part number.
  4. Image Generation
    generated_images = generate_images_from_gemini_output(gemini_output)

    generate_images_from_gemini_output() Function:

    • Purpose: Creates visual representations (images) based on the details in gemini_output.
    • Input: The story dictionary (gemini_output) containing the theme, setting, and prompts.
    • Output: A list of generated images: the protagonist image followed by one image per scene prompt.
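
The four components above can be sketched as a single driver. This is an illustrative outline, not the project's actual main block: run_pipeline and format_story_details are hypothetical names, and the two pipeline functions are passed in as arguments so the sketch can run without Gemini or a GPU. It also skips image generation when generate_story() reports an error, since generate_images_from_gemini_output() would otherwise raise a ValueError.

```python
def format_story_details(gemini_output: dict) -> str:
    """Build the console summary of the generated story."""
    lines = [
        f"Theme: {gemini_output.get('theme')}",
        f"Setting: {gemini_output.get('setting')}",
        f"Protagonist: {gemini_output.get('protagonist_description')}",
        "Story:",
        str(gemini_output.get("story_text")),
        "Scene prompts:",
    ]
    # Number the prompts so they line up with the story parts
    for i, prompt in enumerate(gemini_output.get("scene_prompts", []), start=1):
        lines.append(f"  Part {i}: {prompt}")
    return "\n".join(lines)


def run_pipeline(story_fn, image_fn):
    """Generate the story, print its details, then generate the images."""
    gemini_output = story_fn()
    # generate_story() returns an "error" key when Gemini's reply is unusable
    if "error" in gemini_output:
        print(f"Story generation failed: {gemini_output['error']}")
        return gemini_output, []
    print(format_story_details(gemini_output))
    return gemini_output, image_fn(gemini_output)


# Intended use in the notebook (commented out so the sketch runs offline):
# if __name__ == "__main__":
#     gemini_output, generated_images = run_pipeline(
#         generate_story, generate_images_from_gemini_output
#     )
```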

Expected Generated Images

Below are the images generated by our AI Picture Book Generator using Stable Diffusion models. Each image was created based on specific prompts that describe scenes from the story about Pip, a brave mountain goat kid on an adventure.