Ravi80335/WonderTale

✨ WonderTale

AI storytelling companion for children, designed with neurodiversity in mind — powered by Gemini Live API, Google ADK, and Google Cloud.

A child names a topic, and WonderTale researches real facts, weaves them into a personalized adventure, illustrates it, and narrates it with expressive voices, all in real time.

Built for the Gemini Live Agent Challenge on Devpost.


Architecture

[Figure: Architecture Overview]

[Figure: Agent Coordination]

Quick Start

Fastest path: You only need a Google AI API key to run the full experience locally. No database, no cloud storage, no billing plan required — the app gracefully degrades when optional services are absent.

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | python --version to verify |
| Node.js | 20.19+ or 22.12+ | node --version to verify; Vite 7 no longer supports Node.js 18 |
| Google AI API key | | Free tier at aistudio.google.com/apikey |

Optional (not needed for local demo):

  • PostgreSQL — enables story library persistence (without it, stories live in browser localStorage)
  • Cloudflare R2 — enables CDN image hosting (without it, illustrations are sent inline as base64)
  • Blaze billing plan — enables real Gemini image generation (without it, set MOCK_IMAGES=true for colorful placeholder illustrations)
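
The MOCK_IMAGES and inline-base64 fallbacks above can be illustrated with a tiny stand-in for a placeholder illustration: a solid-color PNG built with only the standard library, then base64-encoded for inline delivery. This is a hypothetical sketch; placeholder_png() and placeholder_base64() are not from the repository, and WonderTale's real placeholders may be generated differently.

```python
import base64
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def placeholder_png(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Hypothetical mock illustration: a solid-color 8-bit RGB PNG."""
    # IHDR: width, height, bit depth 8, color type 2 (truecolor),
    # default compression, default filtering, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + bytes(rgb) * width  # filter type 0 (None) + one row of pixels
    idat = zlib.compress(row * height)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def placeholder_base64(width: int = 768, height: int = 1024, rgb=(255, 183, 77)) -> str:
    """Hypothetical: base64 text for an inline image field when R2 is not configured."""
    return base64.b64encode(placeholder_png(width, height, rgb)).decode("ascii")
```

The default portrait size and color are arbitrary choices for the sketch, picked to suit a mobile-first layout.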

Step 1 — Backend

cd WonderTale

# Create and activate virtual environment
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env

Edit .env — set these two values:

GOOGLE_API_KEY=your-key-here
MOCK_IMAGES=true          # Skip Blaze billing; remove this line if you have a paid API key

Start the server:

uvicorn main:app --reload --port 8000

You should see:

WonderTale server starting [log_level=INFO]
Initializing database...
Database disabled (no DATABASE_URL configured) — using localStorage-only mode
INFO:     Uvicorn running on http://127.0.0.1:8000

Step 2 — Frontend

In a second terminal:

cd WonderTale/frontend

# Install dependencies
npm install

# Start dev server
npm run dev

You should see:

VITE v7.x.x  ready in Xms
➜  Local:   http://localhost:5173/

Step 3 — Try It

  1. Open http://localhost:5173 in Chrome (recommended for Web Audio API support).

    Note on UX: WonderTale is designed for mobile screens only. On a desktop browser, open DevTools (F12), turn on device mode with the Toggle Device Toolbar button (Ctrl+Shift+M), and pick a mobile device such as an iPhone or Pixel; alternatively, resize the browser window to a narrow portrait aspect ratio. You can also build and test the Android app via Capacitor.

  2. Complete onboarding — enter a name, age, interests, accessibility preferences, and companion name
  3. On the home screen, tap a suggestion or type your own topic to start a text-only session. Starting a session any other way (for example, with the mic button) starts a Live Audio session instead.
  4. Watch the story flow:
    • Researching indicator appears → Research Agent queries Google Search for real facts
    • Writing indicator → Story Architect weaves facts into a personalized adventure
    • Illustration fades in as the background → AI-generated (or placeholder) scene art
    • Text appears with word-by-word reveal → accessibility-formatted story paragraphs
    • Choices appear at the bottom → two AI-generated narrative branches
    • Discover panel (tap the book icon) → recap facts + comprehension quiz
  5. Pick a choice to continue to the next chapter — the cycle repeats

Audio Mode (Paid API Key Required)

To enable live voice conversation with the Gemini Live API, set this in .env:

AUDIO_MODE=true

This uses gemini-2.5-flash-native-audio-preview-12-2025 for bidirectional audio streaming. The child speaks naturally, the AI narrates with the Aoede voice, and barge-in (interruption) is supported. Requires a microphone and Chrome.

Troubleshooting

| Issue | Fix |
|---|---|
| GOOGLE_API_KEY error on startup | Verify your key at aistudio.google.com/apikey and check that it is set in .env |
| No illustrations appearing | Set MOCK_IMAGES=true in .env if you don't have a Blaze billing plan |
| WebSocket connection refused | Ensure the backend is running on port 8000 and that frontend/.env contains VITE_WS_URL="ws://localhost:8000/ws/session" |
| ModuleNotFoundError | Ensure the virtualenv is activated: which python (macOS/Linux) or where python (Windows) should point into .venv/ |
| Port 8000 already in use | Run uvicorn main:app --reload --port 8001 and update VITE_WS_URL in frontend/.env to match |
| Database warnings | Expected if DATABASE_URL is not set; the app falls back to localStorage mode |
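
Before switching to port 8001, you can confirm whether something is actually listening on 8000. This is a minimal standard-library sketch; port_is_free() and pick_port() are hypothetical helpers, not part of the repository.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex() returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def pick_port(candidates: tuple[int, ...] = (8000, 8001, 8002)) -> int:
    """Return the first candidate port with no listener."""
    for port in candidates:
        if port_is_free(port):
            return port
    raise RuntimeError(f"All candidate ports are in use: {candidates}")
```

If pick_port() returns 8001, start uvicorn with --port 8001 and set VITE_WS_URL to ws://localhost:8001/ws/session.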

Environment Variables

| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | | Gemini API key (required) |
| AUDIO_MODE | false | true = native Gemini Live audio; false = text debug mode |
| IMAGE_MODEL | gemini-2.5-flash-image | Image generation model |
| MOCK_IMAGES | false | true = colorful placeholder PNGs, no API call |
| LOG_LEVEL | INFO | Python logging level (DEBUG, INFO, WARNING, ERROR) |
| DATABASE_URL | | PostgreSQL asyncpg URL; if unset, the app uses localStorage only |
| R2_ACCOUNT_ID | | Cloudflare R2 account ID |
| R2_ACCESS_KEY_ID | | Cloudflare R2 access key |
| R2_SECRET_ACCESS_KEY | | Cloudflare R2 secret key |
| R2_BUCKET_NAME | wondertale-assets | R2 bucket name |
| R2_PUBLIC_DOMAIN | | Public CDN domain for R2 images |
| R2_ENDPOINT_URL | | R2 S3-compatible endpoint URL |
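
A sketch of how the core variables and defaults above might be read into one typed settings object. Settings and load_settings() are hypothetical; the repository may use pydantic or another loader. The example DATABASE_URL in the comment uses SQLAlchemy's standard postgresql+asyncpg:// scheme; the host and database name are placeholders.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")
DEFAULT_IMAGE_MODEL = "gemini-2.5-flash-image"


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Parse a boolean variable such as AUDIO_MODE=true (unset means false)."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    audio_mode: bool
    image_model: str
    mock_images: bool
    log_level: str
    database_url: Optional[str]  # e.g. postgresql+asyncpg://user:pass@localhost:5432/wondertale
    r2_bucket_name: str

    @property
    def story_storage(self) -> str:
        # No DATABASE_URL means stories stay in browser localStorage.
        return "postgres" if self.database_url else "localStorage"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Hypothetical loader applying the documented defaults."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise ValueError("GOOGLE_API_KEY is required")
    level = env.get("LOG_LEVEL", "INFO").strip().upper()
    if level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LOG_LEVELS}")
    return Settings(
        google_api_key=key,
        audio_mode=_flag(env, "AUDIO_MODE"),
        image_model=env.get("IMAGE_MODEL") or DEFAULT_IMAGE_MODEL,
        mock_images=_flag(env, "MOCK_IMAGES"),
        log_level=level,
        database_url=env.get("DATABASE_URL") or None,
        r2_bucket_name=env.get("R2_BUCKET_NAME") or "wondertale-assets",
    )
```

In practice you would call load_settings(os.environ) after loading .env (for example with python-dotenv's load_dotenv()), then pass settings.log_level to logging.basicConfig(level=...), which accepts level names as strings.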

Project Structure

WonderTale/
├── main.py                      # FastAPI entry point, lifespan, router includes
├── requirements.txt
├── alembic.ini                  # Alembic migration config
├── .env.example
│
├── core/                        # Agent definitions + shared singletons
│   ├── agent.py                 #   Root orchestrator agents (voice + text modes)
│   └── services.py              #   ADK Runner + SessionService singletons
│
├── agents/                      # Sub-agent definitions
│   ├── research.py              #   Research Agent (gemini-2.5-flash + Google Search)
│   ├── story.py                 #   Story Architect (gemini-2.5-flash)
│   ├── choices.py               #   Story Choices Agent — 2 narrative branches
│   └── quiz.py                  #   Quiz Agent — comprehension questions
│
├── tools/                       # FunctionTools called by the Orchestrator
│   ├── research_tool.py         #   research_topic()
│   ├── story_tool.py            #   generate_story() — runs pipeline + launches background tasks
│   ├── illustration_tool.py     #   generate_and_queue_illustration() (background)
│   ├── choices_tool.py          #   generate_story_choices() (background, 3s delay)
│   └── quiz_tool.py             #   generate_quiz() (background, 6s delay)
│
├── sessions/                    # Per-connection session handlers
│   ├── audio.py                 #   Audio mode: run_live bidi + 3-coroutine model
│   ├── text.py                  #   Text mode: run_async + drain_loop
│   ├── helpers.py               #   apply_profile() shared helper
│   ├── context.py               #   ContextVars: session_id, user_id, story_id
│   └── media_queue.py           #   Per-session asyncio.Queue (illustrations, text, choices, quiz)
│
├── routes/                      # FastAPI routers
│   ├── health.py                #   GET /health
│   ├── websocket.py             #   WS /ws/session (subscription-aware dispatch)
│   ├── profiles.py              #   GET/PUT /api/profiles/{user_id}
│   ├── stories.py               #   GET/DELETE /api/stories/{user_id}[/{story_id}]
│   └── subscriptions.py         #   GET/POST /api/subscriptions/{user_id}[/activate|/cancel]
│
├── subscriptions/               # Subscription system
│   ├── models.py                #   SubscriptionTier, SubscriptionStatus, TIER_CONFIG, ORM model
│   ├── crud.py                  #   DB operations: get, create, update, increment
│   └── service.py               #   SubscriptionService: limits, enforcement, serialisation
│
├── db/                          # Database layer
│   ├── __init__.py              #   Async engine, session factory, graceful degradation
│   ├── models.py                #   ORM models: Profile, Story, StorySegment, Illustration, …
│   ├── crud.py                  #   CRUD helpers for all tables
│   └── migrations/              #   Alembic migration scripts
│
├── storage/
│   └── r2.py                    # Cloudflare R2 client (gracefully disabled if unconfigured)
│
└── frontend/                    # React 19 + TypeScript + Vite 7 + TailwindCSS v4
    └── src/
        ├── App.tsx              #   Provider hierarchy root
        ├── AppRouter.tsx        #   Screen routing with Framer Motion transitions
        ├── context/             #   AppContext, AuthContext, SessionContext,
        │                        #   ThemeContext, SubscriptionContext
        ├── screens/             #   All screens (onboarding, home, story, library,
        │                        #   parent dashboard, subscription)
        ├── components/          #   Shared + story-specific UI components
        ├── hooks/               #   useWebSocket, useAudioCapture, useAudioPlayback
        ├── lib/                 #   api.ts, subscriptionApi.ts, auth.ts, storage.ts
        └── types/               #   index.ts, subscription.ts
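
sessions/context.py is described as holding ContextVars for session_id, user_id and story_id. The sketch below shows why that pattern suits tool functions: a tool can read the current session without receiving it as an argument, and asyncio.create_task() copies the current context, so background work still sees the right IDs while concurrent connections stay isolated. The variable names mirror the tree above; every function here is hypothetical.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

session_id: ContextVar[Optional[str]] = ContextVar("session_id", default=None)
user_id: ContextVar[Optional[str]] = ContextVar("user_id", default=None)
story_id: ContextVar[Optional[str]] = ContextVar("story_id", default=None)


async def background_illustration(results: list) -> None:
    """Hypothetical background task: reads IDs from context, not arguments."""
    await asyncio.sleep(0)
    results.append((session_id.get(), story_id.get()))


def generate_story_tool(topic: str, results: list) -> asyncio.Task:
    """Hypothetical tool body: launches background work for the current session."""
    story_id.set(f"story-for-{topic}")
    # create_task() snapshots the current context, including story_id just set.
    return asyncio.create_task(background_illustration(results))


async def handle_connection(sid: str, uid: str, topic: str, results: list) -> None:
    """One connection runs in its own task, so it has its own context copy."""
    session_id.set(sid)
    user_id.set(uid)
    await generate_story_tool(topic, results)


async def main() -> list:
    results: list = []
    # gather() wraps each coroutine in a task; values set in one task
    # never leak into the other.
    await asyncio.gather(
        handle_connection("s1", "u1", "volcanoes", results),
        handle_connection("s2", "u2", "octopuses", results),
    )
    return sorted(results)
```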

Key Features

Voice-First

  • Bidirectional audio streaming via Gemini Live API
  • Aoede voice — warm, clear, child-friendly
  • Context window compression for sessions beyond 10 minutes
  • Session resumption on disconnect (token valid ~2 hours)
  • Barge-in support — children can interrupt naturally
  • ?mode=audio|text per-connection override
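
Session resumption and the per-connection mode override both travel as query parameters on the WebSocket URL (see the Query param row under WebSocket Protocol). A minimal sketch of building that URL, assuming the backend reads exactly user_id, mode and resume_token; build_session_url() is hypothetical, and the real client lives in the frontend's useWebSocket hook.

```python
from typing import Optional
from urllib.parse import urlencode


def build_session_url(
    base: str,
    user_id: str,
    mode: Optional[str] = None,
    resume_token: Optional[str] = None,
) -> str:
    """Build the /ws/session URL with an optional mode override and resume token."""
    if mode is not None and mode not in ("audio", "text"):
        raise ValueError("mode must be 'audio' or 'text'")
    params: dict[str, str] = {"user_id": user_id}
    if mode is not None:
        params["mode"] = mode
    if resume_token:  # token saved from the last session_resumption message
        params["resume_token"] = resume_token
    return f"{base}?{urlencode(params)}"
```

urlencode() percent-escapes characters such as / + = that often appear in opaque tokens.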

Personalized

  • Full onboarding: name, age, interests, companion name
  • Profile injected into every story generation prompt
  • Every story makes the child the hero
  • Google Search grounding for factual accuracy
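
Profile injection can be pictured as a small function that prepends the child's details to each generation prompt. The tree above names an apply_profile() helper in sessions/helpers.py, but its signature and prompt wording are not documented here, so this sketch uses a separate, hypothetical personalize_prompt(); the Profile fields follow the onboarding data and the prompt text is purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    age: int
    companion_name: str
    interests: list[str] = field(default_factory=list)
    accessibility: list[str] = field(default_factory=list)  # e.g. ["dyslexia", "adhd"]


def personalize_prompt(prompt: str, profile: Profile) -> str:
    """Hypothetical: prefix a story-generation prompt with the child's profile."""
    interests = ", ".join(profile.interests) or "anything"
    lines = [
        f"The hero of this story is {profile.name}, age {profile.age}.",
        f"{profile.name} loves: {interests}.",
        f"Their companion is called {profile.companion_name}.",
    ]
    if profile.accessibility:
        lines.append("Accessibility needs: " + ", ".join(profile.accessibility) + ".")
    return "\n".join(lines) + "\n\n" + prompt
```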

Multimodal

  • Custom illustration generated per story scene via Gemini image generation
  • Illustrations delivered via side-channel asyncio.Queue — narration never blocked
  • Fade-in transitions with scene progress indicator
  • MOCK_IMAGES=true for local dev without a Blaze billing plan
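
The side-channel idea can be sketched with one asyncio.Queue per session: the narrating code never awaits media, background tasks push illustration, choices and quiz events onto the queue (the project tree notes 3 s and 6 s delays for choices and quiz), and a drain loop forwards whatever arrives to the client. Function names, payloads and the send callback are assumptions for illustration; the delays are scaled down so the example runs quickly.

```python
import asyncio
from typing import Awaitable, Callable

Send = Callable[[dict], Awaitable[None]]


async def delayed_push(queue: asyncio.Queue, delay: float, event: dict) -> None:
    """Simulate a background generator that finishes after `delay` seconds."""
    await asyncio.sleep(delay)
    await queue.put(event)


def launch_background_media(queue: asyncio.Queue, scale: float = 1.0) -> list[asyncio.Task]:
    """Hypothetical: fire-and-forget media tasks; the caller is never blocked."""
    return [
        asyncio.create_task(delayed_push(queue, 0.0, {"type": "illustration"})),
        asyncio.create_task(delayed_push(queue, 3.0 * scale, {"type": "story_choices"})),
        asyncio.create_task(delayed_push(queue, 6.0 * scale, {"type": "quiz_data"})),
    ]


async def drain_loop(queue: asyncio.Queue, send: Send) -> None:
    """Forward queued media events to the client until a None sentinel arrives."""
    while (event := await queue.get()) is not None:
        await send(event)


async def demo(scale: float = 0.01) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    sent: list[str] = []

    async def send(event: dict) -> None:  # stand-in for a WebSocket send
        sent.append(event["type"])

    drainer = asyncio.create_task(drain_loop(queue, send))
    tasks = launch_background_media(queue, scale)
    sent.append("narration")  # narration proceeds before any media is ready
    await asyncio.gather(*tasks)
    await queue.put(None)  # stop the drain loop after the last event
    await drainer
    return sent
```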

Interactive

  • Story Choices — two AI-generated narrative branches to steer the adventure
  • Wand / Interruption — tap during narration to change direction
  • Discover Panel — recap facts, view research, and take a comprehension quiz

Accessible

  • Dyslexia mode — OpenDyslexic font, increased letter/word spacing, relaxed line height
  • ADHD pacing — short segments, animated progress indicator
  • Autism structure — predictable narrative scaffolding, emotion labels
  • Full audio narration + image alt-text for visual impairment
  • Parent Dashboard for managing all accessibility settings
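
As one illustration of ADHD pacing, story text could be grouped into short segments of a few sentences each before display. The two-sentence threshold and the naive sentence splitter are assumptions, not WonderTale's actual formatting rules; dyslexia mode, by contrast, is described above as a visual change (font and spacing).

```python
import re

# Naive splitter: break after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def paced_segments(text: str, max_sentences: int = 2) -> list[str]:
    """Hypothetical: group sentences into short segments for ADHD-friendly pacing."""
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return [
        " ".join(sentences[i : i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
```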

Story Library

  • Completed stories persisted to PostgreSQL with illustrations stored on Cloudflare R2
  • Library screen with replay — re-read any past story without a new AI session
  • Story details: title, summary, themes, research facts, quiz, choices

Subscription System

| Tier | Price | Audio mode | Illustrations / story | Stories / day |
|---|---|---|---|---|
| Basic | $5 / mo | Text only | 1 | 5 |
| Plus | $20 / mo | Gemini Live audio | 3 | 5 |

  • Free 30-day trial activated on first subscription (no payment gateway required)
  • Auto-creates a Plus trial for new users on first WebSocket connect
  • Tier enforcement at the WebSocket connection layer (audio downgrade) and story tool layer (daily limit, illustration cap)
  • Subscription management in Parent Dashboard → Manage Subscription

WebSocket Protocol

Client → Server

| Format | Content |
|---|---|
| Query param | user_id=<uuid>, mode=audio or mode=text, resume_token=<token> |
| Binary | Raw PCM audio (16kHz, 16-bit, mono) |
| JSON | { type: "text", text: "..." } |
| JSON | { type: "profile", profile: { name, age, interests, accessibility }, is_resume? } |
| JSON | { type: "story_resume", story_id, chapters_done, total_segments }: restores the chapter tracker on resume |
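
On the server, each incoming frame is either raw PCM audio (binary) or a JSON control message. A hedged sketch of that dispatch, assuming the message shapes in the table above; parse_client_frame() and AudioChunk are hypothetical. It also shows the audio arithmetic: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second, so a 100 ms chunk is 3,200 bytes.

```python
import json
from dataclasses import dataclass
from typing import Any, Union

INPUT_SAMPLE_RATE = 16_000  # Hz, client -> server
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CHANNELS = 1                # mono
KNOWN_TYPES = {"text", "profile", "story_resume"}


@dataclass
class AudioChunk:
    pcm: bytes

    @property
    def duration_ms(self) -> float:
        bytes_per_second = INPUT_SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
        return 1000 * len(self.pcm) / bytes_per_second


def parse_client_frame(frame: Union[bytes, str]) -> Union[AudioChunk, dict[str, Any]]:
    """Binary frames are audio; text frames must be JSON with a known 'type'."""
    if isinstance(frame, (bytes, bytearray)):
        if len(frame) % (BYTES_PER_SAMPLE * CHANNELS):
            raise ValueError("PCM frame is not a whole number of 16-bit samples")
        return AudioChunk(bytes(frame))
    message = json.loads(frame)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(message, dict) or message.get("type") not in KNOWN_TYPES:
        raise ValueError(f"unsupported message: {frame!r}")
    return message
```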

Server → Client

| Type | Content |
|---|---|
| Binary | Raw PCM audio (24kHz, 16-bit, mono) |
| thinking | Agent is processing |
| tool_call | { name }: tool invocation started |
| tool_result | { name }: tool completed |
| transcription | { text }: agent speech transcript (finished utterances only; no partials) |
| turn_complete | Agent turn finished |
| interrupted | Barge-in detected; audio cancelled |
| session_resumption | { token }: save for reconnect |
| illustration | { data?, url?, alt, segment_index, total_segments } |
| accessibility_text | { text, segment_index, total_segments } |
| story_choices | { choices: [{ label, description, story_direction }] } |
| quiz_data | { questions: [{ question, options, answer_idx, hint }] } |
| subscription_info | { tier, status, audio_mode_allowed, max_illustrations_per_story, stories_used_today, max_stories_per_day, trial_end, days_left, is_active } |
| subscription_expired | Trial or subscription is no longer active |
| limit_reached | { limit_type: "stories" or "illustrations", message } |
| error | { message } |
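
A sketch of how the server might serialize an illustration event, using the url field when R2 is configured and inline base64 data otherwise. The helper is hypothetical, and two details are assumptions: that the message name travels in a "type" key (as it does for client messages), and that segment_index is zero-based. The remaining field names come from the table above.

```python
import base64
import json
from typing import Optional


def illustration_event(
    alt: str,
    segment_index: int,
    total_segments: int,
    png: Optional[bytes] = None,
    url: Optional[str] = None,
) -> str:
    """Hypothetical: serialize an 'illustration' message with either url or data."""
    if (png is None) == (url is None):
        raise ValueError("pass exactly one of png or url")
    if not 0 <= segment_index < total_segments:
        raise ValueError("segment_index out of range")
    event = {
        "type": "illustration",
        "alt": alt,  # alt text supports screen readers
        "segment_index": segment_index,
        "total_segments": total_segments,
    }
    if url is not None:
        event["url"] = url  # CDN-hosted image (R2 configured)
    else:
        event["data"] = base64.b64encode(png).decode("ascii")  # inline fallback
    return json.dumps(event)
```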

REST API

| Method | Path | Description |
|---|---|---|
| GET | /health | Server health check |
| GET | /api/profiles/{user_id} | Fetch child profile |
| PUT | /api/profiles/{user_id} | Create / update child profile |
| GET | /api/stories/{user_id} | List completed stories |
| GET | /api/stories/{user_id}/{story_id} | Full story detail (segments, illustrations, quiz, choices) |
| DELETE | /api/stories/{story_id} | Delete a story |
| GET | /api/subscriptions/{user_id} | Current subscription status |
| POST | /api/subscriptions/{user_id}/activate | Activate / change trial tier; body: { tier: "basic" or "plus" } |
| POST | /api/subscriptions/{user_id}/cancel | Cancel subscription |
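
The subscription endpoints can be exercised from Python with just the standard library. This sketch only builds the requests; the base URL assumes the local dev server from the Quick Start, and both helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local dev server from the Quick Start


def subscription_status_request(user_id: str) -> Request:
    """GET /api/subscriptions/{user_id}: current subscription status."""
    return Request(f"{BASE_URL}/api/subscriptions/{quote(user_id, safe='')}", method="GET")


def activate_tier_request(user_id: str, tier: str) -> Request:
    """POST /api/subscriptions/{user_id}/activate with a JSON body {"tier": ...}."""
    if tier not in ("basic", "plus"):
        raise ValueError("tier must be 'basic' or 'plus'")
    return Request(
        f"{BASE_URL}/api/subscriptions/{quote(user_id, safe='')}/activate",
        data=json.dumps({"tier": tier}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send a request, pass it to urllib.request.urlopen() and decode the JSON response body.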

Tech Stack

| Layer | Choice |
|---|---|
| Backend language | Python 3.11+ |
| Web framework | FastAPI |
| Agent framework | Google ADK |
| AI: Voice | gemini-2.5-flash-native-audio-preview-12-2025 |
| AI: Text / Story | gemini-2.5-flash |
| AI: Images | gemini-2.5-flash-image |
| Database | PostgreSQL (asyncpg + SQLAlchemy async) |
| Migrations | Alembic |
| Image storage | Cloudflare R2 (boto3 S3-compatible) |
| Frontend | React 19 + TypeScript + Vite 7 (TODO: port to React Native / Expo) |
| Styling | TailwindCSS v4 |
| Animations | Framer Motion |
| Hosting | Google Cloud Run |

License

Built for the Gemini Live Agent Challenge hackathon.
