Guess What

A speech-based AI children's game

Problem

Real-time AI voice interaction has massive potential for creating natural, engaging user experiences – but most implementations feel like tech demos rather than genuinely fun applications. Children are the most honest user group: if an interface is clunky, confusing, or boring, they’ll disengage immediately. I wanted to prove that AI voice interaction could be seamless, delightful, and autonomous enough to guide users through a complete experience without human intervention.

AI-generated image: a flamingo riding a roller coaster

Solution

I designed an animal guessing game where an AI voice assistant teaches the rules, dynamically generates visual puzzles, provides real-time feedback via voice, and adapts to each child's progress. The challenge was to create a system that felt like magic: children could hold natural conversations with an AI that understood the game state, recognized when a child was confused or struggling, responded with an appropriate tone and voice, and generated custom imagery on demand.
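
To make that concrete, here is a minimal sketch, in TypeScript, of how game state and the assistant's tool calls could be modelled so the voice agent can drive transitions itself. The state shape, tool names, and arguments are illustrative assumptions, not the project's actual code.

```typescript
// Illustrative only: the real project's state shape and tool names may differ.
type Phase = "teaching_rules" | "guessing" | "hinting" | "celebrating";

interface GameState {
  phase: Phase;
  targetAnimal: string; // the animal hidden behind the blurred image
  wrongGuesses: number; // drives how much of the image gets revealed
}

let state: GameState = { phase: "teaching_rules", targetAnimal: "lion", wrongGuesses: 0 };

// When the voice model emits a function call, route it into plain game logic
// and return a short result string for the model to talk about.
function handleToolCall(name: string, args: Record<string, unknown>): string {
  switch (name) {
    case "start_round":
      state = { phase: "guessing", targetAnimal: String(args.animal), wrongGuesses: 0 };
      return "round started";
    case "record_guess": {
      const correct = String(args.guess).toLowerCase() === state.targetAnimal;
      if (!correct) state.wrongGuesses += 1; // unblur a little more as a hint
      state = { ...state, phase: correct ? "celebrating" : "hinting" };
      return correct ? "correct" : "wrong guess, image partially revealed";
    }
    default:
      return `unknown tool: ${name}`;
  }
}
```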

AI-generated image: a lion reading a book in a library

Approach

I built the complete application using Next.js and TypeScript for rapid development and deployment. On the voice side, I integrated WebRTC with OpenAI's Realtime API for natural conversation with sub-100ms response times, and implemented custom FFT audio analysis to give visual feedback while a child speaks. For the puzzles, I connected DALL-E and GPT-IMAGE-1 for dynamic, on-demand image generation based on game state, and designed a progressive blur-to-reveal mechanic that creates anticipation and provides hints when children struggle. AI function calling manages game-state transitions autonomously, Netlify blob-based image caching reduces API costs on repeated plays, and comprehensive error handling keeps the app robust during real-world testing with children. A couple of these pieces are sketched below.
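
The audio-visualisation piece, for example, can be built on the standard Web Audio API. This is a rough sketch of analysing the microphone stream for visual feedback, not the project's actual code; function names and tuning values are my own.

```typescript
// Rough sketch: FFT analysis of the microphone stream for visual feedback.
// Uses only standard Web Audio APIs; the real project's tuning will differ.
async function startVoiceVisualizer(onLevels: (bins: Uint8Array) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);

  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256; // 128 frequency bins, enough for a simple meter
  source.connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const tick = () => {
    analyser.getByteFrequencyData(bins); // magnitude per frequency bin, 0-255
    onLevels(bins);                      // e.g. drive a pulsing blob around the mic button
    requestAnimationFrame(tick);
  };
  tick();
}
```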

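The blur-to-reveal mechanic itself is mostly presentation: the puzzle image starts heavily blurred and sharpens a little with each wrong guess or hint. A simplified React sketch follows; the component name, props, and blur values are illustrative, not taken from the project.

```tsx
import { useMemo } from "react";

// Simplified sketch of the progressive reveal: more wrong guesses => less blur.
// maxBlur (24px) and the 6px step are illustrative values.
export function PuzzleImage({ src, wrongGuesses }: { src: string; wrongGuesses: number }) {
  const blurPx = useMemo(() => Math.max(0, 24 - wrongGuesses * 6), [wrongGuesses]);
  return (
    <img
      src={src}
      alt="Guess the animal"
      style={{
        filter: `blur(${blurPx}px)`,
        transition: "filter 600ms ease", // smooth reveal builds anticipation
      }}
    />
  );
}
```
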
AI-generated image: a lion surfing a huge wave

Results

The final application successfully demonstrates seamless integration of four AI technologies working in concert (GPT-4 reasoning, GPT-IMAGE-1 generation, Realtime API voice, and function calling orchestration), real-time voice interaction that feels natural and responsive, dynamic image generation that creates unique experiences every time, and a clean, accessible interface that children intuitively understand. Children naturally converse with the AI to play the game, with visual feedback making the experience engaging and the progressive reveal system creating excitement while helping struggling players. The project proves that multi-modal AI systems can create genuinely delightful user experiences when properly orchestrated.

It took a lot of iterations to get there, and prototyping with children was a delight…

The live project is online but not publicly linked, to save bandwidth and API costs. Please email me for access.