# Nano Banana Agent

A Dexto agent that provides access to Google's **Gemini 2.5 Flash Image** model for image generation and editing through a lightweight MCP server.

## 🎯 What is Gemini 2.5 Flash Image?

Gemini 2.5 Flash Image is Google's image generation and editing model. It supports:

- **Fast** image generation and editing
- **Object removal** that preserves the surrounding background
- **Background alteration** that keeps the subject intact
- **Image fusion** for creative compositions
- **Style modification** with consistent characters
- **Visible and invisible watermarks** (SynthID) for provenance

## 🚀 Key Features

### Core Capabilities

- **Image Generation**: Create images from text prompts in various styles and aspect ratios
- **Image Editing**: Modify existing images from natural language descriptions
- **Object Removal**: Remove unwanted objects while preserving the background
- **Background Changes**: Replace backgrounds while keeping subjects intact
- **Image Fusion**: Combine multiple images into creative compositions
- **Style Transfer**: Apply artistic styles to images

### Advanced Features

- **Character Consistency**: Maintain facial features and identities across edits
- **Scene Preservation**: Blend edits with the original lighting and composition
- **Multi-Image Processing**: Handle batch operations and complex compositions
- **Safety Features**: Built-in safety filters and provenance signals

## 🛠️ Setup

### Prerequisites

- Dexto framework installed
- Google AI API key (Gemini API access)
- Node.js 20.0.0 or higher

### Installation

1. **Set up environment variables**:

   ```bash
   export GOOGLE_GENERATIVE_AI_API_KEY="your-google-ai-api-key"
   # or
   export GEMINI_API_KEY="your-google-ai-api-key"
   ```

2.
   **Run the agent** (the MCP server is downloaded automatically via npx):

   ```bash
   # From the dexto repository root
   npx dexto -a agents/nano-banana-agent/nano-banana-agent.yml
   ```

The agent configuration runs `npx @truffle-ai/nano-banana-server`, which downloads and starts the latest version of the MCP server.

## 📋 Available Tools

The agent exposes three tools:

### 1. `generate_image`

Generate new images from text prompts.

**Example:**

```
Generate a majestic mountain landscape at sunset in realistic style with 16:9 aspect ratio
```

### 2. `process_image`

Edit an existing image according to detailed instructions. It covers tasks such as object removal, background changes, style transfer, and adding elements.

**Example:**

```
Remove the red car in the background from /path/to/photo.jpg
```

**Example:**

```
Change the background of /path/to/portrait.jpg to a beach sunset with palm trees
```

**Example:**

```
Apply Van Gogh painting style with thick brushstrokes to /path/to/photo.jpg
```

### 3. `process_multiple_images`

Process several images together according to detailed instructions, for example to combine images, create collages, or blend compositions.

**Example:**

```
Place the person from /path/to/person.jpg into the landscape from /path/to/landscape.jpg as if they were standing there
```

## 📤 Response Format

Successful operations return the image as base64-encoded content plus a text item containing JSON metadata:

```json
{
  "content": [
    {
      "type": "image",
      "data": "base64-encoded-image-data",
      "mimeType": "image/png"
    },
    {
      "type": "text",
      "text": "{\n \"output_path\": \"/absolute/path/to/saved/image.png\",\n \"size_bytes\": 12345,\n \"format\": \"image/png\"\n}"
    }
  ]
}
```

## 🎨 Popular Use Cases

### 1. **Selfie Enhancement**

- Remove blemishes and unwanted objects
- Change backgrounds for professional photos
- Apply artistic filters and styles
- Create figurine effects (Nano Banana's signature feature)

### 2. **Product Photography**

- Remove backgrounds for clean product shots
- Add or remove objects from scenes
- Apply consistent styling across product images

### 3. **Creative Compositions**

- Fuse multiple images into unique scenes
- Apply artistic styles to photos
- Create imaginative scenarios from real photos

### 4. **Content Creation**

- Generate images for social media
- Create variations of existing content
- Apply brand-consistent styling

## 🔧 Configuration

### Environment Variables

- `GOOGLE_GENERATIVE_AI_API_KEY` or `GEMINI_API_KEY`: Your Google AI API key (required)

### Agent Settings

- **LLM Provider**: Google Gemini 2.5 Flash
- **Storage**: In-memory cache with SQLite database
- **Tool Confirmation**: Auto-approve mode, so tools run without confirmation prompts (convenient during development)

## 📁 Supported Formats

**Input/Output Formats:**

- JPEG (.jpg, .jpeg)
- PNG (.png)
- WebP (.webp)
- GIF (.gif)

**File Size Limits:**

- Maximum: 20MB per image
- Recommended: under 10MB for faster processing

## 🎯 Example Interactions

### Generate a Creative Image

```
User: "Generate a futuristic cityscape at night with flying cars and neon lights"
Agent: I'll create a futuristic cityscape image for you using Nano Banana's image generation capabilities.
```

### Remove Unwanted Objects

```
User: "Remove the power lines from this photo: /path/to/landscape.jpg"
Agent: I'll remove the power lines from your landscape photo while preserving the natural background.
```

### Create Figurine Effect

```
User: "Transform this selfie into a mini figurine on a desk: /path/to/selfie.jpg"
Agent: I'll create Nano Banana's signature figurine effect, transforming your selfie into a mini figurine displayed on a desk.
```

### Change Background

```
User: "Change the background of this portrait to a professional office setting: /path/to/portrait.jpg"
Agent: I'll replace the background with a professional office setting while keeping you as the main subject.
```

## 🔒 Safety & Ethics

Nano Banana includes built-in safety features:

- **SynthID Watermarks**: Invisible provenance signals embedded in generated images
- **Safety Filters**: Content moderation and filtering
- **Character Consistency**: Preserves a subject's identity across edits
- **Responsible AI**: Designed to help prevent misuse

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](../../CONTRIBUTING.md) for details.

## 📄 License

This project is licensed under the MIT License. See the [LICENSE](../../LICENSE) file for details.

---

**Note**: This agent provides access to Google's Gemini 2.5 Flash Image model through the Model Context Protocol (MCP). The implementation returns both base64-encoded image content and text metadata, following the MCP specification, so compatible clients can display images directly. A valid Google AI API key is required, and usage is subject to Google's terms of service and usage limits.
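Setup step 1 and the **🔧 Configuration** section accept the API key from either `GOOGLE_GENERATIVE_AI_API_KEY` or `GEMINI_API_KEY`. If you drive the agent from your own TypeScript tooling, a minimal sketch of that lookup might look like this. `resolveApiKey` is a hypothetical helper, and the order in which it checks the two variables is an assumption, not documented server behavior.

```typescript
// Hypothetical helper, not part of @truffle-ai/nano-banana-server.
// Checking GOOGLE_GENERATIVE_AI_API_KEY before GEMINI_API_KEY is an
// assumed precedence; the README only says either variable works.
type Env = Record<string, string | undefined>;

const KEY_VARIABLES = ["GOOGLE_GENERATIVE_AI_API_KEY", "GEMINI_API_KEY"] as const;

function resolveApiKey(env: Env): string {
  for (const name of KEY_VARIABLES) {
    // Treat unset and whitespace-only values as missing.
    const value = env[name]?.trim();
    if (value) {
      return value;
    }
  }
  throw new Error(`Missing API key: set ${KEY_VARIABLES.join(" or ")}`);
}

// Example: const apiKey = resolveApiKey(process.env);
```

Failing fast with an explicit message is usually easier to debug than letting the first tool call fail with an authentication error.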
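The **📤 Response Format** section shows a tool result with an `image` item (base64 `data` plus `mimeType`) and a `text` item whose string holds JSON metadata. A minimal sketch for splitting such a result in a Node.js 20 TypeScript client follows. The interface names and `parseImageResult` are hypothetical; they only mirror the documented JSON and are not any SDK's types.

```typescript
// Local shapes mirroring the documented response; illustrative only.
interface ImageContent {
  type: "image";
  data: string; // base64-encoded image bytes
  mimeType: string;
}

interface TextContent {
  type: "text";
  text: string; // JSON-encoded metadata
}

interface ToolResult {
  content: Array<ImageContent | TextContent>;
}

interface ImageMetadata {
  output_path: string;
  size_bytes: number;
  format: string;
}

// Hypothetical helper: decodes the image part and parses the metadata part.
function parseImageResult(result: ToolResult): {
  image: Buffer;
  mimeType: string;
  metadata: ImageMetadata;
} {
  const imagePart = result.content.find(
    (part): part is ImageContent => part.type === "image",
  );
  const textPart = result.content.find(
    (part): part is TextContent => part.type === "text",
  );
  if (!imagePart || !textPart) {
    throw new Error("Expected an image part and a text part");
  }
  return {
    image: Buffer.from(imagePart.data, "base64"),
    mimeType: imagePart.mimeType,
    metadata: JSON.parse(textPart.text) as ImageMetadata,
  };
}
```

Looking parts up by `type` rather than by array index keeps the helper working even if the two items arrive in a different order.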
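The limits under **📁 Supported Formats** can be checked on the client before a file path is passed to `process_image` or `process_multiple_images`. The sketch below assumes the caller already knows the file size (for example from `fs.statSync(path).size`). `checkImageInput` is hypothetical, and reading 20MB as `20 * 1024 * 1024` bytes is an assumption; the server may measure or enforce limits differently.

```typescript
import { extname } from "node:path";

// Values copied from "Supported Formats"; byte interpretation is assumed.
const SUPPORTED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".webp", ".gif"]);
const MAX_BYTES = 20 * 1024 * 1024;
const RECOMMENDED_BYTES = 10 * 1024 * 1024;

type InputCheck =
  | { ok: true; warning?: string }
  | { ok: false; reason: string };

// Hypothetical pre-flight check: inspects only the extension and a
// caller-supplied size, never the file contents.
function checkImageInput(path: string, sizeBytes: number): InputCheck {
  const ext = extname(path).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported file type: ${ext || "(no extension)"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "File exceeds the 20MB per-image maximum" };
  }
  if (sizeBytes > RECOMMENDED_BYTES) {
    return { ok: true, warning: "File is over the recommended 10MB and may process slowly" };
  }
  return { ok: true };
}
```

Returning a discriminated union instead of throwing lets a caller collect warnings for a batch of files before deciding which ones to send.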