Gemini CLI Architecture: Core vs CLI Abstraction

Overview

The Gemini CLI project is structured as a monorepo with two main packages that follow a clear separation of concerns:

@google/gemini-cli-core: The core logic package containing all business logic, AI interactions, and tool implementations
@google/gemini-cli: The CLI package providing the user interface layer (both interactive and non-interactive modes)

High-Level Architecture

Package Structure

Core Package (`packages/core`)

The core package is a headless library that provides:

AI model interactions (Gemini, Claude)
Tool system and registry
Configuration management
File discovery and Git services
Telemetry and logging
Authentication handling

CLI Package (`packages/cli`)

The CLI package is the user-facing application that provides:

Interactive terminal UI (using React + Ink)
Command-line argument parsing
User settings management
Theme system
Interactive prompts and confirmations
Non-interactive mode for scripting

Package Dependencies

Key Interfaces

1. Configuration Interface

The core provides a Config class that encapsulates all configuration:

TSX
// Core exports
export declare class Config {
  // Model configuration
  getModel(): string;
  getEmbeddingModel(): string;

  // Directory and file paths
  getWorkingDir(): string;
  getProjectRoot(): string;
  getProjectTempDir(): string;

  // Services
  getGeminiClient(): GeminiClient;
  getToolRegistry(): ToolRegistry;
  getFileService(): FileDiscoveryService;
  getGitService(): GitService;

  // Feature flags
  getDebugMode(): boolean;
  getCheckpointingEnabled(): boolean;
  getFullContext(): boolean;

  // Tool configuration
  getCoreTools(): string[];
  getExcludeTools(): string[];
  getToolDiscoveryCommand(): string | undefined;

  // Authentication
  getContentGeneratorConfig(): ContentGeneratorConfig;
  refreshAuth(authType: AuthType): Promise<void>;
}

The CLI creates and configures this Config instance based on:

Command-line arguments
User settings files
Environment variables
Project-specific configuration
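
As a rough illustration of how several sources might be folded into one set of values before a Config is built, here is a minimal sketch. The `SettingsLayer` shape, the `mergeLayers` helper, and the precedence order (user settings, then project settings, then environment, then command-line arguments) are assumptions made for this example and are not taken from the CLI source.

```typescript
// Hypothetical settings shape; the real Config carries many more fields.
interface SettingsLayer {
  model?: string;
  debugMode?: boolean;
  excludeTools?: string[];
}

// Merge layers left to right: later layers win, undefined values are skipped.
function mergeLayers(...layers: SettingsLayer[]): SettingsLayer {
  const merged: SettingsLayer = {};
  for (const layer of layers) {
    if (layer.model !== undefined) merged.model = layer.model;
    if (layer.debugMode !== undefined) merged.debugMode = layer.debugMode;
    if (layer.excludeTools !== undefined) {
      merged.excludeTools = layer.excludeTools.slice();
    }
  }
  return merged;
}

// Assumed precedence, lowest to highest.
const userSettings: SettingsLayer = { model: "model-from-user-settings", debugMode: false };
const projectSettings: SettingsLayer = { excludeTools: ["shell"] };
const envSettings: SettingsLayer = { debugMode: true };
const cliArgs: SettingsLayer = { model: "model-from-cli-flag" };

const effective = mergeLayers(userSettings, projectSettings, envSettings, cliArgs);
```

Keeping the merge as a pure function means the precedence rules can be unit-tested without touching the file system or the process environment.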

2. Gemini Client Interface

The core provides GeminiClient for AI interactions:

TSX
export declare class GeminiClient {
  // Initialization
  initialize(config: ContentGeneratorConfig): Promise<void>;

  // Chat management
  getChat(): GeminiChat;
  resetChat(): Promise<void>;

  // History management
  getHistory(): Promise<Content[]>;
  setHistory(history: Content[]): Promise<void>;
  addHistory(content: Content): Promise<void>;

  // Message streaming
  sendMessageStream(
    query: PartListUnion,
    signal: AbortSignal
  ): AsyncIterable<ServerGeminiStreamEvent>;
}

3. Streaming Events Interface

The core defines a comprehensive event system for streaming responses:

TSX
export enum GeminiEventType {
  Content = "content",
  ToolCallRequest = "tool_call_request",
  ToolCallConfirmation = "tool_call_confirmation",
  ToolCallResponse = "tool_call_response",
  Error = "error",
  UserCancelled = "user_cancelled",
  ChatCompressed = "chat_compressed",
  UsageMetadata = "usage_metadata",
  Thought = "thought",
}

export type ServerGeminiStreamEvent =
  | ServerGeminiContentEvent
  | ServerGeminiToolCallRequestEvent
  | ServerGeminiErrorEvent
  | ServerGeminiChatCompressedEvent
  | ServerGeminiUsageMetadataEvent
  | ServerGeminiThoughtEvent;
// ... etc
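
To show how a consumer might handle such a stream, here is a minimal sketch built on simplified stand-in types. The `type` strings match the enum values above, but the `value` payload shapes, the `StreamEvent` and `TurnSummary` types, the `consumeStream` helper, and the `read_file` tool name are assumptions for illustration only.

```typescript
// Simplified stand-ins for the core event types; payload fields are assumed.
type StreamEvent =
  | { type: "content"; value: string }
  | { type: "tool_call_request"; value: { name: string; args: Record<string, unknown> } }
  | { type: "error"; value: { message: string } };

interface TurnSummary {
  text: string;
  toolCalls: string[];
  errors: string[];
}

// Drain the stream and sort each event into the summary by its discriminator.
async function consumeStream(events: AsyncIterable<StreamEvent>): Promise<TurnSummary> {
  const summary: TurnSummary = { text: "", toolCalls: [], errors: [] };
  for await (const event of events) {
    switch (event.type) {
      case "content":
        summary.text += event.value;
        break;
      case "tool_call_request":
        summary.toolCalls.push(event.value.name);
        break;
      case "error":
        summary.errors.push(event.value.message);
        break;
      default: {
        // Compile-time exhaustiveness check: fails to type-check if a case is missed.
        const unreachable: never = event;
        throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
      }
    }
  }
  return summary;
}

// Fake stream standing in for the one the core client would return.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: "content", value: "Reading " };
  yield { type: "content", value: "the file." };
  yield { type: "tool_call_request", value: { name: "read_file", args: { path: "README.md" } } };
}
```

A terminal UI would update components inside each `case` instead of accumulating a summary; the switch over the discriminator stays the same.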

Event Flow Diagram

4. Tool System Interface

The core provides a comprehensive tool system:

TSX
// Base tool interface
export interface Tool<
  TParams = unknown,
  TResult extends ToolResult = ToolResult
> {
  name: string;
  displayName: string;
  description: string;
  schema: FunctionDeclaration;
  isOutputMarkdown: boolean;
  canUpdateOutput: boolean;

  validateToolParams(params: TParams): string | null;
  getDescription(params: TParams): string;
  shouldConfirmExecute(
    params: TParams,
    signal: AbortSignal
  ): Promise<ToolCallConfirmationDetails | false>;
  execute(
    params: TParams,
    signal: AbortSignal,
    updateOutput?: (output: string) => void
  ): Promise<TResult>;
}

// Tool registry for managing tools
export declare class ToolRegistry {
  registerTool(tool: Tool): void;
  discoverTools(): Promise<void>;
  getFunctionDeclarations(): FunctionDeclaration[];
  getAllTools(): Tool[];
  getTool(name: string): Tool | undefined;
}

// Tool execution result
export interface ToolResult {
  llmContent: PartListUnion; // Content for AI model
  returnDisplay: string | FileDiff; // Display for user
}
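
The relationship between a tool and the registry can be sketched with trimmed-down types. Everything named here (`SimpleTool`, `SimpleToolResult`, `SimpleRegistry`, `runTool`, and the `word_count` tool) is hypothetical and much smaller than the interfaces above; for instance, `llmContent` is narrowed to a plain string and confirmation is left out.

```typescript
// Reduced result type: both fields are plain strings in this sketch.
interface SimpleToolResult {
  llmContent: string; // what would be sent back to the model
  returnDisplay: string; // what would be shown to the user
}

interface SimpleTool<TParams> {
  name: string;
  description: string;
  validateToolParams(params: TParams): string | null;
  execute(params: TParams): Promise<SimpleToolResult>;
}

interface WordCountParams {
  text: string;
}

// Hypothetical example tool.
const wordCountTool: SimpleTool<WordCountParams> = {
  name: "word_count",
  description: "Counts whitespace-separated words in a string.",
  validateToolParams(params) {
    return params.text.trim().length === 0 ? "text must not be empty" : null;
  },
  async execute(params) {
    const count = params.text.trim().split(/\s+/).length;
    return { llmContent: String(count), returnDisplay: `${count} words` };
  },
};

class SimpleRegistry {
  private readonly tools = new Map<string, SimpleTool<any>>();

  registerTool(tool: SimpleTool<any>): void {
    this.tools.set(tool.name, tool);
  }

  getTool(name: string): SimpleTool<any> | undefined {
    return this.tools.get(name);
  }

  getAllTools(): SimpleTool<any>[] {
    return Array.from(this.tools.values());
  }
}

// Look up a tool by name, validate its parameters, then execute it.
async function runTool(
  registry: SimpleRegistry,
  name: string,
  params: unknown
): Promise<SimpleToolResult> {
  const tool = registry.getTool(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  const problem = tool.validateToolParams(params);
  if (problem !== null) throw new Error(`Invalid parameters: ${problem}`);
  return tool.execute(params);
}

const registry = new SimpleRegistry();
registry.registerTool(wordCountTool);
```

Returning separate model-facing and user-facing strings mirrors the `llmContent` / `returnDisplay` split in `ToolResult`.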

5. Tool Execution Flow

Tool execution is split between the CLI and core layers:

Tool Scheduling (CLI layer):
- useReactToolScheduler hook manages tool lifecycle
- Handles user confirmations
- Tracks tool status (pending, executing, completed)
- Updates UI in real-time
Tool Execution (Core layer):
- executeToolCall function in core handles actual execution
- Validates parameters
- Executes tool logic
- Returns results
Tool Response Flow:

TSX
// CLI receives tool request from AI
ToolCallRequestInfo → scheduleToolCalls()

// CLI prompts for confirmation if needed
→ shouldConfirmExecute() → User confirmation

// Core executes tool
→ executeToolCall() → Tool.execute()

// CLI sends response back to AI
→ submitQuery(toolResponses)
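
The request, confirm, execute, respond sequence can be sketched as a single async function. The types and names below (`FlowTool`, `ConfirmFn`, `handleToolCall`, the `delete_file` tool) are simplified assumptions for illustration; the real scheduler and `executeToolCall` have richer signatures, and this sketch does not touch the file system.

```typescript
interface ToolCallRequest {
  callId: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolCallResponse {
  callId: string;
  output: string;
}

interface FlowTool {
  // Returns a prompt to show the user, or false when no confirmation is needed.
  shouldConfirmExecute(args: Record<string, unknown>): Promise<string | false>;
  execute(args: Record<string, unknown>): Promise<string>;
}

// Supplied by the UI layer: resolves true when the user approves.
type ConfirmFn = (prompt: string) => Promise<boolean>;

async function handleToolCall(
  request: ToolCallRequest,
  tools: Map<string, FlowTool>,
  confirm: ConfirmFn
): Promise<ToolCallResponse> {
  const tool = tools.get(request.name);
  if (!tool) {
    return { callId: request.callId, output: `Error: unknown tool ${request.name}` };
  }
  // Ask the tool whether this call needs approval, then ask the UI.
  const prompt = await tool.shouldConfirmExecute(request.args);
  if (prompt !== false && !(await confirm(prompt))) {
    return { callId: request.callId, output: "Cancelled by user" };
  }
  // Approved, or no approval needed: run the tool and wrap its output.
  return { callId: request.callId, output: await tool.execute(request.args) };
}

// Hypothetical tool that always asks for confirmation and only pretends to delete.
const deleteFileTool: FlowTool = {
  async shouldConfirmExecute(args) {
    return `Delete ${String(args.path)}?`;
  },
  async execute(args) {
    return `Deleted ${String(args.path)}`;
  },
};

const flowTools = new Map<string, FlowTool>([["delete_file", deleteFileTool]]);
```

Passing `confirm` in as a function is what lets the same logic serve an interactive prompt, an auto-approve policy, or a test double.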

Tool Lifecycle State Machine
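
The scheduler tracks each tool call through statuses such as pending, executing, and completed. One way to model such a lifecycle is a transition table, sketched below. Only those three statuses come from this document; the additional states (`awaiting_approval`, `cancelled`, `error`), the table itself, and the `transition` helper are assumptions for illustration.

```typescript
type ToolStatus =
  | "pending"
  | "awaiting_approval" // assumed state
  | "executing"
  | "completed"
  | "cancelled" // assumed state
  | "error"; // assumed state

// For each status, the statuses it may move to next (assumed for this sketch).
const allowedTransitions: Record<ToolStatus, readonly ToolStatus[]> = {
  pending: ["awaiting_approval", "executing", "error"],
  awaiting_approval: ["executing", "cancelled"],
  executing: ["completed", "error", "cancelled"],
  completed: [],
  cancelled: [],
  error: [],
};

// Return the next status if the move is legal, otherwise throw.
function transition(current: ToolStatus, next: ToolStatus): ToolStatus {
  if (allowedTransitions[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${current} -> ${next}`);
  }
  return next;
}

// Walk one call through an approval path.
let toolStatus: ToolStatus = "pending";
toolStatus = transition(toolStatus, "awaiting_approval");
toolStatus = transition(toolStatus, "executing");
toolStatus = transition(toolStatus, "completed");
```

Encoding the transitions as data keeps illegal moves, such as restarting a completed call, out of the UI code paths.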

Key Abstractions

1. Separation of UI and Logic

The CLI package handles all UI concerns:

Terminal rendering (React + Ink components)
User input handling
Theme management
Loading indicators
Progress displays

The Core package handles all business logic:

AI model interactions
Tool implementations
File system operations
Authentication
Error handling

2. Event-Driven Architecture

The core uses an event-driven approach for streaming:

Core emits typed events during AI interactions
CLI consumes events and updates UI accordingly
Events include content, tool calls, errors, metadata

3. Tool System Abstraction

Tools are completely defined in Core:

Tool interface and base class
Built-in tool implementations
Tool discovery mechanisms (MCP, custom commands)

CLI only handles:

Tool confirmation UI
Progress display
Result rendering

4. Configuration Management

Core defines the configuration structure; the CLI populates it:

CLI reads from multiple sources (files, env, args)
CLI creates Config instance
Core uses Config for all operations

5. Authentication Abstraction

Core defines authentication types and interfaces:

TSX
export enum AuthType {
  USE_GEMINI = "use_gemini",
  USE_GOOGLE_AI_STUDIO = "use_google_ai_studio",
  USE_VERTEX = "use_vertex",
  USE_ANTHROPIC = "use_anthropic",
}

CLI handles:

Authentication UI flows
Token storage
User prompts for auth

Non-Interactive Mode

The CLI provides a non-interactive mode that demonstrates the clean separation:

TSX
// Minimal UI logic in non-interactive mode
export async function runNonInteractive(config: Config, input: string) {
  const geminiClient = config.getGeminiClient()
  const toolRegistry = await config.getToolRegistry()
  const chat = await geminiClient.getChat()

  // Direct streaming without UI
  const responseStream = await chat.sendMessageStream(...)

  // Simple stdout writing
  for await (const resp of responseStream) {
    process.stdout.write(getResponseText(resp))
  }
}

Extension Points

1. Custom Tools

Implement the Tool interface in a separate package
Register with ToolRegistry
CLI automatically handles UI for any tool

2. Custom Authentication

Implement ContentGenerator interface
Add new AuthType
CLI will handle auth flow UI

3. Custom Themes

Define theme in CLI package
Core remains unaware of presentation

4. Alternative UIs

Core can be used with any UI framework
Web UI, desktop app, VS Code extension possible
Just implement event handling and tool confirmation

Benefits of This Architecture

Testability: Core logic can be tested without UI
Reusability: Core can be used in different contexts
Maintainability: Clear boundaries between concerns
Extensibility: Easy to add new tools, auth methods, UIs
Type Safety: Strong interfaces between packages
Modularity: Packages can evolve independently

Data Flow Architecture

Summary

The Gemini CLI architecture demonstrates a clean separation between core business logic and user interface concerns. The Core package provides a comprehensive API for AI interactions, tool execution, and system management, while the CLI package focuses solely on user interaction and presentation. This separation enables the core functionality to be reused in different contexts while maintaining a rich, interactive terminal experience for users.

The diagrams above illustrate:

High-level architecture showing the main components and their relationships
Package dependencies showing external libraries used by each package
Event flow demonstrating the streaming interaction pattern
Tool lifecycle showing the state machine for tool execution
Configuration flow showing how settings are aggregated and used
Extension points showing how the architecture supports customization
Data flow showing the complete flow from user input to display

In practice, the core can be tested and reused without a terminal, and new tools, authentication methods, and front ends can be added without moving the boundary between the two packages.