OpenAI’s Offering to Assist Engineers to Build Agents

Here is a complete, structured list of the capabilities / “tools” OpenAI currently provides for agents (through ChatGPT and the API), summarized by OpenAI itself:

1. Reasoning & General Models

  • GPT models (GPT-4o, GPT-4.1, GPT-o3, GPT-o1, Codex variants)
    • Text, code, multimodal reasoning (text, images, audio).
    • Used as the “core brain” of agents.

2. Code & Data Execution

  • Code Interpreter / Python (a.k.a. Advanced Data Analysis, ADA)
    • Run Python code in a sandbox.
    • Upload, analyze, and transform files (CSV, Excel, JSON, images, etc.).
    • Generate visualizations, do math/stats, automate file processing.
  • Codex / GPT-5-Codex
    • Code generation, editing, refactoring, debugging, running tests.
    • Can propose PRs, work in IDEs, or run tasks in cloud sandboxes.

3. Search & Retrieval

  • Web Search Tool
    • Real-time web browsing for up-to-date information.
  • File Search (vector store memory)
    • Search across uploaded docs or persistent memory.
    • Custom GPTs can be hooked to private knowledge bases (via RAG).

4. Voice & Multimodal Interaction

  • Voice Mode / Voice Agents
    • Natural real-time conversation with low latency.
    • Speech-to-Text (Whisper) + expressive Text-to-Speech voices.
  • Image Input & Analysis
    • Upload images for description, OCR, analysis.
  • Audio Input & Output
    • Transcribe audio (Whisper).
    • Synthesize voices (Expressive TTS in GPT-4o).

5. Customizability & Extensions

  • Custom GPTs
    • No-code way to create specialized agents with instructions, knowledge, and API connections.
  • Function Calling / Tool Use
    • Structured way for GPT to call APIs, run functions, or trigger external systems.
  • Memory
    • Persistent long-term memory across conversations (can recall user preferences, past files).

6. Ecosystem & Integration

  • Plugins (legacy, now replaced by function calling / custom GPTs)
    • Connect to third-party apps (e.g., Slack, Zapier, Expedia).
  • Assistants API (for developers)
    • Exposes the same building blocks (threads, tool use, file handling, code execution) for embedding agents into apps.
  • Multi-modal foundation (GPT-4o)
    • One unified model that can handle text, code, images, audio, and voice in a single session.

But on its developer’s website, you’ll find an accurate list of tools!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.