Here is a structured list of the capabilities, or “tools,” that OpenAI currently provides for agents through ChatGPT and the API, as summarized by OpenAI itself:
1. Reasoning & General Models
- GPT and o-series models (GPT-4o, GPT-4.1, o1, o3, Codex variants)
  - Reason over text and code; some models also accept images and audio.
  - Used as the “core brain” of agents.
2. Code & Data Execution
- Code Interpreter / Python (a.k.a. Advanced Data Analysis, ADA)
  - Run Python code in a sandbox.
  - Upload, analyze, and transform files (CSV, Excel, JSON, images, etc.).
  - Generate visualizations, do math/stats, automate file processing.
- Codex / GPT-5-Codex
  - Code generation, editing, refactoring, debugging, running tests.
  - Can propose PRs, work in IDEs, or run tasks in cloud sandboxes.
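To make the Code Interpreter bullets concrete, here is a minimal sketch of the kind of script the sandbox might run when asked to summarize an uploaded CSV. It uses only the Python standard library. The column names and the inline sample data are hypothetical stand-ins for an uploaded file, not output from the tool itself.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an uploaded CSV file.
SAMPLE_CSV = """region,revenue
north,120
south,95
north,130
east,80
south,105
"""


def summarize_revenue(csv_text: str) -> dict:
    """Return overall and per-region revenue statistics from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    revenues = [float(r["revenue"]) for r in rows]

    # Group revenue values by region.
    by_region: dict[str, list[float]] = {}
    for r in rows:
        by_region.setdefault(r["region"], []).append(float(r["revenue"]))

    return {
        "count": len(revenues),
        "mean": statistics.mean(revenues),
        "median": statistics.median(revenues),
        "total_by_region": {k: sum(v) for k, v in sorted(by_region.items())},
    }


if __name__ == "__main__":
    print(summarize_revenue(SAMPLE_CSV))
```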
3. Search & Retrieval
- Web Search Tool
  - Searches the web in real time for up-to-date information.
- File Search (vector stores)
  - Searches across uploaded documents indexed in vector stores.
  - Custom GPTs can be connected to private knowledge bases (via RAG).
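File Search is described above as vector-store retrieval. The snippet below is a deliberately simplified sketch of that idea, not OpenAI's implementation. It assumes bag-of-words term counts in place of learned embeddings, so that the example runs with the standard library alone. It ranks a few hypothetical documents by cosine similarity to the query.

```python
import math
import re
from collections import Counter

# Hypothetical documents standing in for files in a vector store.
DOCS = {
    "refunds.txt": "Refunds are issued within 14 days of a return request.",
    "shipping.txt": "Standard shipping takes 3 to 5 business days.",
    "warranty.txt": "The warranty covers manufacturing defects for one year.",
}


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use learned embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, docs: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query and return the top_k matches."""
    q = vectorize(query)
    scored = [(name, cosine(q, vectorize(text))) for name, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search("How many days does shipping take?", DOCS):
        print(f"{name}: {score:.3f}")
```

A production system would replace `vectorize` with an embedding model and the linear scan with an approximate nearest-neighbor index. The ranking step keeps the same shape.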
4. Voice & Multimodal Interaction
- Voice Mode / Voice Agents
  - Natural real-time conversation with low latency.
  - Speech-to-Text (Whisper) + expressive Text-to-Speech voices.
- Image Input & Analysis
  - Upload images for description, OCR, analysis.
- Audio Input & Output
  - Transcribe audio (Whisper).
  - Synthesize voices (Expressive TTS in GPT-4o).
5. Customizability & Extensions
- Custom GPTs
  - No-code way to create specialized agents with instructions, knowledge files, and API connections (Actions).
- Function Calling / Tool Use
  - Structured way for the model to request API calls or function executions, which your application then carries out to trigger external systems.
- Memory
  - Persistent long-term memory across ChatGPT conversations (can recall user preferences and details from past chats).
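Function calling works as a round trip. The developer describes functions with a JSON Schema. The model replies with a function name and a JSON string of arguments. The application then runs the function and sends the result back. The sketch below shows only the application side of that loop, using the standard library. The tool definition follows the general shape of the Chat Completions `tools` parameter. The `get_weather` function, its data, and the hard-coded tool call are all hypothetical; in practice the tool call would come from the model's response.

```python
import json
from typing import Any, Callable

# Tool definition in the general shape of the Chat Completions `tools` parameter.
# The get_weather tool itself is hypothetical.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; a real one would call a weather API."""
    fake_temps_c = {"Paris": 18, "Oslo": 7}
    return {"city": city, "temp_c": fake_temps_c.get(city)}


# Map tool names to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one model-requested tool call and build the tool reply message."""
    name = tool_call["function"]["name"]
    # The model supplies arguments as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    if name in REGISTRY:
        result = REGISTRY[name](**args)
    else:
        result = {"error": f"unknown tool: {name}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


if __name__ == "__main__":
    # Hard-coded example of a tool call a model might return.
    example_call = {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }
    print(run_tool_call(example_call))
```

In a real application, the returned message is appended to the conversation and sent back to the model, which then writes its final answer.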
6. Ecosystem & Integration
- Plugins (legacy, now replaced by function calling / custom GPTs)
  - Connected ChatGPT to third-party apps (e.g., Slack, Zapier, Expedia).
- Assistants API (for developers)
  - Exposes the same building blocks (threads, tool use, file handling, code execution) for embedding agents into apps; OpenAI has announced it will be deprecated in favor of the Responses API.
- Multimodal foundation (GPT-4o)
  - One unified model that can handle text, code, images, audio, and voice in a single session.
For an authoritative, up-to-date list of tools, check OpenAI’s developer documentation.
