Agentforge
AgentForge is an autonomous agent I built from scratch — no LangChain, no framework — to understand how agentic systems actually work under the hood. You give it a goal, and an LLM plans a sequence of steps, then executes them one at a time using tools: web search, code execution, database queries.
AgentForge
An autonomous AI agent that plans and executes multi-step tasks using tool-calling — built from scratch, no frameworks.
No LangChain, no LlamaIndex, no CrewAI. Just an async FastAPI backend that drives OpenAI function-calling in a plan-act-observe loop, a pluggable tool registry, MongoDB persistence, and a Next.js streaming chat UI that shows every step the agent takes in real time.
Architecture
flowchart LR
U[User] -->|goal| FE[Next.js UI]
FE -->|POST /api/runs| BE[FastAPI backend]
FE <-->|SSE /stream| BE
BE --> LOOP[Agent loop<br/>plan · act · observe]
LOOP <-->|function calling| LLM[(OpenAI<br/>gpt-4o / gpt-4o-mini)]
LOOP --> REG[Tool registry]
REG --> T1[web_search]
REG --> T2[code_execution]
REG --> T3[db_query]
REG --> T4[task_complete]
LOOP -->|persist each step| DB[(MongoDB<br/>agent_runs)]
T3 --> DB
The stack runs as three containers — frontend, backend, and mongo —
orchestrated by docker-compose.yml.
Quick start
git clone <your-fork-url> agentforge
cd agentforge
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...
docker compose up --build
Then open:
- Frontend: http://localhost:3000
- Backend API docs: http://localhost:8000/docs
The sample MongoDB product catalog is seeded automatically on first boot, so
the db_query tool works immediately. web_search works with no extra keys
(keyless DuckDuckGo) and upgrades to Tavily if you set TAVILY_API_KEY.
Running the backend without Docker
cd backend
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Point at a local Mongo and export your key:
export OPENAI_API_KEY=sk-... MONGO_URI=mongodb://localhost:27017
uvicorn app.main:app --reload
How the agent loop works
The core lives in backend/app/agent/loop.py. It
implements a textbook plan → act → observe cycle:
sequenceDiagram
participant U as User
participant L as Agent loop
participant M as OpenAI
participant T as Tool
participant DB as MongoDB
U->>L: goal
loop until task_complete or max_iter
L->>M: messages + tool schemas (tool_choice=auto)
M-->>L: function_call (tool, args)
L->>T: execute(args) with timeout
T-->>L: result (or error string)
L->>DB: persist step
L-->>U: stream step (SSE)
L->>L: append tool result to history
end
L-->>U: stream final answer (SSE)
- User sends a goal via
POST /api/runs. - A system prompt tells the LLM it has tools and must work step by step.
- The LLM returns a
function_call(tool name + JSON arguments). - The backend executes the tool and captures the result.
- The result is appended to the message history as a tool message.
- Loop back to step 3 until either:
- the LLM calls the terminal
task_completetool, or - the max-iteration cap (default 10) is hit.
- the LLM calls the terminal
- The final answer is streamed to the frontend.
Built-in robustness
- Unknown tool → an error string is returned into context so the LLM self-corrects.
- Tool failure → wrapped in
try/except; the exception text becomes the tool result. - Context-window management → when the running history exceeds
TOKEN_THRESHOLDtokens, older steps are summarized (cheap model) while the original goal and recent steps are kept verbatim (context.py). - Per-tool timeouts → each tool call is bounded by
TOOL_TIMEOUT(default 30s); the sandboxed code executor has its own tighter timeout. - Crash-safe persistence → every step is
$push-ed to MongoDB as it happens, not just at the end.
Adding a custom tool
Adding a tool is one decorated async function. Drop a new file in
backend/app/tools/ — it is auto-discovered at startup.
# backend/app/tools/weather.py
from typing import Any
import httpx
from .registry import ToolContext, register_tool
@register_tool(
name="get_weather",
description="Get the current temperature for a city.",
parameters={
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
},
)
async def get_weather(args: dict[str, Any], ctx: ToolContext) -> str:
city = args["city"]
async with httpx.AsyncClient() as client:
resp = await client.get(
"https://wttr.in/" + city, params={"format": "3"}
)
return resp.text
That's it. No registration list to edit — the @register_tool decorator adds
it to the registry, its JSON schema is sent to the LLM automatically, and the
loop can call it. Mark a tool terminal=True to make calling it end the run
(that's how task_complete works).
Each tool receives a ToolContext carrying shared dependencies (settings,
the OpenAI client, the Mongo database) so tools stay easy to test.
Example run
Goal: "What's the cheapest in-stock book in the catalog, and is it cheaper than a USB-C Hub?"
| Step | Tool | Args | Observation (truncated) |
|---|---|---|---|
| 1 | db_query | {"question": "cheapest in-stock book"} | Clean Code — $33.99, in_stock: true |
| 2 | db_query | {"question": "price of USB-C Hub"} | USB-C Hub — $39.99, in_stock: false |
| 3 | code_execution | {"code": "print(33.99 < 39.99)"} | stdout: True |
| 4 | task_complete | {"answer": "..."} | (ends run) |
Final answer: "The cheapest in-stock book is Clean Code at $33.99. Yes — it's cheaper than the USB-C Hub ($39.99) by $6.00 (and the hub is out of stock anyway)."
You can watch each of these steps stream in as collapsible cards in the UI, and replay any past run from the history sidebar.
API reference
| Method | Path | Description |
|---|---|---|
POST | /api/runs | Start a new run. Body: { "goal": "...", "model"?, "max_iterations"? }. Returns { "run_id" }. |
GET | /api/runs/{run_id}/stream | Server-Sent Events: each step, then final, then done. Replays persisted steps for finished runs. |
GET | /api/runs/{run_id} | Full run document with every step. |
GET | /api/runs?limit=50 | Recent runs (compact summaries). |
GET | /api/health | Liveness probe. |
Run document shape (agent_runs collection)
{
"run_id": "…",
"goal": "…",
"status": "running | completed | failed | max_iter",
"created_at": "…", "completed_at": "…",
"model": "gpt-4o-mini",
"steps": [
{
"step_number": 1,
"tool_name": "db_query",
"tool_args": { "question": "…" },
"tool_result": "…",
"llm_response_raw": { "…": "…" },
"tokens_used": 812,
"latency_ms": 640,
"timestamp": "…"
}
],
"final_answer": "…",
"total_tokens": 3120,
"total_latency_ms": 4210
}
Tech stack
Backend
- Python 3.11+, FastAPI (async)
- openai ≥ 1.0 (function calling)
- motor (async MongoDB driver)
- httpx (async HTTP for tools)
- pydantic v2 + pydantic-settings (typed models & config)
- tiktoken (token accounting)
Frontend
- Next.js 14 (App Router) + React 18
- TypeScript (everything typed)
- Tailwind CSS
EventSourcefor SSE streaming
Infrastructure
- MongoDB 7
- Docker + Docker Compose
Project structure
agentforge/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI app + CORS + lifespan + SSE broker
│ │ ├── config.py # Settings via pydantic-settings
│ │ ├── agent/
│ │ │ ├── loop.py # The core plan-act-observe loop
│ │ │ ├── context.py # Message history + summarization
│ │ │ └── schemas.py # Pydantic models
│ │ ├── tools/
│ │ │ ├── registry.py # @register_tool decorator + discovery
│ │ │ ├── web_search.py
│ │ │ ├── code_exec.py
│ │ │ ├── db_query.py
│ │ │ └── task_complete.py
│ │ └── storage/
│ │ └── mongo.py # Run persistence + sample data seeding
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/
│ ├── src/app/
│ │ ├── page.tsx # Main chat UI
│ │ ├── lib/api.ts # Typed API client + SSE helpers
│ │ └── components/
│ │ ├── ChatInput.tsx
│ │ ├── StepCard.tsx # Collapsible tool-call display
│ │ └── RunHistory.tsx
│ ├── package.json
│ └── Dockerfile
├── docker-compose.yml
├── .env.example
├── README.md
└── LICENSE
A note on the code sandbox
The code_execution tool runs snippets in a fresh, isolated Python subprocess
(-I isolated mode, a temporary working directory, a scrubbed environment, and
a hard timeout). This is appropriate for trusted/educational use. It is not
a hardened security boundary — for untrusted input, run it inside a container
with seccomp/gVisor and resource limits.
Built as a side project exploring agentic AI workflows.
License
MIT — see LICENSE.