Skip to main content

Agentforge

AgentForge is an autonomous agent I built from scratch — no LangChain, no framework — to understand how agentic systems actually work under the hood. You give it a goal, and an LLM plans a sequence of steps, then executes them one at a time using tools: web search, code execution, database queries.

Python

AgentForge

An autonomous AI agent that plans and executes multi-step tasks using tool-calling — built from scratch, no frameworks.

No LangChain, no LlamaIndex, no CrewAI. Just an async FastAPI backend that drives OpenAI function-calling in a plan-act-observe loop, a pluggable tool registry, MongoDB persistence, and a Next.js streaming chat UI that shows every step the agent takes in real time.


Architecture

flowchart LR
    U[User] -->|goal| FE[Next.js UI]
    FE -->|POST /api/runs| BE[FastAPI backend]
    FE <-->|SSE /stream| BE
    BE --> LOOP[Agent loop<br/>plan · act · observe]
    LOOP <-->|function calling| LLM[(OpenAI<br/>gpt-4o / gpt-4o-mini)]
    LOOP --> REG[Tool registry]
    REG --> T1[web_search]
    REG --> T2[code_execution]
    REG --> T3[db_query]
    REG --> T4[task_complete]
    LOOP -->|persist each step| DB[(MongoDB<br/>agent_runs)]
    T3 --> DB

The stack runs as three containers — frontend, backend, and mongo — orchestrated by docker-compose.yml.


Quick start

git clone <your-fork-url> agentforge
cd agentforge

cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...

docker compose up --build

Then open:

The sample MongoDB product catalog is seeded automatically on first boot, so the db_query tool works immediately. web_search works with no extra keys (keyless DuckDuckGo) and upgrades to Tavily if you set TAVILY_API_KEY.

Running the backend without Docker

cd backend
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Point at a local Mongo and export your key:
export OPENAI_API_KEY=sk-... MONGO_URI=mongodb://localhost:27017
uvicorn app.main:app --reload

How the agent loop works

The core lives in backend/app/agent/loop.py. It implements a textbook plan → act → observe cycle:

sequenceDiagram
    participant U as User
    participant L as Agent loop
    participant M as OpenAI
    participant T as Tool
    participant DB as MongoDB

    U->>L: goal
    loop until task_complete or max_iter
        L->>M: messages + tool schemas (tool_choice=auto)
        M-->>L: function_call (tool, args)
        L->>T: execute(args) with timeout
        T-->>L: result (or error string)
        L->>DB: persist step
        L-->>U: stream step (SSE)
        L->>L: append tool result to history
    end
    L-->>U: stream final answer (SSE)
  1. User sends a goal via POST /api/runs.
  2. A system prompt tells the LLM it has tools and must work step by step.
  3. The LLM returns a function_call (tool name + JSON arguments).
  4. The backend executes the tool and captures the result.
  5. The result is appended to the message history as a tool message.
  6. Loop back to step 3 until either:
    • the LLM calls the terminal task_complete tool, or
    • the max-iteration cap (default 10) is hit.
  7. The final answer is streamed to the frontend.

Built-in robustness

  • Unknown tool → an error string is returned into context so the LLM self-corrects.
  • Tool failure → wrapped in try/except; the exception text becomes the tool result.
  • Context-window management → when the running history exceeds TOKEN_THRESHOLD tokens, older steps are summarized (cheap model) while the original goal and recent steps are kept verbatim (context.py).
  • Per-tool timeouts → each tool call is bounded by TOOL_TIMEOUT (default 30s); the sandboxed code executor has its own tighter timeout.
  • Crash-safe persistence → every step is $push-ed to MongoDB as it happens, not just at the end.

Adding a custom tool

Adding a tool is one decorated async function. Drop a new file in backend/app/tools/ — it is auto-discovered at startup.

# backend/app/tools/weather.py
from typing import Any
import httpx
from .registry import ToolContext, register_tool


@register_tool(
    name="get_weather",
    description="Get the current temperature for a city.",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
)
async def get_weather(args: dict[str, Any], ctx: ToolContext) -> str:
    city = args["city"]
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://wttr.in/" + city, params={"format": "3"}
        )
    return resp.text

That's it. No registration list to edit — the @register_tool decorator adds it to the registry, its JSON schema is sent to the LLM automatically, and the loop can call it. Mark a tool terminal=True to make calling it end the run (that's how task_complete works).

Each tool receives a ToolContext carrying shared dependencies (settings, the OpenAI client, the Mongo database) so tools stay easy to test.


Example run

Goal: "What's the cheapest in-stock book in the catalog, and is it cheaper than a USB-C Hub?"

StepToolArgsObservation (truncated)
1db_query{"question": "cheapest in-stock book"}Clean Code — $33.99, in_stock: true
2db_query{"question": "price of USB-C Hub"}USB-C Hub — $39.99, in_stock: false
3code_execution{"code": "print(33.99 < 39.99)"}stdout: True
4task_complete{"answer": "..."}(ends run)

Final answer: "The cheapest in-stock book is Clean Code at $33.99. Yes — it's cheaper than the USB-C Hub ($39.99) by $6.00 (and the hub is out of stock anyway)."

You can watch each of these steps stream in as collapsible cards in the UI, and replay any past run from the history sidebar.


API reference

MethodPathDescription
POST/api/runsStart a new run. Body: { "goal": "...", "model"?, "max_iterations"? }. Returns { "run_id" }.
GET/api/runs/{run_id}/streamServer-Sent Events: each step, then final, then done. Replays persisted steps for finished runs.
GET/api/runs/{run_id}Full run document with every step.
GET/api/runs?limit=50Recent runs (compact summaries).
GET/api/healthLiveness probe.

Run document shape (agent_runs collection)

{
  "run_id": "…",
  "goal": "…",
  "status": "running | completed | failed | max_iter",
  "created_at": "…", "completed_at": "…",
  "model": "gpt-4o-mini",
  "steps": [
    {
      "step_number": 1,
      "tool_name": "db_query",
      "tool_args": { "question": "…" },
      "tool_result": "…",
      "llm_response_raw": { "…": "…" },
      "tokens_used": 812,
      "latency_ms": 640,
      "timestamp": "…"
    }
  ],
  "final_answer": "…",
  "total_tokens": 3120,
  "total_latency_ms": 4210
}

Tech stack

Backend

  • Python 3.11+, FastAPI (async)
  • openai ≥ 1.0 (function calling)
  • motor (async MongoDB driver)
  • httpx (async HTTP for tools)
  • pydantic v2 + pydantic-settings (typed models & config)
  • tiktoken (token accounting)

Frontend

  • Next.js 14 (App Router) + React 18
  • TypeScript (everything typed)
  • Tailwind CSS
  • EventSource for SSE streaming

Infrastructure

  • MongoDB 7
  • Docker + Docker Compose

Project structure

agentforge/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app + CORS + lifespan + SSE broker
│   │   ├── config.py            # Settings via pydantic-settings
│   │   ├── agent/
│   │   │   ├── loop.py          # The core plan-act-observe loop
│   │   │   ├── context.py       # Message history + summarization
│   │   │   └── schemas.py       # Pydantic models
│   │   ├── tools/
│   │   │   ├── registry.py      # @register_tool decorator + discovery
│   │   │   ├── web_search.py
│   │   │   ├── code_exec.py
│   │   │   ├── db_query.py
│   │   │   └── task_complete.py
│   │   └── storage/
│   │       └── mongo.py         # Run persistence + sample data seeding
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/app/
│   │   ├── page.tsx             # Main chat UI
│   │   ├── lib/api.ts           # Typed API client + SSE helpers
│   │   └── components/
│   │       ├── ChatInput.tsx
│   │       ├── StepCard.tsx     # Collapsible tool-call display
│   │       └── RunHistory.tsx
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml
├── .env.example
├── README.md
└── LICENSE

A note on the code sandbox

The code_execution tool runs snippets in a fresh, isolated Python subprocess (-I isolated mode, a temporary working directory, a scrubbed environment, and a hard timeout). This is appropriate for trusted/educational use. It is not a hardened security boundary — for untrusted input, run it inside a container with seccomp/gVisor and resource limits.


Built as a side project exploring agentic AI workflows.

License

MIT — see LICENSE.