Distributed Task Queue & Job Orchestrator

Production-grade distributed task queue system built with Python, FastAPI, and Redis Streams.

Features

Distributed Task Queue - Handle 10k+ jobs/min with horizontal scaling
Redis Streams + Consumer Groups - At-least-once delivery guarantee
FastAPI Control Plane - RESTful API for job lifecycle management
Retry & Backoff - Exponential backoff with configurable retry limits
Dead Letter Queue - Automatic handling of failed jobs after max retries
Docker Ready - Containerized deployment with Nginx reverse proxy
AWS EC2 Ready - Designed for zero-downtime rolling updates

Project Structure

app/
  config.py              # Configuration management
  models.py              # Pydantic models and enums
  redis_client.py        # Async Redis client
  api/                   # FastAPI routes
  worker/                # Worker processes
docker/                  # Docker configuration
ui/                      # React frontend

Quick Start

Prerequisites

Python 3.10+
Redis 7+
Docker & Docker Compose (for containerized deployment)

Local Development

Quick Start (Recommended)

Run both backend and frontend together:

Windows:

.\run-dev.ps1

Linux/Mac:

chmod +x run-dev.sh
./run-dev.sh

This will:

Start Redis, API, Worker, and Nginx via Docker Compose
Start the React frontend development server
Make everything available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Nginx (proxied): http://localhost:80

To stop all services:

Windows:

.\stop-dev.ps1

Linux/Mac:

./stop-dev.sh

Manual Setup

Install backend dependencies:

pip install -r requirements.txt

Install frontend dependencies:

cd ui
npm install

Start backend services:

cd docker
docker-compose up -d

Start frontend:

cd ui
npm run dev

Configuration

See .env.example for all configuration options. Key settings:

REDIS_URL - Redis connection string
JOB_STREAM - Redis stream name for jobs
DLQ_STREAM - Dead letter queue stream name
MAX_RETRIES - Maximum retry attempts
INITIAL_BACKOFF_MS - Initial backoff delay in milliseconds
MAX_BACKOFF_MS - Maximum backoff delay in milliseconds
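
As a rough illustration of how the three retry settings above could interact, here is a minimal sketch of capped exponential backoff. The function names (backoff_delay_ms, next_action), the doubling factor, and the numbers in the usage note are assumptions for illustration; the worker's actual formula may differ (for example, it may add jitter).

```python
def backoff_delay_ms(attempt: int, initial_ms: int, max_ms: int) -> int:
    """Delay before retry number `attempt` (1-based).

    Doubles the delay on every retry, starting at `initial_ms`
    (INITIAL_BACKOFF_MS) and never exceeding `max_ms` (MAX_BACKOFF_MS).
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(max_ms, initial_ms * 2 ** (attempt - 1))


def next_action(failures: int, max_retries: int) -> str:
    """Decide what happens after a job has failed `failures` times.

    The job is retried while the failure count is within `max_retries`
    (MAX_RETRIES); after that it goes to the dead letter queue.
    """
    return "retry" if failures <= max_retries else "dead_letter"
```

With hypothetical values of INITIAL_BACKOFF_MS=1000 and MAX_BACKOFF_MS=30000, the delays would be 1000, 2000, 4000, ... ms, capped at 30000 ms.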

API Endpoints

Health

GET /health/live - Liveness probe
GET /health/ready - Readiness probe
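
A deploy script or test harness may want to block until the readiness probe succeeds. The following is a minimal sketch using only the standard library; http_ok and wait_until_ready are hypothetical helper names, and treating any 2xx response as "ready" is an assumption about how the probe reports status.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx response.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            sleep(interval_s)
    return False
```

For example, wait_until_ready(lambda: http_ok("http://localhost:8000/health/ready")) would poll the local API for up to about 30 seconds.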

Jobs

POST /jobs - Create a new job
GET /jobs/{job_id} - Get job status
GET /jobs - List jobs (paginated)
POST /jobs/{job_id}/cancel - Cancel a job
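
The job lifecycle endpoints above can be called from Python with the standard library alone. This sketch only builds the requests; the helper names are hypothetical, and the job body is passed through untouched because the request schema is defined in app/models.py and not described here.

```python
import json
import urllib.request
from typing import Optional


def job_request(
    base_url: str, method: str, path: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) a request against the control plane."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


def create_job(base_url: str, payload: dict) -> urllib.request.Request:
    # `payload` must match the job model the API expects (not shown here).
    return job_request(base_url, "POST", "/jobs", payload)


def get_job(base_url: str, job_id: str) -> urllib.request.Request:
    return job_request(base_url, "GET", f"/jobs/{job_id}")


def cancel_job(base_url: str, job_id: str) -> urllib.request.Request:
    return job_request(base_url, "POST", f"/jobs/{job_id}/cancel")
```

To send one of these, pass it to urllib.request.urlopen, for example urllib.request.urlopen(get_job("http://localhost:8000", job_id)).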

Metrics

GET /metrics - Job counts by status and DLQ depth

Deployment

Render + Netlify/Vercel (Recommended for Quick Deployment)

This deployment approach uses Render for the backend and Netlify/Vercel for the frontend, with no Docker required.

Backend (Render)

Create a new Web Service on Render:
- Connect your GitHub repository
- Select "Web Service" type
- Environment: Python
Configure Build & Start:
- Build Command: pip install -r requirements.txt
- Start Command (Option 1 - Recommended): ./render_backend_start.sh
- Start Command (Option 2 - Alternative): uvicorn app.api.main:app --host 0.0.0.0 --port $PORT
Note: If you get a "No such file or directory" error, use Option 2 (the direct command) instead.
Set Environment Variables:
- REDIS_URL - Your Redis connection string (Render Redis or external)
- FRONTEND_ORIGIN - Your frontend URL (e.g., https://your-app.netlify.app)
- ENVIRONMENT - Set to production to disable dev endpoints
Deploy:
- Render will automatically build and deploy on every push to main
- Your backend will be available at https://your-service.onrender.com

Note: Make sure render_backend_start.sh is executable. Render will handle this automatically, but if deploying manually, run:

chmod +x render_backend_start.sh

Frontend (Netlify or Vercel)

For Netlify:

Create a new site:
- Connect your GitHub repository
- Base directory: ui
- Build command: npm run build
- Publish directory: ui/dist
Set Environment Variables:
- VITE_API_BASE_URL - Your Render backend URL (e.g., https://your-service.onrender.com)
Deploy:
- Netlify will automatically build and deploy on every push to main

For Vercel:

Import your repository:
- Select the repository
- Root directory: ui
- Framework preset: Vite
Set Environment Variables:
- VITE_API_BASE_URL - Your Render backend URL (e.g., https://your-service.onrender.com)
Deploy:
- Vercel will automatically build and deploy on every push to main

Local Development with Proxy

For local development, the frontend uses a proxy to the backend:

Backend:

uvicorn app.api.main:app --reload --port 8000

Frontend:

cd ui
npm run dev

The frontend will proxy /api/* requests to http://localhost:8000, so API calls work seamlessly in development without CORS issues.

Docker Compose (Local)

Start all services:

cd docker
docker-compose up -d

Verify services are running:

docker-compose ps

Test the API:

curl http://localhost/health/live

Load Testing

Using Locust

A Locust load test script is provided in scripts/load_test/locustfile.py.

Prerequisites:

pip install locust

Run Load Test:

# Start backend services first
cd docker
docker-compose up -d

# Run Locust
cd ../scripts/load_test
locust -f locustfile.py --host=http://localhost:8000

# Open browser to http://localhost:8089
# Configure users and spawn rate, then start test

Observe Throughput:

Monitor /metrics endpoint for jobs_created_total and jobs_completed_total
Check worker logs for processing rate
Use Locust dashboard to see request rates
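
To turn two readings of a counter into a throughput figure, something like the sketch below may help. It assumes jobs_created_total and jobs_completed_total are monotonically increasing counters and that you have already read their values from /metrics; the function names are hypothetical, and how the values are parsed depends on the response format.

```python
def jobs_per_minute(count_start: int, count_end: int, elapsed_s: float) -> float:
    """Average rate between two readings of an increasing counter."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (count_end - count_start) * 60.0 / elapsed_s


def backlog(created_total: int, completed_total: int) -> int:
    """Jobs created but not completed.

    This counts everything that has not finished successfully, so it
    includes queued, in-flight, retrying, and failed jobs.
    """
    return created_total - completed_total
```

For instance, if jobs_completed_total moves from 1000 to 6000 over 30 seconds, the average rate is 10000 jobs/min. A backlog that keeps growing during a test suggests the workers are not keeping up with the submit rate.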

Example Test Scenarios:

Light Load: 10 users, spawn rate 2
Medium Load: 50 users, spawn rate 5
Heavy Load: 200 users, spawn rate 10

Synthetic Job Generation

For quick testing without load tools, use the dev endpoint:

# Generate 1000 synthetic jobs
curl -X POST http://localhost:8000/dev/generate-jobs \
  -H "Content-Type: application/json" \
  -d '{
    "count": 1000,
    "partition_key_prefix": "test",
    "task_type": "synthetic",
    "payload_template": {"test": true}
  }'

Note: This endpoint is disabled in production (ENVIRONMENT=production).

AWS EC2 Deployment

Deployment Topology

The system is designed for deployment on AWS EC2 with the following architecture:

                    ┌─────────────────┐
                    │  Load Balancer  │
                    │    (ALB/NLB)    │
                    └────────┬────────┘
                             │
                ┌────────────┴────────────┐
                │                         │
        ┌───────▼──────┐          ┌───────▼──────┐
        │    EC2-1     │          │    EC2-2     │
        │              │          │              │
        │  ┌────────┐  │          │  ┌────────┐  │
        │  │ Nginx  │  │          │  │ Nginx  │  │
        │  └───┬────┘  │          │  └───┬────┘  │
        │      │       │          │      │       │
        │  ┌───▼────┐  │          │  ┌───▼────┐  │
        │  │  API   │  │          │  │  API   │  │
        │  └───┬────┘  │          │  └───┬────┘  │
        │      │       │          │      │       │
        │  ┌───▼────┐  │          │  ┌───▼────┐  │
        │  │ Worker │  │          │  │ Worker │  │
        │  └───┬────┘  │          │  └───┬────┘  │
        │      │       │          │      │       │
        └──────┼───────┘          └──────┼───────┘
               │                         │
               └────────────┬────────────┘
                            │
                    ┌───────▼───────┐
                    │     Redis     │
                    │ (ElastiCache) │
                    └───────────────┘

Components:

Load Balancer: AWS Application Load Balancer (ALB) or Network Load Balancer (NLB)
EC2 Instances: 2+ instances running the full stack (Nginx, API, Worker)
Redis: AWS ElastiCache for Redis (or self-managed Redis cluster)

EC2 Instance Setup

Launch EC2 Instances:
- Use Amazon Linux 2 or Ubuntu 22.04 LTS
- Minimum: t3.medium (2 vCPU, 4GB RAM)
- Recommended: t3.large or larger for production
- Security Group: Allow HTTP (80) and HTTPS (443); restrict SSH (22) to a bastion host or trusted IPs
Install Dependencies:

# Update system
sudo yum update -y  # Amazon Linux
# or
sudo apt-get update && sudo apt-get upgrade -y  # Ubuntu

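# Install Docker
sudo yum install docker -y  # Amazon Linux
# or
sudo apt-get install -y docker.io  # Ubuntu
sudo systemctl start docker
sudo systemctl enable docker
# Allow the default user to run docker without sudo (log out and back in to apply)
sudo usermod -aG docker ec2-user  # on Ubuntu the default user is "ubuntu"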

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

Deploy Application:

# Clone repository
git clone <repository-url>
cd DTQ

# Configure environment
cp .env.example .env
# Edit .env with production settings:
# - REDIS_URL pointing to ElastiCache endpoint
# - ENVIRONMENT=production

# Build and start services
cd docker
docker-compose up -d --build

Load Balancer Configuration

Create Target Group:
- Protocol: HTTP
- Port: 80
- Health Check Path: /health/ready
- Health Check Interval: 30 seconds
- Unhealthy Threshold: 3
Register EC2 Instances:
- Add all EC2 instances to the target group
- Ensure instances pass health checks
Configure Listener:
- HTTP (80) → Target Group
- Optional: HTTPS (443) with SSL certificate

Rolling Update Strategy

Zero-downtime deployments using rolling updates:

Prepare New Version:

# On deployment server or CI/CD
git pull origin main
cd docker
docker-compose build

Deploy to Instance 1:

# SSH to EC2-1
cd /path/to/DTQ

# Drain instance from load balancer
# (Remove from target group via AWS Console/CLI)

# Wait for in-flight requests to complete (30-60 seconds)

# Deploy new version
git pull origin main
cd docker
docker-compose down
docker-compose up -d --build

# Verify health
curl http://localhost/health/ready

# Re-register instance to load balancer
# (Add back to target group via AWS Console/CLI)

# Wait for health checks to pass

Deploy to Instance 2:
- Repeat steps from Instance 1
Verification:

# Check all instances are healthy
aws elbv2 describe-target-health --target-group-arn <target-group-arn>

# Test API endpoints
curl https://<load-balancer-dns>/health/live
curl https://<load-balancer-dns>/metrics/
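
The target health check can also be scripted so a rolling update stops when an instance fails to come back. The sketch below parses the JSON printed by aws elbv2 describe-target-health (run with JSON output) and lists the targets that are not healthy; unhealthy_targets is a hypothetical helper name, and invoking the CLI itself is left out.

```python
import json


def unhealthy_targets(describe_output: str) -> list[str]:
    """Return the IDs of targets whose state is anything but "healthy".

    `describe_output` is the JSON text printed by
    `aws elbv2 describe-target-health --output json`.
    """
    data = json.loads(describe_output)
    return [
        entry["Target"]["Id"]
        for entry in data.get("TargetHealthDescriptions", [])
        if entry["TargetHealth"]["State"] != "healthy"
    ]
```

An empty list means every registered instance passed its health checks. States such as "initial" or "draining" are reported as not healthy, which is usually what you want before moving on to the next instance.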

Redis Setup

Option 1: AWS ElastiCache (Recommended)

Create ElastiCache Redis cluster
Update REDIS_URL in .env to ElastiCache endpoint
Ensure EC2 security group allows access to ElastiCache

Option 2: Self-Managed Redis

Deploy Redis on dedicated EC2 instance or cluster
Use Redis Sentinel for high availability
Configure Redis persistence (AOF)

Monitoring & Observability

Health Checks:
- /health/live - Liveness probe (Kubernetes/ECS)
- /health/ready - Readiness probe
Metrics:
- /metrics/ - Job counts by status, DLQ depth
- Integrate with CloudWatch or Prometheus
Logging:
- Container logs: docker-compose logs -f api worker
- CloudWatch Logs: Configure log drivers

Scaling Considerations

Horizontal Scaling: Add more EC2 instances behind load balancer
Worker Scaling: Increase worker replicas per instance or add dedicated worker instances
Redis Scaling: Use ElastiCache cluster mode for Redis scaling
Auto Scaling: Configure ASG based on CPU/memory metrics

Security Best Practices

Network Security:
- Use VPC with private subnets for EC2 instances
- Use security groups to restrict access
- Enable SSL/TLS termination at load balancer
Secrets Management:
- Use AWS Secrets Manager or Parameter Store for sensitive config
- Never commit .env files to version control
Access Control:
- Use IAM roles for EC2 instances
- Restrict SSH access to bastion host

License

MIT
