Distributed Transcription Architecture

This document describes the architecture of cast2md's distributed transcription system, which enables remote machines to process transcription jobs in parallel with the main server.

Overview

The distributed transcription system allows you to leverage multiple machines (M4 MacBooks, GPU PCs, RunPod pods) to transcribe podcast episodes faster. The main cast2md server acts as a coordinator, while remote "transcriber nodes" poll for work, process jobs locally, and upload results.

┌─────────────────────────────────────────────────────────────┐
│                     Main Server (cast2md)                    │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Job Queue   │  │ Node        │  │ Audio Storage       │  │
│  │ (PostgreSQL)│  │ Registry    │  │ (filesystem)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              RemoteTranscriptionCoordinator          │    │
│  │  - Monitors node heartbeats                          │    │
│  │  - Reclaims stuck jobs                               │    │
│  │  - Tracks node status                                │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Local Transcription Worker              │    │
│  │  - Processes jobs not claimed by nodes               │    │
│  │  - Works in parallel with remote nodes               │    │
│  └─────────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────────┘
                    HTTP API (pull-based)
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
   ┌───────────┐     ┌───────────┐     ┌───────────┐
   │  Node A   │     │  Node B   │     │  Node C   │
   │  M4 Mac   │     │  GPU PC   │     │  RunPod   │
   └───────────┘     └───────────┘     └───────────┘

Design Principles

Pull-Based Model

Nodes actively poll the server for work rather than having the server push jobs (see the sketch after this list). This provides:

  1. NAT/Firewall Friendly -- nodes behind NAT work without configuration since they initiate all connections
  2. Natural Load Balancing -- nodes only request work when ready, preventing overload
  3. Simple Fault Tolerance -- if a node disappears, its job times out and becomes available again
  4. No Node Discovery -- server doesn't need to know how to reach nodes
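
A minimal sketch of that pull loop on the node side, using requests. The claim endpoint and the X-Transcriber-Key header are documented later in this document; the response shape and the process_job callback are assumptions.

import time

import requests

SERVER = "http://server:8000"                      # server URL from node.json
NODE_ID = "uuid-from-registration"
HEADERS = {"X-Transcriber-Key": "generated-api-key"}

def poll_forever(process_job, poll_interval: float = 5.0) -> None:
    """Pull loop: the node initiates every request, so NAT and firewalls need no configuration."""
    while True:
        resp = requests.post(f"{SERVER}/api/nodes/{NODE_ID}/claim", headers=HEADERS, timeout=30)
        job = resp.json() if resp.ok and resp.content else None   # response shape assumed
        if job:
            process_job(job)                       # caller-supplied: download, transcribe, upload
        else:
            time.sleep(poll_interval)              # nothing queued; wait and ask again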

Parallel Processing

The local transcription worker and remote nodes work simultaneously:

  • The local worker processes jobs where assigned_node_id IS NULL
  • When a node claims a job, the local worker skips it and moves on to the next unclaimed job
  • This maximizes throughput when batching many episodes (see the query sketch below)
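
A sketch of that selection rule as a PostgreSQL query wrapped in a DB-API call. The job_queue column names follow this document; the function name, ordering, and locking strategy are assumptions.

def claim_next_local_job(conn) -> int | None:
    """Pick the oldest queued job that no remote node has claimed and mark it running."""
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE job_queue
               SET status = 'running'
             WHERE id = (SELECT id FROM job_queue
                          WHERE status = 'queued' AND assigned_node_id IS NULL
                          ORDER BY id
                          LIMIT 1
                          FOR UPDATE SKIP LOCKED)   -- skip rows a node is claiming right now
            RETURNING id
            """
        )
        row = cur.fetchone()
        return row[0] if row else None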

Trusted Network Assumption

The system assumes operation on a trusted network (Tailscale, local LAN):

  • Simple API key authentication (no complex auth flows)
  • No HTTPS required (network already encrypted via Tailscale/WireGuard)
  • API keys generated on registration, stored locally on nodes

Server Components

TranscriberNode Model

from dataclasses import dataclass
from datetime import datetime

@dataclass
class TranscriberNode:
    id: str                      # UUID
    name: str                    # Human-readable name
    url: str                     # Node's URL for connectivity tests
    api_key: str                 # Shared secret for authentication
    whisper_model: str | None    # Model configured on node
    whisper_backend: str | None  # "mlx" or "faster-whisper"
    status: NodeStatus           # online/offline/busy
    last_heartbeat: datetime     # Last heartbeat timestamp
    current_job_id: int | None   # Job being processed
    priority: int                # Lower = preferred for job assignment

RemoteTranscriptionCoordinator

Background thread that manages the distributed system:

  • Node health monitoring -- marks nodes offline if no heartbeat within timeout
  • Job reclamation -- resets jobs running too long on nodes
  • Check interval: 30 seconds (see the sketch below)
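
A minimal sketch of one coordinator pass. The nodes and jobs objects and every method on them are hypothetical placeholders for the real registry and queue.

import threading
from datetime import datetime, timedelta

HEARTBEAT_TIMEOUT = timedelta(seconds=60)    # stale-node threshold (see Heartbeat Flow)
JOB_TIMEOUT = timedelta(minutes=30)          # default job timeout
CHECK_INTERVAL = 30                          # seconds between coordinator passes

def coordinator_loop(stop: threading.Event, nodes, jobs) -> None:
    """Mark stale nodes offline and reclaim jobs that exceeded the timeout."""
    while not stop.wait(CHECK_INTERVAL):
        now = datetime.utcnow()
        for node in nodes.list_all():                          # hypothetical registry API
            if now - node.last_heartbeat > HEARTBEAT_TIMEOUT:
                nodes.mark_offline(node.id)                    # hypothetical
            if node.current_job_id is not None:
                job = jobs.get(node.current_job_id)
                if job and now - job.claimed_at > JOB_TIMEOUT:
                    jobs.reclaim(job.id)                       # requeue: status=queued, assigned_node_id=NULL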

Node API Endpoints

Registration & Heartbeat:

Endpoint                    Method  Description
/api/nodes/register         POST    Register new node, returns credentials
/api/nodes/{id}/heartbeat   POST    Keep-alive ping (every 30s)
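
For illustration, a hedged requests sketch of calling these two endpoints. Only the paths and the X-Transcriber-Key header are documented here; the request and response field names are assumptions.

import requests

SERVER = "http://server:8000"

# One-time registration; the returned credentials go into ~/.cast2md/node.json.
reg = requests.post(f"{SERVER}/api/nodes/register",
                    json={"name": "M4 MacBook", "url": "http://m4-mac:9000"},
                    timeout=30)
creds = reg.json()                                 # assumed to contain node_id and api_key

# Keep-alive, repeated every 30 seconds by the node worker.
requests.post(f"{SERVER}/api/nodes/{creds['node_id']}/heartbeat",
              headers={"X-Transcriber-Key": creds["api_key"]},
              timeout=10)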

Job Processing:

Endpoint                            Method  Description
/api/nodes/{id}/claim               POST    Claim next available job
/api/nodes/jobs/{job_id}/audio      GET     Download audio file
/api/nodes/jobs/{job_id}/complete   POST    Submit transcript
/api/nodes/jobs/{job_id}/fail       POST    Report failure
/api/nodes/jobs/{job_id}/release    POST    Release job back to queue

Admin:

Endpoint               Method  Description
/api/nodes             GET     List all nodes
/api/nodes             POST    Manually add node
/api/nodes/{id}        DELETE  Remove node
/api/nodes/{id}/test   POST    Test connectivity

Node Components

NodeConfig

Manages node credentials stored in ~/.cast2md/node.json:

{
  "server_url": "http://server:8000",
  "node_id": "uuid-from-registration",
  "api_key": "generated-api-key",
  "name": "M4 MacBook"
}
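
Reading that file is straightforward; a sketch (the real NodeConfig class likely does more, such as writing the file after registration):

import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".cast2md" / "node.json"

def load_node_config() -> dict:
    """Read the stored credentials; fails if the node has never registered."""
    with CONFIG_PATH.open() as fh:
        return json.load(fh)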

TranscriberNodeWorker

Main worker class that:

  1. Polls for jobs every 5 seconds
  2. Sends heartbeats every 30 seconds
  3. Processes jobs: download audio -> transcribe with local Whisper -> upload transcript

Node Prefetch Queue

The node worker uses a 3-slot prefetch queue to keep audio ready for instant transcription. This matters for backends like Parakeet that can transcribe audio faster than it can be downloaded (see the sketch below).
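
A minimal sketch of such a bounded prefetch queue using the standard library; the helper names are hypothetical and the real worker may structure this differently.

import queue

prefetch: queue.Queue = queue.Queue(maxsize=3)       # 3 slots: downloader blocks when full

def downloader(claimed_jobs, download_audio) -> None:
    """Keep up to three audio files staged so transcription never waits on the network."""
    for job in claimed_jobs():                       # hypothetical generator of claimed jobs
        prefetch.put((job, download_audio(job)))     # blocks once three items are waiting

def transcriber(transcribe, upload) -> None:
    while True:
        job, audio_path = prefetch.get()             # immediately available if prefetch kept up
        upload(job, transcribe(audio_path))          # hypothetical local Whisper + /complete call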


Data Flow

Job Lifecycle

1. Episode queued for transcription
   └─> job_queue entry created (status=queued, assigned_node_id=NULL)

2. Node polls /api/nodes/{id}/claim
   └─> Server finds unclaimed job
   └─> Updates job: status=running, assigned_node_id=<node>, claimed_at=<now>
   └─> Returns job details + audio URL

3. Node downloads audio via /api/nodes/jobs/{id}/audio
   └─> Server streams audio file

4. Node transcribes locally
   └─> Uses configured Whisper model/backend

5. Node submits result via /api/nodes/jobs/{id}/complete
   └─> Server saves transcript
   └─> Server updates episode status
   └─> Server indexes transcript for search
   └─> Server marks job completed
   └─> Server updates node status to online
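
Steps 3-5 from the node's point of view, sketched with requests. The claim step is sketched earlier; the /complete payload shape, the transcript field name, and the scratch path are assumptions.

import requests

def download_and_complete(server: str, job: dict, headers: dict, transcribe) -> None:
    """Stream the audio to disk, transcribe it locally, then submit the result."""
    audio = requests.get(f"{server}/api/nodes/jobs/{job['id']}/audio",
                         headers=headers, stream=True, timeout=60)
    path = f"/tmp/{job['id']}.audio"                           # hypothetical scratch location
    with open(path, "wb") as fh:
        for chunk in audio.iter_content(chunk_size=1 << 20):   # stream, don't buffer whole file
            fh.write(chunk)

    transcript = transcribe(path)                              # caller supplies the local Whisper call
    requests.post(f"{server}/api/nodes/jobs/{job['id']}/complete",
                  headers=headers, json={"transcript": transcript}, timeout=120)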

Heartbeat Flow

Every 30 seconds:
  Node → POST /api/nodes/{id}/heartbeat
       → Server updates last_heartbeat
       → Server marks node online if was offline

Every 30 seconds (coordinator):
  Server checks for stale nodes (no heartbeat > 60s)
       → Marks stale nodes offline
       → Reclaims their jobs (if running > timeout)

Job State Synchronization

When the server restarts while nodes are processing jobs:

  • reset_running_jobs() only resets jobs with assigned_node_id IS NULL (local server jobs)
  • Remote node jobs keep their assignment -- the coordinator's timeout handles truly dead nodes
  • Nodes report state in each heartbeat (current_job_id, claimed_job_ids) enabling resync
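
A sketch of the restart-time reset, assuming the column names used in this document; the actual function body is not shown here.

def reset_running_jobs(conn) -> int:
    """On server restart, requeue only jobs the local worker had been running.
    Jobs assigned to remote nodes are left alone; the coordinator's timeout covers dead nodes."""
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE job_queue
               SET status = 'queued'
             WHERE status = 'running' AND assigned_node_id IS NULL
            """
        )
        return cur.rowcount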

Failure Handling

Scenario                  Detection                  Resolution
Node disappears mid-job   Heartbeat timeout (60s)    Job reclaimed after timeout
Node graceful shutdown    SIGTERM/SIGINT handler     Job released immediately via API
Network fails on upload   Node retry with backoff    Store locally, retry on restart
Server restarts           Node continues, resubmits  Accept result if job exists
Audio corrupted           Transcription error        Mark failed, can retry
Node crashes              No heartbeat               Marked offline, job reclaimed

Graceful Shutdown

Server: On SIGTERM/SIGINT, workers are stopped gracefully. On restart, reset_orphaned_jobs() resets jobs left in "running" state.

Node: On SIGTERM/SIGINT (or Ctrl+C), the node releases its current job back to the queue via the release API, making it immediately available for another worker.
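
A sketch of the node-side handler; the mutable job-id holder and the release payload are assumptions.

import signal
import sys

import requests

def install_shutdown_handler(server: str, headers: dict, current_job_id_ref: list) -> None:
    """On SIGTERM/SIGINT, hand the in-flight job back to the queue before exiting."""
    def handler(signum, frame):
        job_id = current_job_id_ref[0]             # holder updated by the worker loop
        if job_id is not None:
            requests.post(f"{server}/api/nodes/jobs/{job_id}/release",
                          headers=headers, timeout=10)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)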


Security

  1. API Key Authentication -- all node requests require X-Transcriber-Key header
  2. Network Security -- designed for trusted networks (Tailscale, LAN)
  3. No Secrets in URLs -- API keys in headers, not URL parameters
  4. Job Ownership -- nodes can only access jobs assigned to them
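
A framework-agnostic sketch of check 1; the registry lookup is hypothetical and the real server may validate keys differently.

import hmac

def authenticate_node(headers: dict, nodes):
    """Return the node whose API key matches the X-Transcriber-Key header, else None."""
    presented = headers.get("X-Transcriber-Key", "")
    for node in nodes.list_all():                    # hypothetical registry API
        if hmac.compare_digest(node.api_key, presented):
            return node
    return None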

Performance Characteristics

Parameter                   Value
Polling interval            5 seconds
Heartbeat interval          30 seconds
Coordinator check interval  30 seconds
Default job timeout         30 minutes
Audio transfer              Streamed, not buffered in memory
Prefetch queue              3 slots

Limitations

  1. Single Job Per Node -- each node processes one job at a time
  2. No Job Priorities for Nodes -- all nodes draw from the same queue, which is served in priority order
  3. No Partial Progress -- if a node fails mid-transcription, the job restarts from scratch
  4. Trust Required -- API keys provide authentication, not fine-grained authorization