# RunPod GPU Performance
Performance benchmarks and optimization guide for RunPod GPU transcription workers.
## GPU Comparison

### Parakeet-Compatible GPUs
| GPU | Price/hr | Speed | $/episode-hr | Episodes/$ |
|---|---|---|---|---|
| RTX A4000 | $0.16 | ~50x realtime | $0.0032 | 312 |
| RTX A5000 | $0.22 | ~87x realtime | $0.0025 | 395 |
| RTX 3090 | $0.30 | ~80x realtime | $0.0038 | 267 |
| RTX A6000 | $0.45 | ~110x realtime | $0.0041 | 244 |
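The derived columns follow directly from price and speed. A quick sketch reproducing them (figures taken from the table above; assumes 1-hour episodes):

```python
# Derived columns:
#   cost per episode-hour = price per hour / realtime speed
#   episodes per dollar   = realtime speed / price per hour
gpus = {
    "RTX A4000": (0.16, 50),
    "RTX A5000": (0.22, 87),
    "RTX 3090": (0.30, 80),
    "RTX A6000": (0.45, 110),
}

for name, (price_per_hr, speed) in gpus.items():
    cost = price_per_hr / speed
    print(f"{name}: ${cost:.4f}/episode-hr, {speed / price_per_hr:.0f} episodes/$")
```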
> **Best Value:** RTX A5000 offers the best cost per episode despite not being the fastest or cheapest GPU.
### Blocked GPUs (CUDA Error 35)
These GPUs fail with NeMo/Parakeet due to CUDA compatibility issues (Ada Lovelace architecture):
- NVIDIA GeForce RTX 4090
- NVIDIA GeForce RTX 4080
- NVIDIA L4
Ampere GPUs (A-series, RTX 30-series) work fine.
## Bandwidth Analysis

### Upload Capacity vs GPU Speed
At typical podcast bitrate (~128 kbps = 0.96 MB/min of audio):
| Bandwidth | MB/s | Realtime Equivalent |
|---|---|---|
| 25 Mbit/s | 3.1 | ~195x |
| 50 Mbit/s | 6.25 | ~390x |
| 100 Mbit/s | 12.5 | ~780x |
> **Key Finding:** Even the fastest GPU (~110x realtime) needs only ~14 Mbit/s of sustained upload (110 × 128 kbps) to stay saturated. At 50 Mbit/s upload, you can feed ~4-5 A5000-class pods (~11 Mbit/s each) before bandwidth becomes a bottleneck.
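The arithmetic behind that finding, assuming the ~128 kbps bitrate from the table above (function names are illustrative):

```python
BITRATE_MBITS = 0.128  # ~128 kbps typical podcast bitrate

def upload_needed_mbits(realtime_speed):
    """Sustained upload (Mbit/s) required to keep one pod fed."""
    return realtime_speed * BITRATE_MBITS

def pods_per_link(link_mbits, realtime_speed):
    """How many pods of a given speed one uplink can keep saturated."""
    return link_mbits / upload_needed_mbits(realtime_speed)

print(upload_needed_mbits(110))  # fastest GPU: ~14 Mbit/s
print(pods_per_link(50, 87))     # A5000s on a 50 Mbit/s link: ~4.5
```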
## Multi-Pod Scaling

### Test Results (2x A5000 + MacBook)
Configuration:
- 2x RunPod A5000 pods (Parakeet)
- 1x MacBook local worker (Whisper large-v3-turbo)
- 50 Mbit/s upload bandwidth
Observed Performance:
- Last hour: 79 episodes, 8271 audio minutes
- Throughput: 138 hours of audio per wall-clock hour
- Both pods stayed constantly busy (no idle time)
### Per-Node Stats (24h sample)
| Node | Jobs | Avg Time | Notes |
|---|---|---|---|
| RunPod A5000 (1) | 78 | 320 sec | Parakeet |
| RunPod A5000 (2) | 41 | 343 sec | Parakeet (newer pod) |
| MacBook | 326 | 344 sec | Whisper large-v3-turbo |
## Scaling Recommendations

### When to Add Pods
| Queue Size | Pods | Reasoning |
|---|---|---|
| < 50 | 1 | Single pod sufficient |
| 50-200 | 2 | Parallel processing, no bandwidth issues |
| 200-500 | 3 | Still within 50 Mbit/s capacity |
| > 500 | 3-4 | Consider auto-scale |
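The table above reduces to a simple lookup; a sketch with the same thresholds (the function name is illustrative):

```python
def recommended_pods(queue_size):
    """Suggested pod count for a given transcription queue depth."""
    if queue_size < 50:
        return 1   # single pod sufficient
    if queue_size <= 200:
        return 2   # parallel processing, no bandwidth issues
    if queue_size <= 500:
        return 3   # still within 50 Mbit/s capacity
    return 4       # > 500: consider auto-scale
```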
### Bottlenecks (in order)
1. GPU processing -- primary bottleneck, scales with pods
2. Episode length -- longer episodes = lower throughput
3. Job coordination -- minor overhead between jobs
4. Bandwidth -- only limiting at 5+ pods
## Verification
To confirm pods aren't bandwidth-limited:
```bash
# Check if pods are busy
curl -s https://server/api/nodes | jq '.nodes[] | {name, status}'

# Check running jobs (should be 2-3x pod count for prefetch)
curl -s https://server/api/queue/status | jq '.transcribe_queue.running'
```
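The 2-3x prefetch rule of thumb in the second command can be checked programmatically; a minimal sketch (the ratio comes from the source above, the function name is illustrative):

```python
def prefetch_healthy(running_jobs, pod_count):
    """True when running jobs fall in the expected 2-3x-per-pod prefetch window."""
    return 2 * pod_count <= running_jobs <= 3 * pod_count
```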
## Cost Optimization

### A5000 Economics
| Metric | Value |
|---|---|
| Hourly cost | ~$0.22 |
| Processing speed | 87x realtime |
| Cost per episode-hour | $0.0025 |
| 100 episodes (avg 2 hrs each) | ~$0.50 |
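As a sanity check on the last row, a back-of-the-envelope batch cost using the rates above (assumes per-second billing with no pod idle time):

```python
A5000_PRICE_PER_HR = 0.22
A5000_SPEED = 87  # x realtime

def batch_cost_usd(episodes, avg_hours_each=2.0):
    """GPU-time cost to transcribe a batch of episodes on one A5000."""
    gpu_hours = episodes * avg_hours_each / A5000_SPEED
    return gpu_hours * A5000_PRICE_PER_HR

print(f"${batch_cost_usd(100):.2f}")  # ~$0.50 for 100 two-hour episodes
```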
### Comparison to Local Processing
| Method | Speed | Cost/100 episodes |
|---|---|---|
| RunPod A5000 | 87x | $0.50 |
| MacBook M1 (MLX) | ~15x | Free (electricity) |
| Server CPU | ~1x | Free (electricity) |
### Cost Control Tips
- Start pods only when queue has work -- empty queue = wasted billing
- Use auto-scale wisely -- only enable for regular large backlogs
- Monitor stuck jobs -- 10-min idle timeout catches these
- Server reliability -- 5-min unreachable timeout prevents orphaned pods