

Overview

Sockudo is designed for high-performance real-time communication. This guide covers configuration options and best practices for tuning Sockudo to handle high concurrent connection loads efficiently.

Connection Pool Optimization

Database Connection Pools

Connection pooling is critical for production deployments that use external databases.

Environment Variables:
# Global pooling settings (applies to all SQL backends)
DATABASE_POOLING_ENABLED=true
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10

# MySQL-specific overrides (takes precedence when set)
DATABASE_MYSQL_POOL_MIN=4
DATABASE_MYSQL_POOL_MAX=32

# PostgreSQL-specific overrides
DATABASE_POSTGRES_POOL_MIN=2
DATABASE_POSTGRES_POOL_MAX=16
Configuration File:
{
  "database": {
    "pooling_enabled": true,
    "pool_min": 2,
    "pool_max": 10
  }
}
DynamoDB uses the AWS SDK client which manages its own connection behavior. Pool settings do not apply to DynamoDB.

Redis Connection Pools

# Redis connection pool size
REDIS_CONNECTION_POOL_SIZE=10
Recommended Pool Sizes by Deployment Size:
| Deployment Size | Database Pool Max | Redis Pool Size |
|---|---|---|
| Small (1-1K connections) | 5-10 | 5 |
| Medium (1K-10K connections) | 10-20 | 10 |
| Large (10K-50K connections) | 20-40 | 20 |
| Extra Large (50K+ connections) | 40-100 | 50 |
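The sizing table can be expressed as a small helper that picks the upper bound of each tier. This is an illustrative sketch only; the function name and return shape are not part of Sockudo:

```python
def recommended_pools(peak_connections: int) -> dict:
    """Map expected peak WebSocket connections to the pool sizes
    from the table above (upper bound of each range)."""
    tiers = [
        (1_000, {"db_pool_max": 10, "redis_pool_size": 5}),    # Small
        (10_000, {"db_pool_max": 20, "redis_pool_size": 10}),  # Medium
        (50_000, {"db_pool_max": 40, "redis_pool_size": 20}),  # Large
    ]
    for limit, sizes in tiers:
        if peak_connections <= limit:
            return sizes
    return {"db_pool_max": 100, "redis_pool_size": 50}         # Extra Large

print(recommended_pools(8_000))
```

Treat these as starting points; tune from pool-utilization metrics rather than connection counts alone.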

WebSocket Buffer Configuration

Sockudo uses bounded buffers to protect against slow consumers that can’t keep up with message delivery.

Buffer Limit Modes

Mode 1: Message Count Only (Default - Fastest)
WEBSOCKET_MAX_MESSAGES=1000
WEBSOCKET_MAX_BYTES=none
WEBSOCKET_DISCONNECT_ON_BUFFER_FULL=true
{
  "websocket": {
    "max_messages": 1000,
    "max_bytes": null,
    "disconnect_on_buffer_full": true
  }
}
Mode 2: Byte Size Only (Precise Memory Control)
WEBSOCKET_MAX_MESSAGES=none
WEBSOCKET_MAX_BYTES=1048576  # 1MB
WEBSOCKET_DISCONNECT_ON_BUFFER_FULL=true
{
  "websocket": {
    "max_messages": null,
    "max_bytes": 1048576,
    "disconnect_on_buffer_full": true
  }
}
Mode 3: Both Limits (Most Precise)
WEBSOCKET_MAX_MESSAGES=1000
WEBSOCKET_MAX_BYTES=1048576
WEBSOCKET_DISCONNECT_ON_BUFFER_FULL=true

Buffer Behavior

  • When disconnect_on_buffer_full: true → Connection is closed with error code 4100
  • When disconnect_on_buffer_full: false → New messages are dropped silently (logged as warning)
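The buffer policy above can be sketched as follows. This is a minimal illustration of the limit checks and the disconnect-vs-drop decision, not Sockudo's actual implementation:

```python
from collections import deque

class SendBuffer:
    """Illustrative sketch of the bounded send-buffer policy:
    optional message-count limit, optional byte limit, and a
    configurable response when either limit is hit."""

    def __init__(self, max_messages=None, max_bytes=None,
                 disconnect_on_buffer_full=True):
        self.queue = deque()
        self.bytes_used = 0
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.disconnect = disconnect_on_buffer_full

    def push(self, payload: bytes) -> str:
        over_count = (self.max_messages is not None
                      and len(self.queue) >= self.max_messages)
        over_bytes = (self.max_bytes is not None
                      and self.bytes_used + len(payload) > self.max_bytes)
        if over_count or over_bytes:
            # close with code 4100, or drop silently (logged as a warning)
            return "close-4100" if self.disconnect else "dropped"
        self.queue.append(payload)
        self.bytes_used += len(payload)
        return "queued"
```

With `disconnect_on_buffer_full: false`, a consumer that stalls simply stops receiving new messages; with `true`, it is cut off so its buffer memory can be reclaimed.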

Performance Characteristics

| Mode | Overhead | Memory Control |
|---|---|---|
| Message-only | Zero (uses bounded channel) | Approximate |
| Byte-only | ~1-2ns per message | Precise |
| Both | Atomic counter + channel check | Most precise |

Memory Estimation

  • Message-only mode: ~1-2KB per message (typical)
  • Byte-only mode: Exact memory limit (e.g., 1MB = 1MB max)
  • 10,000 connections with 1MB byte limit: ~10GB worst case
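The worst-case figure follows directly from multiplying the per-connection byte limit by the connection count:

```python
# Worst-case send-buffer memory for byte-limited buffers
connections = 10_000
max_bytes = 1_048_576                 # 1MB limit per connection
worst_case = connections * max_bytes  # every buffer completely full
print(round(worst_case / 1e9, 2))     # ≈ 10.49 GB
```

In practice most buffers stay near empty, so actual usage is far lower; the worst case only matters when many consumers stall simultaneously.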

Cleanup Queue Configuration

The async cleanup queue processes WebSocket disconnections in the background to prevent mass disconnections from blocking new connections.

Configuration Options

# Enable async cleanup (recommended)
CLEANUP_ASYNC_ENABLED=true

# Fallback to sync if queue fails
CLEANUP_FALLBACK_TO_SYNC=true

# Queue buffer size per worker
CLEANUP_QUEUE_BUFFER_SIZE=10000

# Tasks processed per batch per worker
CLEANUP_BATCH_SIZE=25

# Max wait time to fill batch (milliseconds)
CLEANUP_BATCH_TIMEOUT_MS=50

# Number of cleanup worker threads ("auto" or specific number)
CLEANUP_WORKER_THREADS=auto

# Retry attempts before giving up
CLEANUP_MAX_RETRY_ATTEMPTS=2
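The interaction between `CLEANUP_BATCH_SIZE` and `CLEANUP_BATCH_TIMEOUT_MS` can be sketched with a simple batching loop: a worker drains up to a full batch, but flushes a partial batch rather than wait indefinitely. This is an illustrative sketch (with a simplified per-wait timeout), not Sockudo's implementation:

```python
import asyncio

async def drain_batch(queue, batch_size=25, batch_timeout_ms=50):
    """Collect one batch: up to batch_size tasks, waiting at most
    batch_timeout_ms between tasks before flushing a partial batch."""
    batch = [await queue.get()]              # block until the first task arrives
    timeout = batch_timeout_ms / 1000
    while len(batch) < batch_size:
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=timeout))
        except asyncio.TimeoutError:
            break                            # flush whatever we have
    return batch

async def demo():
    q = asyncio.Queue()
    for i in range(7):                       # 7 queued disconnect tasks
        q.put_nowait(i)
    return await drain_batch(q, batch_size=5, batch_timeout_ms=10)

print(asyncio.run(demo()))
```

Larger batches amortize per-batch overhead during mass disconnections; smaller timeouts keep cleanup latency low when disconnects trickle in.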

Deployment Scenarios

Use Case: Development, testing, small production instances
{
  "cleanup": {
    "async_enabled": true,
    "queue_buffer_size": 500,
    "batch_size": 10,
    "batch_timeout_ms": 100,
    "worker_threads": 1,
    "max_retry_attempts": 1
  }
}
  • Memory Usage: ~300KB queue buffer
  • CPU Impact: Minimal (1 worker)
  • Latency: 100ms max cleanup delay
Use Case: High concurrent connection loads (>10K connections)
{
  "cleanup": {
    "async_enabled": true,
    "queue_buffer_size": 10000,
    "batch_size": 100,
    "batch_timeout_ms": 25,
    "worker_threads": 2,
    "max_retry_attempts": 3
  }
}
  • Memory Usage: ~6MB per worker (total: ~12MB with 2 workers)
  • CPU Impact: Moderate (2 workers)
  • Latency: 25ms max cleanup delay
Use Case: Massive scale deployments (>50K connections)
{
  "cleanup": {
    "async_enabled": true,
    "queue_buffer_size": 50000,
    "batch_size": 500,
    "batch_timeout_ms": 10,
    "worker_threads": 4,
    "max_retry_attempts": 3
  }
}
  • Memory Usage: ~30MB per worker (total: ~120MB with 4 workers)
  • CPU Impact: High (4 workers)
  • Latency: 10ms max cleanup delay
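The memory figures across the three scenarios are consistent with a roughly fixed per-task cost (about 600 bytes per queued cleanup task in these estimates):

```python
# (queue_buffer_size, estimated buffer memory in bytes) per scenario
scenarios = {
    "small": (500, 300 * 1024),        # ~300KB
    "standard": (10_000, 6 * 1024**2), # ~6MB per worker
    "large": (50_000, 30 * 1024**2),   # ~30MB per worker
}
for name, (slots, mem) in scenarios.items():
    print(name, mem // slots)          # roughly 600 bytes per task
```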

Worker Threads Scaling

The worker_threads setting supports:
  • Fixed number: Specify exact worker count (e.g., 2)
  • Auto-detection: Use "auto" to scale based on CPU cores
When using "auto", the system uses 25% of available CPU cores (minimum 1, maximum 4):
  • 1-7 CPUs → 1 worker
  • 8-11 CPUs → 2 workers
  • 12-15 CPUs → 3 workers
  • 16+ CPUs → 4 workers
All configuration values (except worker_threads) are applied per worker, not as total system capacity.
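The "auto" mapping above is equivalent to taking 25% of the core count and clamping it to the 1-4 range:

```python
def auto_worker_threads(cpu_cores: int) -> int:
    """25% of available CPU cores, clamped to 1..4,
    matching the "auto" mapping described above."""
    return max(1, min(4, cpu_cores // 4))

for cores in (1, 7, 8, 11, 12, 15, 16, 32):
    print(cores, "->", auto_worker_threads(cores))
```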

Adapter Performance Tuning

Redis/Redis Cluster

# Queue processing concurrency
QUEUE_REDIS_CONCURRENCY=5

# Redis Cluster queue concurrency
REDIS_CLUSTER_QUEUE_CONCURRENCY=5

# Key prefix for namespace isolation
DATABASE_REDIS_KEY_PREFIX=sockudo:

NATS Configuration

# NATS servers (comma-separated)
NATS_SERVERS=nats://nats1:4222,nats://nats2:4222

# Connection timeouts (milliseconds)
NATS_CONNECTION_TIMEOUT_MS=5000
NATS_REQUEST_TIMEOUT_MS=5000

Socket Counting

Socket counting has performance overhead. Disable if you don’t need the get_sockets_count API.
# Disable socket counting for better performance
ADAPTER_ENABLE_SOCKET_COUNTING=false
When disabled, get_sockets_count returns 0 to avoid the overhead of tracking connection counts.

CPU Scaling Considerations

Worker Thread Auto-Scaling

Sockudo automatically scales cleanup workers based on available CPU:
# Use auto-detection (recommended)
CLEANUP_WORKER_THREADS=auto
This allocates 25% of CPU cores to cleanup, leaving 75% for main WebSocket processing.

Manual CPU Allocation

For fine-grained control:
# 4 CPU cores: Allocate 1 worker manually
CLEANUP_WORKER_THREADS=1

# 16 CPU cores: Allocate 4 workers manually
CLEANUP_WORKER_THREADS=4

Cache Configuration

# Cache driver selection
CACHE_DRIVER=redis  # Options: memory, redis, redis-cluster, none

# Cache TTL (seconds)
CACHE_TTL_SECONDS=300

# Memory cache settings
CACHE_CLEANUP_INTERVAL=60
CACHE_MAX_CAPACITY=10000
Configuration File:
{
  "cache": {
    "driver": "redis",
    "memory": {
      "ttl": 300,
      "cleanup_interval": 60,
      "max_capacity": 10000
    }
  }
}
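The memory-cache settings (`ttl`, `max_capacity`) behave as in this minimal sketch: entries expire after the TTL and the cache is bounded by capacity. The eviction strategy shown (oldest insert first) is a simplification for illustration, not necessarily what Sockudo uses:

```python
import time

class MemoryCache:
    """Minimal sketch of per-entry TTL plus a max_capacity bound."""

    def __init__(self, ttl=300, max_capacity=10_000, clock=time.monotonic):
        self.ttl = ttl
        self.max_capacity = max_capacity
        self.clock = clock
        self.store = {}                       # key -> (expires_at, value)

    def set(self, key, value):
        if len(self.store) >= self.max_capacity and key not in self.store:
            self.store.pop(next(iter(self.store)))  # evict oldest insert
        self.store[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.store[key]               # lazy expiry on read
            return None
        return value
```

For multi-node deployments, prefer the `redis` driver so all nodes share one cache rather than each holding an independent in-memory copy.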

Rate Limiting Configuration

# Enable rate limiting
RATE_LIMITER_ENABLED=true

# Rate limiter backend
RATE_LIMITER_DRIVER=redis

# API rate limiting
RATE_LIMITER_API_MAX_REQUESTS=100
RATE_LIMITER_API_WINDOW_SECONDS=60

# WebSocket rate limiting
RATE_LIMITER_WS_MAX_REQUESTS=20
RATE_LIMITER_WS_WINDOW_SECONDS=60
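The `max_requests`/`window_seconds` pair describes a windowed counter per client. A fixed-window version can be sketched as below; this is illustrative only, and note that the `redis` driver exists precisely so these counters are shared across nodes instead of kept per-process:

```python
class FixedWindowLimiter:
    """Sketch of windowed rate limiting: allow at most max_requests
    per window_seconds, tracked per client key."""

    def __init__(self, max_requests=100, window_seconds=60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counters = {}                    # (key, window_index) -> count

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        bucket = (key, window)
        count = self.counters.get(bucket, 0)
        if count >= self.max_requests:
            return False                      # over the limit this window
        self.counters[bucket] = count + 1
        return True
```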

Performance Monitoring

Prometheus Metrics

Sockudo exposes metrics at /metrics (port 9601 by default):
METRICS_ENABLED=true
METRICS_HOST=0.0.0.0
METRICS_PORT=9601
METRICS_PROMETHEUS_PREFIX=sockudo_

Key Metrics to Monitor

  • sockudo_websocket_connections_total - Total active connections
  • sockudo_messages_received_total - Incoming message rate
  • sockudo_messages_sent_total - Outgoing message rate
  • sockudo_cleanup_queue_size - Cleanup queue depth
  • sockudo_adapter_operations_duration_seconds - Adapter operation latency

Quick Reference Table

Configuration by Server Size

| Server Spec | queue_buffer_size | batch_size | batch_timeout_ms | worker_threads | pool_max |
|---|---|---|---|---|---|
| 1vCPU/1GB | 500 | 10 | 100 | 1 | 5 |
| 2vCPU/2GB | 5000 | 25 | 50 | auto (1) | 10 |
| 4vCPU/4GB | 10000 | 100 | 25 | 2 | 20 |
| 8vCPU/8GB | 50000 | 500 | 10 | 4 | 40 |

Environment Variables Quick Reference

# Connection Pools
DATABASE_POOLING_ENABLED=true
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10
REDIS_CONNECTION_POOL_SIZE=10

# WebSocket Buffers
WEBSOCKET_MAX_MESSAGES=1000
WEBSOCKET_MAX_BYTES=1048576
WEBSOCKET_DISCONNECT_ON_BUFFER_FULL=true

# Cleanup Queue
CLEANUP_ASYNC_ENABLED=true
CLEANUP_QUEUE_BUFFER_SIZE=10000
CLEANUP_BATCH_SIZE=25
CLEANUP_BATCH_TIMEOUT_MS=50
CLEANUP_WORKER_THREADS=auto

# Performance
ADAPTER_ENABLE_SOCKET_COUNTING=false
QUEUE_REDIS_CONCURRENCY=5
CACHE_TTL_SECONDS=300

Best Practices

  1. Start Conservative: Begin with standard deployment settings and tune based on metrics
  2. Monitor Actively: Watch queue health and connection latency during initial deployment
  3. Test Load: Run mass disconnection tests before production
  4. Use Auto-Scaling: Let CLEANUP_WORKER_THREADS=auto handle CPU allocation
  5. Profile Regularly: Use Prometheus metrics to identify bottlenecks
  6. Disable Unused Features: Turn off socket counting if not needed
  7. Use Redis for Scale: Switch to Redis/Redis Cluster for multi-node deployments

Next Steps