FlexPrice is designed to scale horizontally across all tiers. This guide covers scaling strategies for production workloads.

Scaling Overview

FlexPrice scales at three independent tiers:
API Tier        → Horizontal scaling (stateless)
Consumer Tier   → Partition-based scaling (Kafka)
Worker Tier     → Task queue scaling (Temporal)

API Tier Scaling

The API tier is stateless and scales horizontally with a load balancer.

Horizontal Pod Autoscaling (Kubernetes)

k8s/api-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flexprice-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flexprice-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 50
        periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

Docker Compose Scaling

# Scale to 5 replicas
docker compose up -d --scale flexprice-api=5

# Note: per-replica resource limits cannot be passed as `docker compose up`
# flags; define them in the compose file under deploy.resources
# (see the production example later in this guide)

Load Balancer Configuration

upstream flexprice_api {
    least_conn;  # Use least connections algorithm
    
    server flexprice-api-1:8080 max_fails=3 fail_timeout=30s;
    server flexprice-api-2:8080 max_fails=3 fail_timeout=30s;
    server flexprice-api-3:8080 max_fails=3 fail_timeout=30s;
    
    keepalive 32;  # Keep connections alive
}

server {
    listen 80;
    server_name api.flexprice.io;

    location / {
        proxy_pass http://flexprice_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    location /health {
        access_log off;
        proxy_pass http://flexprice_api/health;
    }
}

Resource Recommendations

| Load Level | vCPU | Memory | Replicas | Notes |
|---|---|---|---|---|
| Low (< 100 req/s) | 1 | 1 GB | 2-3 | Minimum for HA |
| Medium (100-500 req/s) | 2 | 2 GB | 3-5 | Baseline production |
| High (500-2,000 req/s) | 2-4 | 2-4 GB | 5-10 | High traffic |
| Very High (> 2,000 req/s) | 4 | 4 GB | 10-20+ | Enterprise scale |
Always run at least 2 replicas for high availability, even during low traffic.
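As a rough planning aid, the sizing table above can be expressed as a small helper. This is an illustrative sketch, not part of FlexPrice; the thresholds and replica counts come from the table, and the per-replica throughput assumed for the top tier (~200 req/s) is an assumption:

```python
import math

def recommended_replicas(req_per_sec: float) -> int:
    """Suggest an API replica count from the sizing table above.

    Upper bounds of each band are used so there is headroom;
    never returns fewer than 2 replicas (the HA minimum).
    """
    if req_per_sec < 100:
        return 3           # Low: 2-3 replicas
    if req_per_sec < 500:
        return 5           # Medium: 3-5 replicas
    if req_per_sec < 2000:
        return 10          # High: 5-10 replicas
    # Very High: assume ~200 req/s per replica, capped at 20
    return min(20, max(10, math.ceil(req_per_sec / 200)))

print(recommended_replicas(50))    # -> 3
print(recommended_replicas(3000))  # -> 15
```

Treat the output as a starting point and tune against observed latency and utilization.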

Consumer Tier Scaling

Consumer scaling is bounded by Kafka partition count: within a consumer group, each partition is assigned to at most one consumer, so any consumers beyond the partition count sit idle.

Partition Strategy

1. Calculate partition count

Determine partitions based on peak event rate:
Partitions = (Peak Events/sec) / (Events per Consumer/sec)
Example: 10,000 events/sec ÷ 1,000 events/sec = 10 partitions
2. Create or increase partitions

# Increase partitions for events topic
docker compose exec kafka kafka-topics \
  --bootstrap-server kafka:9092 \
  --alter \
  --topic events \
  --partitions 10
Partition count can only be increased, never decreased. Plan accordingly.
3. Scale consumers to match

# Docker Compose
docker compose up -d --scale flexprice-consumer=10

# Kubernetes
kubectl scale deployment flexprice-consumer --replicas=10
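The sizing rule from step 1 can be sketched as a quick helper (illustrative, not part of FlexPrice). It rounds up so a fractional result still gets a full partition:

```python
import math

def required_partitions(peak_events_per_sec: float,
                        events_per_consumer_per_sec: float) -> int:
    """Kafka partition count per step 1: peak rate / per-consumer rate.

    Rounds up so throughput headroom is never lost to a fraction.
    """
    return math.ceil(peak_events_per_sec / events_per_consumer_per_sec)

# The example from step 1: 10,000 events/sec at 1,000 events/sec per consumer
print(required_partitions(10_000, 1_000))  # -> 10
```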

Consumer Configuration

Optimize consumer performance with rate limits:
docker-compose.yml
services:
  flexprice-consumer:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=consumer
      # Rate limits (messages per second)
      - FLEXPRICE_EVENT_PROCESSING_RATE_LIMIT=100
      - FLEXPRICE_FEATURE_USAGE_TRACKING_RATE_LIMIT=50
      - FLEXPRICE_EVENT_POST_PROCESSING_RATE_LIMIT=75
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '2'
          memory: 2G

Consumer Group Strategy

FlexPrice uses multiple consumer groups for different processing stages:
events topic (10 partitions)

    ├─→ v1_event_processing (5 consumers) → Store in ClickHouse
    ├─→ v1_feature_tracking_service (3 consumers) → Track features
    └─→ v1_costsheet_usage_tracking_service (2 consumers) → Calculate costs
Each consumer group processes independently - scale them based on their specific load.

Partition Monitoring

Monitor consumer lag to identify scaling needs:
# Check consumer group lag
docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server kafka:9092 \
  --describe \
  --group v1_event_processing
Output:
GROUP                 TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
v1_event_processing   events    0          1000           1000            0
v1_event_processing   events    1          950            1020            70
If LAG stays consistently high, first scale consumers up to the partition count; if lag still grows, increase partitions and add more consumers.
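For dashboards or alerts, the lag output above can be summarized with a short script. This is a sketch that parses the plain-text format shown above; in production you would more likely export lag via a metrics exporter:

```python
def total_lag(describe_output: str) -> int:
    """Sum the LAG column from `kafka-consumer-groups --describe` output."""
    total = 0
    for line in describe_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if fields:
            total += int(fields[-1])  # LAG is the last column
    return total

sample = """\
GROUP                 TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
v1_event_processing   events    0          1000           1000            0
v1_event_processing   events    1          950            1020            70
"""
print(total_lag(sample))  # -> 70
```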

Resource Recommendations

| Event Rate | vCPU | Memory | Partitions | Consumers | Notes |
|---|---|---|---|---|---|
| < 1K/s | 1 | 1 GB | 3 | 2-3 | Small workload |
| 1K-10K/s | 2 | 2 GB | 10 | 5-10 | Medium workload |
| 10K-50K/s | 2-4 | 2-4 GB | 20 | 10-20 | High workload |
| > 50K/s | 4 | 4 GB | 50+ | 30-50 | Enterprise scale |

Worker Tier Scaling

Temporal workers scale based on workflow concurrency.

Kubernetes Deployment

k8s/worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexprice-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flexprice
      component: worker
  template:
    metadata:
      labels:
        app: flexprice
        component: worker
    spec:
      containers:
      - name: flexprice
        image: flexprice-app:latest
        env:
        - name: FLEXPRICE_DEPLOYMENT_MODE
          value: "temporal_worker"
        - name: FLEXPRICE_TEMPORAL_ADDRESS
          value: "temporal:7233"
        - name: FLEXPRICE_TEMPORAL_TASK_QUEUE
          value: "billing-task-queue"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 2000m
            memory: 2Gi

Task Queue Strategy

Use multiple task queues for different workflow priorities:
Workflows

    ├─→ billing-high-priority (5 workers) → Critical billing
    ├─→ billing-standard (3 workers) → Regular billing
    └─→ billing-background (2 workers) → Batch operations

Worker Monitoring

Monitor workflow execution in Temporal UI:
  • Queue depth: Number of pending workflows
  • Execution time: Average workflow duration
  • Worker utilization: Active workers / Total workers
Scale workers when queue depth remains high.
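A simple scale-up check can combine the utilization and queue-depth metrics above. This is a sketch; the 80% utilization and 100-workflow thresholds are assumptions to tune for your workload, not FlexPrice defaults:

```python
def should_scale_workers(active_workers: int,
                         total_workers: int,
                         queue_depth: int,
                         utilization_threshold: float = 0.8,
                         queue_threshold: int = 100) -> bool:
    """Flag when the worker tier needs more replicas.

    Scale when workers are nearly saturated AND workflows are queueing;
    a deep queue with idle workers points at a different bottleneck.
    """
    utilization = active_workers / total_workers
    return utilization >= utilization_threshold and queue_depth > queue_threshold

print(should_scale_workers(9, 10, 250))  # -> True
print(should_scale_workers(4, 10, 250))  # -> False
```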

Resource Recommendations

| Workflow Concurrency | vCPU | Memory | Workers | Notes |
|---|---|---|---|---|
| < 100 | 1 | 1 GB | 2 | Small workload |
| 100-500 | 2 | 2 GB | 3-5 | Medium workload |
| 500-2,000 | 2 | 2 GB | 5-10 | High workload |
| > 2,000 | 2-4 | 2-4 GB | 10-20 | Enterprise scale |

Database Scaling

PostgreSQL

For high-traffic deployments, use read replicas:
config.yaml
postgres:
  host: postgres-primary.example.com
  port: 5432
  # Read replica configuration
  reader_host: postgres-replica.example.com
  reader_port: 5432
  # Connection pooling
  max_open_conns: 25
  max_idle_conns: 10
  conn_max_lifetime_minutes: 60
Environment Variables
FLEXPRICE_POSTGRES_HOST=postgres-primary
FLEXPRICE_POSTGRES_READER_HOST=postgres-replica
FLEXPRICE_POSTGRES_MAX_OPEN_CONNS=25
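To illustrate what the reader_host split buys you, here is a minimal sketch of read/write routing. The DSNs and the SELECT-based rule are illustrative only; FlexPrice's internal routing based on reader_host may differ:

```python
# Hypothetical DSNs matching the primary/replica pair configured above
PRIMARY_DSN = "postgres://postgres-primary.example.com:5432/flexprice"
REPLICA_DSN = "postgres://postgres-replica.example.com:5432/flexprice"

def route(query: str) -> str:
    """Send read-only statements to the replica, everything else to primary.

    Naive rule: only plain SELECTs go to the replica; writes, DDL, and
    transactions must see the primary to avoid replication-lag anomalies.
    """
    verb = query.lstrip().split(None, 1)[0].upper()
    return REPLICA_DSN if verb == "SELECT" else PRIMARY_DSN

print(route("SELECT * FROM invoices"))           # routed to replica
print(route("UPDATE invoices SET paid = true"))  # routed to primary
```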

ClickHouse

ClickHouse scales horizontally by adding replicas (for read throughput and redundancy) and shards (for write throughput) to a cluster:
clickhouse-config.xml
<clickhouse>
    <remote_servers>
        <flexprice_cluster>
            <shard>
                <replica>
                    <host>clickhouse-1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </flexprice_cluster>
    </remote_servers>
</clickhouse>

Kafka Scaling

For high-throughput event processing:

Kafka Cluster

docker-compose.yml
services:
  kafka-1:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 1
      # ... kafka config

  kafka-2:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 2
      # ... kafka config

  kafka-3:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 3
      # ... kafka config

Replication Factor

Increase replication for durability:
docker compose exec kafka kafka-topics \
  --bootstrap-server kafka:9092 \
  --create \
  --topic events \
  --partitions 10 \
  --replication-factor 3 \
  --config min.insync.replicas=2
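These two settings determine fault tolerance: with acks=all, a write succeeds as long as min.insync.replicas brokers acknowledge it, so the cluster tolerates replication-factor minus min.insync.replicas broker failures before producers start erroring. A quick check:

```python
def tolerable_broker_failures(replication_factor: int,
                              min_insync_replicas: int) -> int:
    """Brokers that can fail before acks=all producers get errors."""
    return replication_factor - min_insync_replicas

# The topic created above: replication-factor 3, min.insync.replicas 2
print(tolerable_broker_failures(3, 2))  # -> 1
```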

Redis Scaling

For cache redundancy and read scaling, run a Redis primary with replicas:
redis-cluster.yml
services:
  redis-master:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  redis-replica-1:
    image: redis:7-alpine
    command: redis-server --replicaof redis-master 6379

  redis-replica-2:
    image: redis:7-alpine
    command: redis-server --replicaof redis-master 6379
Environment Variables
FLEXPRICE_REDIS_HOST=redis-master
FLEXPRICE_REDIS_PORT=6379
FLEXPRICE_REDIS_POOL_SIZE=20

Complete Production Example

Full production deployment with all scaling considerations:
docker-compose.prod.yml
services:
  # Load Balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - flexprice-api

  # API Tier (horizontally scaled)
  flexprice-api:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=api
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_POSTGRES_READER_HOST=postgres-replica
      - FLEXPRICE_KAFKA_BROKERS=kafka-1:9092,kafka-2:9092,kafka-3:9092
      - FLEXPRICE_CLICKHOUSE_ADDRESS=clickhouse:9000
      - FLEXPRICE_REDIS_HOST=redis
      - FLEXPRICE_POSTGRES_MAX_OPEN_CONNS=25
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '2'
          memory: 2G
      restart_policy:
        condition: on-failure
        max_attempts: 3

  # Consumer Tier (partition-scaled)
  flexprice-consumer:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=consumer
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_KAFKA_BROKERS=kafka-1:9092,kafka-2:9092,kafka-3:9092
      - FLEXPRICE_CLICKHOUSE_ADDRESS=clickhouse:9000
      - FLEXPRICE_EVENT_PROCESSING_RATE_LIMIT=100
      - FLEXPRICE_FEATURE_USAGE_TRACKING_RATE_LIMIT=50
    deploy:
      replicas: 10
      resources:
        limits:
          cpus: '2'
          memory: 2G

  # Worker Tier (workflow-scaled)
  flexprice-worker:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=temporal_worker
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_TEMPORAL_ADDRESS=temporal:7233
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '1'
          memory: 1G

Monitoring Scaling Metrics

Key metrics to monitor:

API Tier

  • Request rate (requests/second)
  • Response latency (p50, p95, p99)
  • Error rate (%)
  • CPU and memory utilization

Consumer Tier

  • Consumer lag (messages)
  • Processing rate (events/second)
  • Error rate (%)
  • CPU and memory utilization

Worker Tier

  • Workflow queue depth
  • Average workflow duration
  • Worker utilization (%)
  • CPU and memory utilization

Infrastructure

  • Database connection pool usage
  • Kafka partition distribution
  • Redis cache hit rate
  • Network throughput
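For the cache hit rate, Redis exposes keyspace_hits and keyspace_misses counters in `INFO stats`; the hit rate is hits / (hits + misses). A quick sketch:

```python
def cache_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Hit rate from the counters reported by `redis-cli INFO stats`."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

print(round(cache_hit_rate(9_000, 1_000), 2))  # -> 0.9
```

A consistently low hit rate suggests the cache is undersized or keys are expiring too aggressively.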

Scaling Decision Matrix

| Symptom | Possible Cause | Solution |
|---|---|---|
| High API latency | Too few API replicas | Scale API tier horizontally |
| High consumer lag | Too few partitions | Increase Kafka partitions |
| Queue depth growing | Too few workers | Scale worker tier |
| Database slow queries | Connection limit | Increase max_open_conns |
| High memory usage | Large payloads | Increase memory limits |
| Network saturation | High throughput | Add more nodes |

Next Steps