FlexPrice is designed to scale horizontally across all tiers. This guide covers scaling strategies for production workloads.
Scaling Overview
FlexPrice scales at three independent tiers:
API Tier → Horizontal scaling (stateless)
Consumer Tier → Partition-based scaling (Kafka)
Worker Tier → Task queue scaling (Temporal)
API Tier Scaling
The API tier is stateless and scales horizontally with a load balancer.
Horizontal Pod Autoscaling (Kubernetes)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flexprice-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flexprice-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 50
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
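To make the thresholds above concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes HPA: desired = ceil(current replicas × current utilization / target utilization), evaluated per metric, with the largest result used. The function names are hypothetical, and the sketch deliberately ignores the controller's tolerance band, stabilization windows, and the scaleUp/scaleDown rate policies.

```python
import math

# Targets and bounds copied from the HPA manifest above.
CPU_TARGET = 70
MEMORY_TARGET = 80
MIN_REPLICAS = 3
MAX_REPLICAS = 20


def desired_replicas(current_replicas, current_util, target_util):
    """Core HPA ratio: ceil(current * currentMetric / targetMetric), clamped to bounds."""
    raw = math.ceil(current_replicas * current_util / target_util)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, raw))


def hpa_recommendation(current_replicas, cpu_util, mem_util):
    """With several metrics, the largest per-metric proposal is used."""
    by_cpu = desired_replicas(current_replicas, cpu_util, CPU_TARGET)
    by_mem = desired_replicas(current_replicas, mem_util, MEMORY_TARGET)
    return max(by_cpu, by_mem)


# 4 replicas at 105% average CPU and 60% memory: CPU drives the decision.
print(hpa_recommendation(4, cpu_util=105, mem_util=60))
```

Under these assumptions, a sustained 105% CPU reading on 4 replicas yields a recommendation of 6, which also fits within the 50%-per-30-seconds scale-up policy.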
Docker Compose Scaling
# Scale to 5 replicas
docker compose up -d --scale flexprice-api=5
# Resource limits cannot be passed as flags to `docker compose up`;
# define them per service under deploy.resources.limits in the compose file.
Load Balancer Configuration
upstream flexprice_api {
    least_conn; # Use least connections algorithm
    server flexprice-api-1:8080 max_fails=3 fail_timeout=30s;
    server flexprice-api-2:8080 max_fails=3 fail_timeout=30s;
    server flexprice-api-3:8080 max_fails=3 fail_timeout=30s;
    keepalive 32; # Keep connections alive
}
server {
    listen 80;
    server_name api.flexprice.io;
    location / {
        proxy_pass http://flexprice_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
    location /health {
        access_log off;
        proxy_pass http://flexprice_api/health;
    }
}
Resource Recommendations
| Load Level | vCPU | Memory | Replicas | Notes |
|---|---|---|---|---|
| Low (less than 100 req/s) | 1 | 1GB | 2-3 | Minimum for HA |
| Medium (100-500 req/s) | 2 | 2GB | 3-5 | Baseline production |
| High (500-2000 req/s) | 2-4 | 2-4GB | 5-10 | High traffic |
| Very High (over 2000 req/s) | 4 | 4GB | 10-20+ | Enterprise scale |
Always run at least 2 replicas for high availability, even during low traffic.
Consumer Tier Scaling
Consumer scaling is bounded by the Kafka partition count: within a consumer group, each partition is read by at most one consumer, so consumers beyond the partition count sit idle.
Partition Strategy
Calculate partition count
Determine partitions based on peak event rate:
Partitions = (Peak Events/sec) / (Events per Consumer/sec)
Example: 10,000 events/sec ÷ 1,000 events/sec = 10 partitions
Create or increase partitions
# Increase partitions for events topic
docker compose exec kafka kafka-topics \
--bootstrap-server kafka:9092 \
--alter \
--topic events \
--partitions 10
Partition count can only be increased, never decreased. Plan accordingly.
Scale consumers to match
# Docker Compose
docker compose up -d --scale flexprice-consumer=10
# Kubernetes
kubectl scale deployment flexprice-consumer --replicas=10
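The partition formula from the first step can be sketched as a small helper. It rounds up, because a fractional partition is not possible. The `headroom` parameter is an assumption added for illustration (it is not part of the formula above); it is useful because partitions can only be increased later, never decreased.

```python
import math


def partitions_needed(peak_events_per_sec, events_per_consumer_per_sec, headroom=0.0):
    """Partitions = peak rate / per-consumer rate, rounded up.

    headroom is a hypothetical safety margin, e.g. 0.5 plans for 50% growth.
    """
    if events_per_consumer_per_sec <= 0:
        raise ValueError("events_per_consumer_per_sec must be positive")
    required = peak_events_per_sec * (1 + headroom) / events_per_consumer_per_sec
    return math.ceil(required)


print(partitions_needed(10_000, 1_000))                # the example from the guide
print(partitions_needed(10_000, 1_000, headroom=0.5))  # same load with 50% headroom
```

The per-consumer throughput should come from a measurement of your own workload rather than from a fixed constant.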
Consumer Configuration
Optimize consumer performance with rate limits:
services:
  flexprice-consumer:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=consumer
      # Rate limits (messages per second)
      - FLEXPRICE_EVENT_PROCESSING_RATE_LIMIT=100
      - FLEXPRICE_FEATURE_USAGE_TRACKING_RATE_LIMIT=50
      - FLEXPRICE_EVENT_POST_PROCESSING_RATE_LIMIT=75
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '2'
          memory: 2G
Consumer Group Strategy
FlexPrice uses multiple consumer groups for different processing stages:
events topic (10 partitions)
│
├─→ v1_event_processing (5 consumers) → Store in ClickHouse
├─→ v1_feature_tracking_service (3 consumers) → Track features
└─→ v1_costsheet_usage_tracking_service (2 consumers) → Calculate costs
Each consumer group processes independently - scale them based on their specific load.
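As a rough illustration of how the groups in the diagram share the topic, the sketch below spreads partitions as evenly as possible across the consumers in one group. The exact assignment depends on the partition assignor configured in the Kafka client, so treat the helper (a hypothetical name) as an approximation of the per-consumer load, not as the actual assignment.

```python
def partitions_per_consumer(partitions, consumers):
    """Spread partitions as evenly as possible across the consumers of one group.

    Consumers beyond the partition count receive 0 partitions and sit idle.
    """
    base, extra = divmod(partitions, consumers)
    return [base + 1 if i < extra else base for i in range(consumers)]


# Consumer counts taken from the diagram above (10 partitions on the events topic).
for group, consumers in [
    ("v1_event_processing", 5),
    ("v1_feature_tracking_service", 3),
    ("v1_costsheet_usage_tracking_service", 2),
]:
    print(group, partitions_per_consumer(10, consumers))
```

A group with fewer consumers handles more partitions per consumer, which is worth keeping in mind when one stage is slower than the others.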
Partition Monitoring
Monitor consumer lag to identify scaling needs:
# Check consumer group lag
docker compose exec kafka kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--describe \
--group v1_event_processing
Output:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
v1_event_processing events 0 1000 1000 0
v1_event_processing events 1 950 1020 70
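If LAG is consistently high, first add consumers up to the partition count. Once every partition has its own consumer, add partitions and then scale consumers to match.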
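To track lag over time instead of reading it by eye, the describe output can be parsed with a few lines of code. This is a minimal sketch that assumes the whitespace-separated column layout shown above; it locates the PARTITION and LAG columns from the header row, so extra columns do not break it, and it skips rows whose lag is not numeric. `SAMPLE` and `parse_lag` are illustrative names.

```python
SAMPLE = """\
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
v1_event_processing events 0 1000 1000 0
v1_event_processing events 1 950 1020 70
"""


def parse_lag(describe_output):
    """Return {partition: lag} parsed from `kafka-consumer-groups --describe` text."""
    rows = [line.split() for line in describe_output.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    partition_col = header.index("PARTITION")
    lag_col = header.index("LAG")
    lag_by_partition = {}
    for row in data:
        value = row[lag_col]
        if value.isdigit():  # skip rows whose lag is not numeric
            lag_by_partition[int(row[partition_col])] = int(value)
    return lag_by_partition


lag = parse_lag(SAMPLE)
print(lag, "total:", sum(lag.values()))
```

Sampling the total at a fixed interval and comparing consecutive readings shows whether lag is growing, stable, or draining.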
Resource Recommendations
| Event Rate | vCPU | Memory | Partitions | Consumers | Notes |
|---|---|---|---|---|---|
| Under 1K/s | 1 | 1GB | 3 | 2-3 | Small workload |
| 1K-10K/s | 2 | 2GB | 10 | 5-10 | Medium workload |
| 10K-50K/s | 2-4 | 2-4GB | 20 | 10-20 | High workload |
| Over 50K/s | 4 | 4GB | 50+ | 30-50 | Enterprise scale |
Worker Tier Scaling
Temporal workers scale based on workflow concurrency.
Kubernetes Deployment
k8s/worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexprice-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flexprice
      component: worker
  template:
    metadata:
      labels:
        app: flexprice
        component: worker
    spec:
      containers:
        - name: flexprice
          image: flexprice-app:latest
          env:
            - name: FLEXPRICE_DEPLOYMENT_MODE
              value: "temporal_worker"
            - name: FLEXPRICE_TEMPORAL_ADDRESS
              value: "temporal:7233"
            - name: FLEXPRICE_TEMPORAL_TASK_QUEUE
              value: "billing-task-queue"
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
Task Queue Strategy
Use multiple task queues for different workflow priorities:
Workflows
│
├─→ billing-high-priority (5 workers) → Critical billing
├─→ billing-standard (3 workers) → Regular billing
└─→ billing-background (2 workers) → Batch operations
Worker Monitoring
Monitor workflow execution in Temporal UI:
- Queue depth: Number of pending workflows
- Execution time: Average workflow duration
- Worker utilization: Active workers / Total workers
Scale workers when queue depth remains high.
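One way to turn "remains high" into a rule is to require several consecutive readings above a threshold, so that a single spike does not trigger scaling. The sketch below assumes you collect queue depth samples yourself; the threshold, the sample count, and the function name are all hypothetical and should be tuned to your workload.

```python
def should_scale_workers(queue_depth_samples, threshold, min_samples=3):
    """True only if the last `min_samples` readings all exceed `threshold`."""
    recent = queue_depth_samples[-min_samples:]
    return len(recent) == min_samples and all(depth > threshold for depth in recent)


print(should_scale_workers([10, 120, 130, 150], threshold=100))  # sustained backlog
print(should_scale_workers([150, 20, 130], threshold=100))       # a spike that drained
```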
Resource Recommendations
| Workflow Concurrency | vCPU | Memory | Workers | Notes |
|---|---|---|---|---|
| Under 100 | 1 | 1GB | 2 | Small workload |
| 100-500 | 2 | 2GB | 3-5 | Medium workload |
| 500-2000 | 2 | 2GB | 5-10 | High workload |
| Over 2000 | 2-4 | 2-4GB | 10-20 | Enterprise scale |
Database Scaling
PostgreSQL
For high-traffic deployments, use read replicas:
postgres:
  host: postgres-primary.example.com
  port: 5432
  # Read replica configuration
  reader_host: postgres-replica.example.com
  reader_port: 5432
  # Connection pooling
  max_open_conns: 25
  max_idle_conns: 10
  conn_max_lifetime_minutes: 60
FLEXPRICE_POSTGRES_HOST=postgres-primary
FLEXPRICE_POSTGRES_READER_HOST=postgres-replica
FLEXPRICE_POSTGRES_MAX_OPEN_CONNS=25
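Because the pool size applies per process, horizontal scaling multiplies the number of connections that PostgreSQL must accept. The sketch below estimates the worst case, assuming every replica in every tier can open up to max_open_conns connections to the primary. That is an assumption about the deployment, and the replica counts are taken from the complete production example in this guide.

```python
def total_connections(replicas_by_tier, max_open_conns=25):
    """Worst-case connections to the primary if every replica fills its pool."""
    return sum(replicas * max_open_conns for replicas in replicas_by_tier.values())


# Replica counts from the complete production example in this guide.
tiers = {"api": 5, "consumer": 10, "worker": 5}
needed = total_connections(tiers)
print(needed)

# PostgreSQL's default max_connections is 100, so this layout needs either a
# higher server limit or a connection pooler in front of the database.
print(needed > 100)
```

Rerun the estimate before raising max_open_conns or adding replicas, since either change moves the total.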
ClickHouse
ClickHouse scales by adding cluster nodes:
<clickhouse>
  <remote_servers>
    <flexprice_cluster>
      <shard>
        <replica>
          <host>clickhouse-1</host>
          <port>9000</port>
        </replica>
        <replica>
          <host>clickhouse-2</host>
          <port>9000</port>
        </replica>
      </shard>
    </flexprice_cluster>
  </remote_servers>
</clickhouse>
Kafka Scaling
For high-throughput event processing:
Kafka Cluster
services:
  kafka-1:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 1
      # ... kafka config
  kafka-2:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 2
      # ... kafka config
  kafka-3:
    image: confluentinc/cp-kafka:7.7.1
    environment:
      KAFKA_BROKER_ID: 3
      # ... kafka config
Replication Factor
Increase replication for durability:
docker compose exec kafka kafka-topics \
--bootstrap-server kafka:9092 \
--create \
--topic events \
--partitions 10 \
--replication-factor 3 \
--config min.insync.replicas=2
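The two settings above interact. When producers use acks=all (an assumption about the producer configuration, which this guide does not state), a write succeeds only while at least min.insync.replicas replicas are in sync. The number of broker failures a partition tolerates while still accepting writes is therefore the replication factor minus min.insync.replicas. A small sketch with a hypothetical helper name:

```python
def write_fault_tolerance(replication_factor, min_insync_replicas):
    """Broker failures tolerated while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    return replication_factor - min_insync_replicas


print(write_fault_tolerance(3, 2))  # the settings used above
print(write_fault_tolerance(3, 3))  # stricter durability, no write availability margin
```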
Redis Scaling
For distributed caching:
services:
  redis-master:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  redis-replica-1:
    image: redis:7-alpine
    command: redis-server --replicaof redis-master 6379
  redis-replica-2:
    image: redis:7-alpine
    command: redis-server --replicaof redis-master 6379
FLEXPRICE_REDIS_HOST=redis-master
FLEXPRICE_REDIS_PORT=6379
FLEXPRICE_REDIS_POOL_SIZE=20
Complete Production Example
Full production deployment with all scaling considerations:
services:
  # Load Balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - flexprice-api
  # API Tier (horizontally scaled)
  flexprice-api:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=api
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_POSTGRES_READER_HOST=postgres-replica
      - FLEXPRICE_KAFKA_BROKERS=kafka-1:9092,kafka-2:9092,kafka-3:9092
      - FLEXPRICE_CLICKHOUSE_ADDRESS=clickhouse:9000
      - FLEXPRICE_REDIS_HOST=redis
      - FLEXPRICE_POSTGRES_MAX_OPEN_CONNS=25
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '2'
          memory: 2G
      restart_policy:
        condition: on-failure
        max_attempts: 3
  # Consumer Tier (partition-scaled)
  flexprice-consumer:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=consumer
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_KAFKA_BROKERS=kafka-1:9092,kafka-2:9092,kafka-3:9092
      - FLEXPRICE_CLICKHOUSE_ADDRESS=clickhouse:9000
      - FLEXPRICE_EVENT_PROCESSING_RATE_LIMIT=100
      - FLEXPRICE_FEATURE_USAGE_TRACKING_RATE_LIMIT=50
    deploy:
      replicas: 10
      resources:
        limits:
          cpus: '2'
          memory: 2G
  # Worker Tier (workflow-scaled)
  flexprice-worker:
    image: flexprice-app:latest
    environment:
      - FLEXPRICE_DEPLOYMENT_MODE=temporal_worker
      - FLEXPRICE_POSTGRES_HOST=postgres-primary
      - FLEXPRICE_TEMPORAL_ADDRESS=temporal:7233
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '1'
          memory: 1G
Monitoring Scaling Metrics
Key metrics to monitor:
API Tier
- Request rate (requests/second)
- Response latency (p50, p95, p99)
- Error rate (%)
- CPU and memory utilization
Consumer Tier
- Consumer lag (messages)
- Processing rate (events/second)
- Error rate (%)
- CPU and memory utilization
Worker Tier
- Workflow queue depth
- Average workflow duration
- Worker utilization (%)
- CPU and memory utilization
Infrastructure
- Database connection pool usage
- Kafka partition distribution
- Redis cache hit rate
- Network throughput
Scaling Decision Matrix
| Symptom | Possible Cause | Solution |
|---|---|---|
| High API latency | Too few API replicas | Scale API tier horizontally |
| High consumer lag | Too few consumers or partitions | Add consumers up to the partition count, then increase Kafka partitions |
| Queue depth growing | Too few workers | Scale worker tier |
| Database slow queries | Connection limit | Increase max_open_conns |
| High memory usage | Large payloads | Increase memory limits |
| Network saturation | High throughput | Add more nodes |