Auto-Scaling Configuration
Enable automatic scaling based on CPU and memory usage to handle varying workloads efficiently. When enabled, the platform automatically adjusts the number of running instances to match demand.
Overview
The autoscaler monitors your app's resource usage and intelligently scales up during high demand and down during low usage periods, optimizing both performance and cost.
How It Works
- Metrics Collection: Platform monitors CPU and memory usage across all instances every polling interval (default: 30 seconds)
- Scale Up Decision: When either CPU or memory exceeds threshold, a new instance is added (one at a time)
- Scale Down Decision: When both CPU and memory are below 50% of threshold for sustained period, instances are removed
- Cooldown Periods: After scaling up (60s) or down (120s), the system waits before making another scaling decision
- Dedicated Nodes: Each new instance is provisioned on its own dedicated node for isolation and performance
The autoscaler uses intelligent algorithms to prevent scaling flapping and ensure smooth transitions between instance counts.
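The decision rules above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation; function names and structure are assumptions.

```python
# Sketch of the scale-up / scale-down rules described above.
# Thresholds are percentages of the allocated limits; names are illustrative.

def should_scale_up(cpu_pct, mem_pct, cpu_threshold=70, mem_threshold=80):
    """Scale up when EITHER metric exceeds its threshold."""
    return cpu_pct > cpu_threshold or mem_pct > mem_threshold

def should_scale_down(cpu_pct, mem_pct, cpu_threshold=70, mem_threshold=80):
    """Scale down when BOTH metrics sit below 50% of their thresholds."""
    return cpu_pct < cpu_threshold * 0.5 and mem_pct < mem_threshold * 0.5

def next_replica_count(current, cpu_pct, mem_pct, min_replicas=1, max_replicas=10):
    """One scaling step: add or remove a single instance, within limits."""
    if should_scale_up(cpu_pct, mem_pct) and current < max_replicas:
        return current + 1
    if should_scale_down(cpu_pct, mem_pct) and current > min_replicas:
        return current - 1
    return current

print(next_replica_count(3, cpu_pct=75, mem_pct=60))  # 4: CPU over threshold
print(next_replica_count(3, cpu_pct=30, mem_pct=29))  # 2: both under 35%/40%
print(next_replica_count(3, cpu_pct=50, mem_pct=50))  # 3: steady state
```

In a real loop this check would also be gated by the cooldown periods and the sustained-low-usage requirement described below.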
Enabling Autoscaling
During Deployment
- Navigate to Apps → Deploy App
- Scroll to the Autoscaling (Optional) section
- Check Enable Autoscaling
- Configure thresholds and limits based on your needs
After Deployment
- Navigate to app details page
- Click Configuration tab
- Scroll to Autoscaling section
- Toggle Enable Autoscaling
- Click Redeploy to apply changes
Configuration Options
| Setting | Default | Range | Description |
|---|---|---|---|
| Min Replicas | 1 | 1-20 | Minimum number of instances always running |
| Max Replicas | 10 | 2-50 | Maximum number of instances to scale to |
| CPU Threshold | 70% | 1-100% | Scale up when average CPU exceeds this |
| Memory Threshold | 80% | 1-100% | Scale up when average memory exceeds this |
| Polling Interval | 30s | 10-300s | How often to check metrics |
Understanding Thresholds
CPU Threshold: Percentage of allocated CPU (from resources.cpu_limit in manifest)
- If `cpu_limit: "1000m"` and threshold is 70%, scaling triggers at 700m (0.7 cores)
- Lower thresholds = more aggressive scaling
- Higher thresholds = fewer instances, lower cost
Memory Threshold: Percentage of allocated memory (from resources.memory_limit in manifest)
- If `memory_limit: "1Gi"` and threshold is 80%, scaling triggers at ~819MB
- Critical for preventing OOM (Out of Memory) errors
- Recommended: 70-80% for memory-intensive apps
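The threshold arithmetic above can be sketched as follows. This is an illustrative helper, not a platform API; it covers only the unit forms used in this guide (`m` millicores, `Mi`/`Gi` memory).

```python
# Converting manifest limits into absolute scaling trigger points.
# Names are illustrative; only the units used in this guide are handled.

def cpu_trigger_millicores(cpu_limit: str, threshold_pct: float) -> float:
    """e.g. cpu_limit "1000m" at 70% -> 700 millicores (0.7 cores)."""
    millicores = float(cpu_limit.rstrip("m"))
    return millicores * threshold_pct / 100

def memory_trigger_mib(memory_limit: str, threshold_pct: float) -> float:
    """e.g. memory_limit "1Gi" at 80% -> 819.2 MiB."""
    if memory_limit.endswith("Gi"):
        mib = float(memory_limit[:-2]) * 1024
    elif memory_limit.endswith("Mi"):
        mib = float(memory_limit[:-2])
    else:
        raise ValueError(f"unsupported unit: {memory_limit}")
    return mib * threshold_pct / 100

print(cpu_trigger_millicores("1000m", 70))  # 700.0
print(memory_trigger_mib("1Gi", 80))        # 819.2
```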
Scaling Behavior
Scale Up Triggers
A new instance is added when either condition is met:
- Average CPU usage > CPU threshold
- Average memory usage > Memory threshold
Example:
```yaml
# Manifest resources
resources:
  cpu_limit: "1000m"
  memory_limit: "1Gi"

# Autoscaling config
cpu_threshold: 70%     # Scale up at 700m CPU
memory_threshold: 80%  # Scale up at 819MB memory
```
If current instance is using:
- CPU: 750m (75%) OR
- Memory: 850MB (83%)
→ Scale up triggered!
Scale Down Triggers
An instance is removed when all of the following hold:
- Average CPU usage < 35% (50% of 70% threshold)
- Average memory usage < 40% (50% of 80% threshold)
- Sustained for at least 2x polling interval
Example:
- CPU: 300m (30%) AND
- Memory: 300MB (29%)
- For at least 60 seconds
→ Scale down triggered!
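Because scale-down additionally requires low usage to persist for 2x the polling interval, the autoscaler must track recent samples rather than react to a single reading. A minimal sketch of such a tracker (illustrative only; names are assumptions):

```python
# Scale-down requires BOTH metrics below 50% of their thresholds for a
# sustained period (2 consecutive polls ~= 2x the polling interval).

class ScaleDownTracker:
    def __init__(self, required_samples=2):
        self.required_samples = required_samples  # one sample per poll
        self.consecutive_low = 0

    def observe(self, cpu_pct, mem_pct, cpu_threshold=70, mem_threshold=80):
        """Record one poll; return True once low usage has been sustained."""
        low = cpu_pct < cpu_threshold * 0.5 and mem_pct < mem_threshold * 0.5
        self.consecutive_low = self.consecutive_low + 1 if low else 0
        return self.consecutive_low >= self.required_samples

tracker = ScaleDownTracker()
print(tracker.observe(30, 29))  # False: first low sample, not yet sustained
print(tracker.observe(30, 29))  # True: low for 2 polls (60s at default interval)
```

Note that any single high reading resets the counter, which is what makes scale-down deliberately slower than scale-up.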
Cooldown Periods
After scaling events, the system waits before making another decision:
- Scale Up Cooldown: 60 seconds
- Scale Down Cooldown: 120 seconds
This prevents rapid scaling oscillations ("flapping").
Example Configurations
Light Traffic API (Development)
Best for development and testing with predictable low traffic.
```yaml
# Manifest
resources:
  cpu_request: "100m"
  memory_request: "256Mi"
  cpu_limit: "500m"
  memory_limit: "512Mi"
```
Autoscaling:
- Enabled: ❌ (Use fixed instances instead)
- Fixed Instances: 1
- Resource Tier: Small
Cost: Lowest - single instance always running
Use Case: Development, testing, personal projects
Production Web App (Moderate Traffic)
Balanced configuration for production apps with moderate traffic.
```yaml
# Manifest
resources:
  cpu_request: "200m"
  memory_request: "512Mi"
  cpu_limit: "1000m"
  memory_limit: "1Gi"
```
Autoscaling:
- Enabled: ✅
- Min Replicas: 2
- Max Replicas: 10
- CPU Threshold: 70%
- Memory Threshold: 80%
- Polling Interval: 30s
Cost: Moderate - scales from 2 to 10 instances
Use Case: Production web apps, REST APIs, dashboards
Expected Behavior:
- Always 2 instances for high availability
- Scales to 10 during traffic spikes
- Handles 5x traffic increase automatically
High-Traffic API (Variable Load)
Aggressive scaling for APIs with unpredictable traffic patterns.
```yaml
# Manifest
resources:
  cpu_request: "500m"
  memory_request: "1Gi"
  cpu_limit: "2000m"
  memory_limit: "2Gi"
```
Autoscaling:
- Enabled: ✅
- Min Replicas: 3
- Max Replicas: 20
- CPU Threshold: 60%
- Memory Threshold: 70%
- Polling Interval: 20s
Cost: High - scales from 3 to 20 instances
Use Case: High-traffic APIs, real-time services, production critical apps
Expected Behavior:
- Always 3 instances for redundancy
- Scales aggressively (lower thresholds)
- Fast response to load changes (20s polling)
- Can handle 6-7x traffic increases
CPU-Intensive Processing
Optimized for compute-heavy workloads.
```yaml
# Manifest
resources:
  cpu_request: "1000m"
  memory_request: "512Mi"
  cpu_limit: "4000m"
  memory_limit: "1Gi"
```
Autoscaling:
- Enabled: ✅
- Min Replicas: 2
- Max Replicas: 15
- CPU Threshold: 60% # Lower for CPU-heavy apps
- Memory Threshold: 80%
- Polling Interval: 30s
Cost: Variable - scales based on CPU demand
Use Case: Image processing, video transcoding, data analysis
Memory-Intensive Application
Optimized for memory-heavy workloads.
```yaml
# Manifest
resources:
  cpu_request: "200m"
  memory_request: "2Gi"
  cpu_limit: "1000m"
  memory_limit: "4Gi"
```
Autoscaling:
- Enabled: ✅
- Min Replicas: 2
- Max Replicas: 10
- CPU Threshold: 70%
- Memory Threshold: 70% # Lower to prevent OOM
- Polling Interval: 30s
Cost: High - large memory allocation per instance
Use Case: In-memory caching, ML inference, data processing
Monitoring Autoscaling
View autoscaling activity in your app details page:
Metrics Dashboard
- Current Replicas: Real-time instance count
- CPU Usage: Percentage of allocated CPU across all instances
- Memory Usage: Percentage of allocated memory across all instances
- Scaling Events: Timeline of scale up/down decisions
Scaling History
Track all scaling decisions with:
- Timestamp: When the scaling occurred
- Action: Scale up or scale down
- Reason: CPU threshold exceeded, memory threshold exceeded, etc.
- Before/After: Instance count before and after
- Duration: Time to complete scaling
Alerts
Set up alerts for scaling events:
- Scaled to Max: Alert when max replicas reached
- Scaling Failures: Alert when scaling fails
- High Resource Usage: Alert when approaching thresholds
Best Practices
Development vs Production
Development:
- ❌ Disable autoscaling
- Use 1-2 fixed instances
- Small resource tier
- Minimize costs
Production:
- ✅ Enable autoscaling
- Min replicas ≥ 2 for high availability
- Appropriate resource tier
- Balance cost and performance
Traffic Patterns
Predictable Traffic:
- Use fixed instances instead of autoscaling
- Right-size resource allocation
- Lower operational complexity
Unpredictable Traffic:
- Enable autoscaling with higher max replicas
- Set lower thresholds for faster response
- Monitor scaling patterns and adjust
Resource Allocation
CPU-Intensive Apps:
- Lower CPU threshold (60%)
- Higher CPU limits
- Prevents performance degradation
Memory-Intensive Apps:
- Lower memory threshold (70%)
- Higher memory limits
- Prevents OOM errors
Balanced Apps:
- Standard thresholds (70% CPU, 80% memory)
- Balanced resource allocation
Cost Optimization
- Start Conservative: Begin with low max replicas, increase if needed
- Monitor Patterns: Review scaling history to find optimal settings
- Use Min Replicas Wisely: Higher min = higher baseline cost
- Development Environments: Disable autoscaling, use 1 instance
- Off-Peak Scaling: Consider scheduled scaling for predictable patterns
Troubleshooting
Not Scaling Up
Problem: CPU/memory high but no new instances
Check:
- Reached max replicas limit?
- Currently in cooldown period?
- Thresholds configured correctly?
- Metrics being collected?
Solution:
```yaml
# Lower thresholds for more aggressive scaling
cpu_threshold: 60%     # Was 70%
memory_threshold: 70%  # Was 80%

# Increase max replicas
max_replicas: 20       # Was 10
```
Scaling Flapping
Problem: Instances constantly scaling up and down
Check:
- Usage hovering between the scale-up threshold and the scale-down cutoff (50% of threshold)?
- Polling interval too short?
- Resource limits too low?
Solution:
```yaml
# Increase polling interval
polling_interval: 60s  # Was 30s

# Adjust thresholds for more hysteresis
cpu_threshold: 75%     # Was 70%
memory_threshold: 85%  # Was 80%
```
Slow Scaling
Problem: Scaling happens too slowly
Check:
- Polling interval too long?
- Thresholds too high?
Solution:
```yaml
# Decrease polling interval
polling_interval: 20s  # Was 30s

# Lower thresholds
cpu_threshold: 60%     # Was 70%
memory_threshold: 70%  # Was 80%
```
High Costs
Problem: Too many instances running
Check:
- Min replicas too high?
- Max replicas excessive?
- Scale-down thresholds too low?
Solution:
```yaml
# Reduce min replicas
min_replicas: 1  # Was 3

# Reduce max replicas
max_replicas: 5  # Was 20

# Allow more aggressive scale-down
# (automatic: 50% of thresholds)
```
Infrastructure Costs
Each autoscaled instance receives its own dedicated compute node. This ensures performance isolation but may increase infrastructure costs during scale-up events. Monitor your usage in the FinOps dashboard.
Cost Calculation
- Base Cost: Min replicas × instance cost
- Variable Cost: Additional instances × instance cost × usage time
- Node Cost: Each instance includes dedicated node allocation
Example:
Configuration:
- Min Replicas: 2
- Max Replicas: 10
- Resource Tier: Medium ($50/month per instance)
Base Cost: 2 × $50 = $100/month
Max Cost: 10 × $50 = $500/month
Typical Cost: ~$150-250/month (3-5 avg instances)
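The cost arithmetic above can be sketched as follows. This assumes flat per-instance pricing as in the example; actual billing may differ.

```python
# Rough monthly cost estimate for an autoscaled app, assuming a flat
# per-instance price (the $50/month Medium tier from the example above).

def monthly_cost(replicas: float, per_instance: float = 50.0) -> float:
    """Cost for a given (possibly average) replica count."""
    return replicas * per_instance

min_replicas, max_replicas = 2, 10

print(monthly_cost(min_replicas))  # 100.0: base cost, always-on floor
print(monthly_cost(max_replicas))  # 500.0: ceiling at full scale-out
print(monthly_cost(4))             # 200.0: e.g. a 4-instance average
```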