Auto-Scaling Configuration

Enable automatic scaling based on CPU and memory usage to handle varying workloads efficiently. When enabled, the platform automatically adjusts the number of running instances to match demand.

Overview

The autoscaler monitors your app's resource usage and intelligently scales up during high demand and down during low usage periods, optimizing both performance and cost.

How It Works

  1. Metrics Collection: Platform monitors CPU and memory usage across all instances every polling interval (default: 30 seconds)
  2. Scale Up Decision: When either CPU or memory exceeds threshold, a new instance is added (one at a time)
  3. Scale Down Decision: When both CPU and memory are below 50% of threshold for sustained period, instances are removed
  4. Cooldown Periods: After scaling up (60s) or down (120s), the system waits before making another scaling decision
  5. Dedicated Nodes: Each new instance is provisioned on its own dedicated node for isolation and performance
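
The decision logic in steps 2 and 3 can be sketched as a single function evaluated once per polling interval. This is a hypothetical simplification (function name and signature are illustrative, and it ignores cooldowns and the sustained-period requirement for scale-down):

```python
def scaling_decision(cpu_pct, mem_pct, cpu_thr=70, mem_thr=80,
                     replicas=1, min_replicas=1, max_replicas=10):
    """Return 'up', 'down', or 'hold' for one polling cycle.

    Simplified sketch: cooldowns and the sustained-period check
    for scale-down are handled elsewhere.
    """
    # Scale up when EITHER metric exceeds its threshold (one instance at a time).
    if (cpu_pct > cpu_thr or mem_pct > mem_thr) and replicas < max_replicas:
        return "up"
    # Scale down when BOTH metrics are below 50% of their thresholds.
    if (cpu_pct < cpu_thr * 0.5 and mem_pct < mem_thr * 0.5
            and replicas > min_replicas):
        return "down"
    return "hold"
```

Note the asymmetry: scale-up needs only one metric over its threshold, while scale-down needs both metrics well below theirs, which is what keeps the instance count stable under mixed load.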

Smart Scaling

The autoscaler combines cooldown periods with hysteresis (scale-down thresholds set at half the scale-up thresholds) to prevent scaling flapping and ensure smooth transitions between instance counts.

Enabling Autoscaling

During Deployment

  1. Navigate to Apps → Deploy App
  2. Scroll to the Autoscaling (Optional) section
  3. Check Enable Autoscaling
  4. Configure thresholds and limits based on your needs

After Deployment

  1. Navigate to app details page
  2. Click Configuration tab
  3. Scroll to Autoscaling section
  4. Toggle Enable Autoscaling
  5. Click Redeploy to apply changes

Configuration Options

| Setting          | Default | Range   | Description                                |
|------------------|---------|---------|--------------------------------------------|
| Min Replicas     | 1       | 1-20    | Minimum number of instances always running |
| Max Replicas     | 10      | 2-50    | Maximum number of instances to scale to    |
| CPU Threshold    | 70%     | 1-100%  | Scale up when average CPU exceeds this     |
| Memory Threshold | 80%     | 1-100%  | Scale up when average memory exceeds this  |
| Polling Interval | 30s     | 10-300s | How often to check metrics                 |
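
Assuming the platform accepts these settings in a manifest-style block, a complete configuration using the key names from the troubleshooting examples below might look like (the `autoscaling:` wrapper and `enabled` key are illustrative):

```yaml
autoscaling:
  enabled: true
  min_replicas: 2
  max_replicas: 10
  cpu_threshold: 70%
  memory_threshold: 80%
  polling_interval: 30s
```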

Understanding Thresholds

CPU Threshold: Percentage of allocated CPU (from resources.cpu_limit in manifest)

  • If cpu_limit: "1000m" and threshold is 70%, scaling triggers at 700m (0.7 cores)
  • Lower thresholds = more aggressive scaling
  • Higher thresholds = fewer instances, lower cost

Memory Threshold: Percentage of allocated memory (from resources.memory_limit in manifest)

  • If memory_limit: "1Gi" and threshold is 80%, scaling triggers at 819MiB (80% of 1024MiB)
  • Critical for preventing OOM (Out of Memory) errors
  • Recommended: 70-80% for memory-intensive apps
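
The trigger points above follow directly from the manifest limits. A small sketch of the arithmetic (helper names are illustrative; units follow the Kubernetes-style `m` millicore and `Mi`/`Gi` suffixes used in the manifests):

```python
def parse_cpu(limit):
    """Convert a CPU limit like '1000m' or '2' to millicores."""
    return int(limit[:-1]) if limit.endswith("m") else int(float(limit) * 1000)

def parse_memory(limit):
    """Convert a memory limit like '1Gi' or '512Mi' to MiB."""
    units = {"Gi": 1024, "Mi": 1}
    for suffix, factor in units.items():
        if limit.endswith(suffix):
            return int(float(limit[:-len(suffix)]) * factor)
    raise ValueError(f"unsupported unit in {limit!r}")

def trigger_points(cpu_limit, memory_limit, cpu_thr, mem_thr):
    """Resource levels (millicores, MiB) at which scale-up triggers."""
    return (parse_cpu(cpu_limit) * cpu_thr / 100,
            parse_memory(memory_limit) * mem_thr / 100)
```

For example, `trigger_points("1000m", "1Gi", 70, 80)` yields 700 millicores and 819.2MiB, matching the worked figures above.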

Scaling Behavior

Scale Up Triggers

A new instance is added when either condition is met:

  • Average CPU usage > CPU threshold
  • Average memory usage > Memory threshold

Example:

# Manifest resources
resources:
  cpu_limit: "1000m"
  memory_limit: "1Gi"

# Autoscaling config
cpu_threshold: 70%    # Scale up at 700m CPU
memory_threshold: 80% # Scale up at 819MiB memory

If current instance is using:

  • CPU: 750m (75%) OR
  • Memory: 850MiB (83%)

→ Scale up triggered!

Scale Down Triggers

An instance is removed when both conditions are met:

  • Average CPU usage < 35% (50% of 70% threshold)
  • Average memory usage < 40% (50% of 80% threshold)
  • Sustained for at least 2x polling interval

Example:

  • CPU: 300m (30%) AND
  • Memory: 300MiB (29%)
  • For at least 60 seconds

→ Scale down triggered!
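
The "sustained period" requirement can be sketched as a check over recent poll samples. This is a hypothetical reading: with a 30s polling interval, three consecutive qualifying samples span the required 60 seconds (function and parameter names are illustrative):

```python
def sustained_scale_down(samples, cpu_thr=70, mem_thr=80):
    """Check whether recent samples justify a scale-down.

    samples: list of (cpu_pct, mem_pct) tuples, one per poll, oldest first.
    Requires BOTH metrics below 50% of their thresholds across the
    last 3 samples (covering 2 polling intervals of elapsed time).
    """
    needed = 3  # 3 consecutive samples span 2 polling intervals
    if len(samples) < needed:
        return False
    return all(cpu < cpu_thr * 0.5 and mem < mem_thr * 0.5
               for cpu, mem in samples[-needed:])
```

A single low reading never triggers a scale-down; one spike within the window resets the condition.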

Cooldown Periods

After scaling events, the system waits before making another decision:

  • Scale Up Cooldown: 60 seconds
  • Scale Down Cooldown: 120 seconds

This prevents rapid scaling oscillations ("flapping").
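
The cooldown check amounts to comparing the age of the last scaling event against the appropriate window. A minimal sketch (function name and event representation are assumptions):

```python
import time

UP_COOLDOWN = 60     # seconds to wait after a scale-up
DOWN_COOLDOWN = 120  # seconds to wait after a scale-down

def in_cooldown(last_event, now=None):
    """True if the next scaling decision must still be deferred.

    last_event: (action, timestamp) of the most recent scaling event,
    where action is 'up' or 'down'; None if no event has occurred yet.
    """
    if last_event is None:
        return False
    action, ts = last_event
    now = time.time() if now is None else now
    window = UP_COOLDOWN if action == "up" else DOWN_COOLDOWN
    return now - ts < window
```

The longer scale-down cooldown biases the system toward keeping capacity around, which is the safer direction when load is uncertain.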

Example Configurations

Light Traffic API (Development)

Best for development and testing with predictable low traffic.

# Manifest
resources:
  cpu_request: "100m"
  memory_request: "256Mi"
  cpu_limit: "500m"
  memory_limit: "512Mi"

Autoscaling:

  • Enabled: ❌ (Use fixed instances instead)
  • Fixed Instances: 1
  • Resource Tier: Small

Cost: Lowest - single instance always running

Use Case: Development, testing, personal projects


Production Web App (Moderate Traffic)

Balanced configuration for production apps with moderate traffic.

# Manifest
resources:
  cpu_request: "200m"
  memory_request: "512Mi"
  cpu_limit: "1000m"
  memory_limit: "1Gi"

Autoscaling:

  • Enabled: ✅
  • Min Replicas: 2
  • Max Replicas: 10
  • CPU Threshold: 70%
  • Memory Threshold: 80%
  • Polling Interval: 30s

Cost: Moderate - scales from 2 to 10 instances

Use Case: Production web apps, REST APIs, dashboards

Expected Behavior:

  • Always 2 instances for high availability
  • Scales to 10 during traffic spikes
  • Handles 5x traffic increase automatically

High-Traffic API (Variable Load)

Aggressive scaling for APIs with unpredictable traffic patterns.

# Manifest
resources:
  cpu_request: "500m"
  memory_request: "1Gi"
  cpu_limit: "2000m"
  memory_limit: "2Gi"

Autoscaling:

  • Enabled: ✅
  • Min Replicas: 3
  • Max Replicas: 20
  • CPU Threshold: 60%
  • Memory Threshold: 70%
  • Polling Interval: 20s

Cost: High - scales from 3 to 20 instances

Use Case: High-traffic APIs, real-time services, production critical apps

Expected Behavior:

  • Always 3 instances for redundancy
  • Scales aggressively (lower thresholds)
  • Fast response to load changes (20s polling)
  • Can handle 6-7x traffic increases

CPU-Intensive Processing

Optimized for compute-heavy workloads.

# Manifest
resources:
  cpu_request: "1000m"
  memory_request: "512Mi"
  cpu_limit: "4000m"
  memory_limit: "1Gi"

Autoscaling:

  • Enabled: ✅
  • Min Replicas: 2
  • Max Replicas: 15
  • CPU Threshold: 60% # Lower for CPU-heavy apps
  • Memory Threshold: 80%
  • Polling Interval: 30s

Cost: Variable - scales based on CPU demand

Use Case: Image processing, video transcoding, data analysis


Memory-Intensive Application

Optimized for memory-heavy workloads.

# Manifest
resources:
  cpu_request: "200m"
  memory_request: "2Gi"
  cpu_limit: "1000m"
  memory_limit: "4Gi"

Autoscaling:

  • Enabled: ✅
  • Min Replicas: 2
  • Max Replicas: 10
  • CPU Threshold: 70%
  • Memory Threshold: 70% # Lower to prevent OOM
  • Polling Interval: 30s

Cost: High - large memory allocation per instance

Use Case: In-memory caching, ML inference, data processing

Monitoring Autoscaling

View autoscaling activity in your app details page:

Metrics Dashboard

  • Current Replicas: Real-time instance count
  • CPU Usage: Percentage of allocated CPU across all instances
  • Memory Usage: Percentage of allocated memory across all instances
  • Scaling Events: Timeline of scale up/down decisions

Scaling History

Track all scaling decisions with:

  • Timestamp: When the scaling occurred
  • Action: Scale up or scale down
  • Reason: CPU threshold exceeded, memory threshold exceeded, etc.
  • Before/After: Instance count before and after
  • Duration: Time to complete scaling

Alerts

Set up alerts for scaling events:

  • Scaled to Max: Alert when max replicas reached
  • Scaling Failures: Alert when scaling fails
  • High Resource Usage: Alert when approaching thresholds

Best Practices

Development vs Production

Development:

  • ❌ Disable autoscaling
  • Use 1-2 fixed instances
  • Small resource tier
  • Minimize costs

Production:

  • ✅ Enable autoscaling
  • Min replicas ≥ 2 for high availability
  • Appropriate resource tier
  • Balance cost and performance

Traffic Patterns

Predictable Traffic:

  • Use fixed instances instead of autoscaling
  • Right-size resource allocation
  • Lower operational complexity

Unpredictable Traffic:

  • Enable autoscaling with higher max replicas
  • Set lower thresholds for faster response
  • Monitor scaling patterns and adjust

Resource Allocation

CPU-Intensive Apps:

  • Lower CPU threshold (60%)
  • Higher CPU limits
  • Prevents performance degradation

Memory-Intensive Apps:

  • Lower memory threshold (70%)
  • Higher memory limits
  • Prevents OOM errors

Balanced Apps:

  • Standard thresholds (70% CPU, 80% memory)
  • Balanced resource allocation

Cost Optimization

  1. Start Conservative: Begin with low max replicas, increase if needed
  2. Monitor Patterns: Review scaling history to find optimal settings
  3. Use Min Replicas Wisely: Higher min = higher baseline cost
  4. Development Environments: Disable autoscaling, use 1 instance
  5. Off-Peak Scaling: Consider scheduled scaling for predictable patterns

Troubleshooting

Not Scaling Up

Problem: CPU/memory high but no new instances

Check:

  • Reached max replicas limit?
  • Currently in cooldown period?
  • Thresholds configured correctly?
  • Metrics being collected?

Solution:

# Lower thresholds for more aggressive scaling
cpu_threshold: 60% # Was 70%
memory_threshold: 70% # Was 80%

# Increase max replicas
max_replicas: 20 # Was 10

Scaling Flapping

Problem: Instances constantly scaling up and down

Check:

  • Scale-up thresholds too close to the scale-down thresholds (50% of scale-up)?
  • Polling interval too short?
  • Resource limits too low?

Solution:

# Increase polling interval
polling_interval: 60s # Was 30s

# Adjust thresholds for more hysteresis
cpu_threshold: 75% # Was 70%
memory_threshold: 85% # Was 80%

Slow Scaling

Problem: Scaling happens too slowly

Check:

  • Polling interval too long?
  • Thresholds too high?

Solution:

# Decrease polling interval
polling_interval: 20s # Was 30s

# Lower thresholds
cpu_threshold: 60% # Was 70%
memory_threshold: 70% # Was 80%

High Costs

Problem: Too many instances running

Check:

  • Min replicas too high?
  • Max replicas excessive?
  • Scale-down thresholds too low?

Solution:

# Reduce min replicas
min_replicas: 1 # Was 3

# Reduce max replicas
max_replicas: 5 # Was 20

# Allow more aggressive scale-down
# (automatic - 50% of thresholds)

Infrastructure Costs

Dedicated Nodes

Each autoscaled instance receives its own dedicated compute node. This ensures performance isolation but may increase infrastructure costs during scale-up events. Monitor your usage in the FinOps dashboard.

Cost Calculation

  • Base Cost: Min replicas × instance cost
  • Variable Cost: Additional instances × instance cost × usage time
  • Node Cost: Each instance includes dedicated node allocation

Example:

Configuration:
- Min Replicas: 2
- Max Replicas: 10
- Resource Tier: Medium ($50/month per instance)

Base Cost: 2 × $50 = $100/month
Max Cost: 10 × $50 = $500/month
Typical Cost: ~$150-250/month (3-5 avg instances)
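
The estimate above can be reproduced with a simple instance-month calculation (function name and the `extra_instance_months` parameter are illustrative, not a platform API):

```python
def monthly_cost(min_replicas, instance_cost, extra_instance_months=0.0):
    """Estimate monthly cost as baseline plus scaled capacity.

    extra_instance_months: average number of instances beyond the minimum
    running over the month (e.g. 2.0 means on average two extra instances).
    """
    return (min_replicas + extra_instance_months) * instance_cost
```

With 2 min replicas at $50/month, the baseline is $200 of capacity only if two extra instances run on average all month; the $500 maximum corresponds to all 10 instances running continuously.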

Next Steps