Logs, Metrics & Monitoring

Monitor your application's health, performance, and resource usage in real-time.

Overview

The Strongly platform provides monitoring capabilities through Kubernetes metrics-server:

  • Real-time Logs: Stream application logs in real-time
  • Resource Metrics: CPU and memory usage percentages
  • Health Checks: Kubernetes liveness and readiness probes
  • Replica Status: Ready replicas, total replicas, and availability

Viewing Logs

Real-time Log Streaming

  1. Navigate to your app details page
  2. Click Logs tab
  3. View real-time log output from all instances

Features:

  • Live streaming (updates every 2-3 seconds)
  • Multi-instance aggregation
  • Searchable and filterable
  • Downloadable for analysis

Log Levels

Applications should implement structured logging with levels:

Node.js Example:

const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});

logger.error('Database connection failed', { error: err.message });
logger.warn('High memory usage detected', { usage: '85%' });
logger.info('User logged in', { userId: 123 });
logger.debug('Processing request', { requestId: 'abc123' });

Python Example:

import logging
import os

# Configure logging
logging.basicConfig(
    level=os.environ.get('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

logger.error(f'Database connection failed: {err}')
logger.warning('High memory usage: 85%')
logger.info(f'User logged in: {user_id}')
logger.debug(f'Processing request: {request_id}')

Best Practices for Logging

  1. Use Structured Logging: JSON format for easy parsing
  2. Include Context: Request IDs, user IDs, timestamps
  3. Appropriate Levels: DEBUG for development, INFO for production
  4. Avoid Sensitive Data: Never log passwords, tokens, or PII
  5. Log Errors with Stack Traces: Include full error details

Good Logging:

logger.info('User login successful', {
  userId: user.id,
  ip: req.ip,
  timestamp: new Date().toISOString()
});

logger.error('Payment processing failed', {
  orderId: order.id,
  amount: order.amount,
  error: err.message,
  stack: err.stack
});

Bad Logging:

console.log('User logged in');  // No context
console.log(user); // Too much data, potential PII
console.log('Error: ' + err); // No stack trace

Available Resource Metrics

The platform collects metrics from the Kubernetes metrics-server. The following metrics are available and provide meaningful data:

CPU Usage

  • CPU usage percent: Approximate percentage of allocated CPU being used
  • Calculated from Kubernetes metrics-server pod resource usage

Memory Usage

  • Memory usage percent: Approximate percentage of allocated memory being used
  • Calculated from Kubernetes metrics-server pod resource usage
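
The exact computation happens inside the platform, but conceptually both figures compare the pod's measured usage against its allocation. Roughly (the variable names here are illustrative, not a platform API):

cpu_usage_percent ≈ pod_cpu_usage / cpu_allocation × 100
memory_usage_percent ≈ pod_memory_usage / memory_allocation × 100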

Replica Status

  • Ready replicas: Number of pods that are ready to serve traffic
  • Total replicas: Total number of desired pods
  • Availability: Whether the deployment has minimum available pods

Metrics Limitations

The platform uses Kubernetes metrics-server for resource monitoring, which provides CPU and memory usage at the pod level. Disk usage, network I/O, and per-request metrics are not available through the platform's built-in monitoring. For detailed application-level metrics, implement your own metrics collection within your application (see Application-Level Metrics below).

Health Checks

Kubernetes uses health checks to monitor application health:

Liveness Probe

Determines if the application is running. If it fails, Kubernetes restarts the container.

Configuration (from manifest):

health_check:
  path: /health
  port: 3000
  initial_delay: 10     # Wait before first check
  period: 30            # Check every 30 seconds
  timeout: 3            # Timeout after 3 seconds
  failure_threshold: 3  # Restart after 3 failures

Implementation:

// Express.js
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString()
  });
});

Readiness Probe

Determines if the application is ready to receive traffic. If it fails, Kubernetes stops sending requests.

Configure via the manifest's runtime.readiness_check_path field (defaults to the health check path if not specified).
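
For example, a minimal manifest snippet (only the readiness_check_path field is documented here; the /ready path is an illustrative choice matching the implementation below):

runtime:
  readiness_check_path: /ready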

Use for:

  • Database connection checks
  • External dependency checks
  • Startup tasks completion

Implementation:

// Express.js
app.get('/ready', async (req, res) => {
  try {
    // Check database connection
    await db.ping();

    // Check external API
    await fetch('https://api.example.com/health');

    res.status(200).json({
      status: 'ready',
      checks: {
        database: 'ok',
        externalApi: 'ok'
      }
    });
  } catch (err) {
    res.status(503).json({
      status: 'not ready',
      error: err.message
    });
  }
});

Health Check Status

View health check results:

  • Healthy: All checks passing
  • Unhealthy: Some checks failing
  • Unknown: No data or probes not configured

Monitoring Dashboard

App Details Page

View monitoring data:

  1. Overview Tab:
    • Current status (Running, Stopped, Failed)
    • Instance count and health
    • Quick metrics summary
  2. Metrics Tab:
    • CPU usage percentage
    • Memory usage percentage
    • Ready vs total replicas
  3. Logs Tab:
    • Real-time log streaming
    • Search and filter
    • Download logs
  4. Scaling Tab (if autoscaling enabled):
    • Current vs desired replicas
    • CPU/memory thresholds

Application-Level Metrics

For detailed application-level metrics beyond what the platform provides, implement metrics collection within your application:

Node.js Example

// Track custom metrics in your application
const metrics = {
  requestCount: 0,
  errorCount: 0,
  responseTimesMs: []
};

// Instrument requests
app.use((req, res, next) => {
  const start = Date.now();
  metrics.requestCount++;

  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.responseTimesMs.push(duration);

    if (res.statusCode >= 500) {
      metrics.errorCount++;
    }

    // Keep only the last 1000 response times
    if (metrics.responseTimesMs.length > 1000) {
      metrics.responseTimesMs = metrics.responseTimesMs.slice(-1000);
    }
  });

  next();
});

// Expose metrics endpoint
app.get('/app-metrics', (req, res) => {
  const avgResponseTime = metrics.responseTimesMs.length > 0
    ? metrics.responseTimesMs.reduce((a, b) => a + b, 0) / metrics.responseTimesMs.length
    : 0;

  res.json({
    request_count: metrics.requestCount,
    error_count: metrics.errorCount,
    avg_response_time_ms: Math.round(avgResponseTime),
    timestamp: new Date().toISOString()
  });
});

Python Example

import time

from flask import Flask, request

app = Flask(__name__)

# Simple in-process metrics collector
metrics = {
    'request_count': 0,
    'error_count': 0,
    'response_times': []
}

@app.before_request
def before_request():
    request.start_time = time.time()
    metrics['request_count'] += 1

@app.after_request
def after_request(response):
    duration = time.time() - request.start_time
    metrics['response_times'].append(duration)

    if response.status_code >= 500:
        metrics['error_count'] += 1

    # Keep only the last 1000 response times
    if len(metrics['response_times']) > 1000:
        metrics['response_times'] = metrics['response_times'][-1000:]

    return response

@app.route('/app-metrics')
def app_metrics():
    avg_time = sum(metrics['response_times']) / len(metrics['response_times']) if metrics['response_times'] else 0
    return {
        'request_count': metrics['request_count'],
        'error_count': metrics['error_count'],
        'avg_response_time_ms': round(avg_time * 1000),
    }

Troubleshooting with Logs

Common Patterns

Application Crashes:

# Search for error logs
Error: Cannot read property 'id' of undefined
at /app/server.js:45:23

# Check stack trace for root cause
# Fix code and redeploy
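
To make crash logs more useful, it can help to log uncaught errors with their full stack traces before the process exits. A minimal Node.js sketch, assuming the winston logger from the earlier example:

process.on('uncaughtException', (err) => {
  // Log the full stack trace so the root cause is visible in the Logs tab
  logger.error('Uncaught exception', { error: err.message, stack: err.stack });
  process.exit(1); // Let Kubernetes restart the container
});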

High Memory Usage:

# Look for memory-related warnings
FATAL ERROR: Reached heap limit
Allocation failed - JavaScript heap out of memory

# Increase memory limit in manifest
# Or optimize application code

Connection Issues:

# Search for connection errors
Error: connect ECONNREFUSED 10.0.2.15:5432
# Check STRONGLY_SERVICES configuration
# Verify service is running

Performance Optimization

Identifying Bottlenecks

  1. High CPU:
    • Check slow endpoints
    • Optimize algorithms
    • Add caching
    • Scale horizontally
  2. High Memory:
    • Check for memory leaks
    • Optimize data structures
    • Implement pagination
    • Increase memory limit
  3. Slow Response:
    • Add database indexes
    • Implement caching (see the sketch after this list)
    • Optimize queries
    • Use connection pooling
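
As one example of these techniques, here is a minimal sketch of in-memory caching for a slow Express endpoint. The computeExpensiveResult() call is a hypothetical placeholder for the slow work; note that with multiple replicas, each instance keeps its own cache:

const cache = new Map();
const TTL_MS = 60 * 1000; // Cache entries for 60 seconds

app.get('/report', async (req, res) => {
  const cached = cache.get(req.originalUrl);
  if (cached && Date.now() - cached.at < TTL_MS) {
    return res.json(cached.data); // Serve from cache, skipping the slow path
  }

  const data = await computeExpensiveResult(); // Hypothetical slow operation
  cache.set(req.originalUrl, { data, at: Date.now() });
  res.json(data);
});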

Monitoring Checklist

  • Health check endpoint implemented
  • Structured logging in place
  • Resource limits configured appropriately
  • Autoscaling configured (if needed)
  • Regular log review process

Best Practices

  1. Log Everything Important: Request IDs, user actions, errors
  2. Monitor Proactively: Review metrics regularly
  3. Optimize Based on Data: Use CPU/memory metrics to guide optimization
  4. Test Health Checks: Ensure health endpoints work correctly (see the example after this list)
  5. Implement App-Level Metrics: For detailed insights beyond platform metrics
  6. Secure Metrics: Don't expose sensitive data in metrics endpoints
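
For example, you can exercise a health endpoint locally before deploying (assuming the app listens on port 3000, as in the manifest example above):

curl -i http://localhost:3000/health
# Expect HTTP 200 and a JSON body like {"status":"ok",...}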

Next Steps