Logs, Metrics & Monitoring
Monitor your application's health, performance, and resource usage in real-time.
Overview
The Strongly platform provides comprehensive monitoring capabilities:
- Real-time Logs: Stream application logs in real-time
- Resource Metrics: CPU, memory, disk usage
- Health Checks: Kubernetes liveness and readiness probes
- Request Metrics: Request rates, response times, error rates
- Autoscaling Events: Track scaling decisions and triggers
Viewing Logs
Real-time Log Streaming
- Navigate to your app details page
- Click the Logs tab
- View real-time log output from all instances
Features:
- Live streaming (updates every 2-3 seconds)
- Multi-instance aggregation
- Searchable and filterable
- Downloadable for analysis
Log Levels
Applications should implement structured logging with levels:
Node.js Example:
const winston = require('winston');
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});
logger.error('Database connection failed', { error: err.message });
logger.warn('High memory usage detected', { usage: '85%' });
logger.info('User logged in', { userId: 123 });
logger.debug('Processing request', { requestId: 'abc123' });
Python Example:
import logging
import os
# Configure logging
logging.basicConfig(
    level=os.environ.get('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
logger.error(f'Database connection failed: {err}')
logger.warning('High memory usage: 85%')
logger.info(f'User logged in: {user_id}')
logger.debug(f'Processing request: {request_id}')
Best Practices for Logging
- Use Structured Logging: JSON format for easy parsing
- Include Context: Request IDs, user IDs, timestamps
- Appropriate Levels: DEBUG for development, INFO for production
- Avoid Sensitive Data: Never log passwords, tokens, or PII
- Log Errors with Stack Traces: Include full error details
Good Logging:
logger.info('User login successful', {
  userId: user.id,
  email: user.email,
  ip: req.ip,
  timestamp: new Date().toISOString()
});
logger.error('Payment processing failed', {
  orderId: order.id,
  amount: order.amount,
  error: err.message,
  stack: err.stack
});
Bad Logging:
console.log('User logged in'); // No context
console.log(user); // Too much data, potential PII
console.log('Error: ' + err); // No stack trace
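To help with the Avoid Sensitive Data practice above, one option in Node.js is a custom winston format that redacts known sensitive fields before entries are written. This is a minimal sketch, assuming a winston logger like the one shown earlier; the field list is illustrative:
const winston = require('winston');

// Fields that must never reach the log output (illustrative list)
const SENSITIVE_FIELDS = ['password', 'token', 'authorization', 'creditCard'];

// Custom winston format that replaces top-level sensitive values
const redact = winston.format((info) => {
  for (const field of SENSITIVE_FIELDS) {
    if (field in info) {
      info[field] = '[REDACTED]';
    }
  }
  return info;
});

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(redact(), winston.format.json()),
  transports: [new winston.transports.Console()]
});

// The token value is replaced before it is written
logger.info('Session created', { userId: 123, token: 'abc-secret' });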
Resource Metrics
CPU Usage
Monitor CPU consumption across instances:
- Current Usage: Real-time CPU percentage
- Average Usage: Average over last 5 minutes
- Peak Usage: Highest CPU usage in time window
- Throttling: When CPU limit is reached
Metrics:
- cpu_usage_percent: Percentage of allocated CPU
- cpu_usage_cores: Absolute CPU cores used
- cpu_throttled_seconds: Time spent throttled
Memory Usage
Track memory consumption and prevent OOM errors:
- Current Usage: Real-time memory consumption
- Average Usage: Average over last 5 minutes
- Peak Usage: Highest memory usage in time window
- OOM Events: Out-of-memory kills
Metrics:
- memory_usage_percent: Percentage of allocated memory
- memory_usage_bytes: Absolute memory used
- memory_oom_kills: Count of OOM events
Disk Usage
Monitor disk space consumption:
- Current Usage: Disk space used
- Available: Remaining disk space
- I/O Metrics: Read/write operations
Metrics:
- disk_usage_percent: Percentage of allocated disk
- disk_usage_bytes: Absolute disk space used
- disk_io_read_bytes: Bytes read from disk
- disk_io_write_bytes: Bytes written to disk
Network Metrics
Track network traffic:
- Inbound Traffic: Bytes received
- Outbound Traffic: Bytes sent
- Connection Count: Active connections
Metrics:
- network_rx_bytes: Bytes received
- network_tx_bytes: Bytes transmitted
- network_connections: Active connections
Health Checks
Kubernetes uses health checks to monitor application health:
Liveness Probe
Determines if the application is running. If it fails, Kubernetes restarts the container.
Configuration (from manifest):
health_check:
  path: /health
  port: 3000
  initial_delay: 10     # Wait before first check
  period: 30            # Check every 30 seconds
  timeout: 3            # Timeout after 3 seconds
  failure_threshold: 3  # Restart after 3 failures
Implementation:
// Express.js
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString()
  });
});
Readiness Probe
Determines if the application is ready to receive traffic. If it fails, Kubernetes stops sending requests.
Use for:
- Database connection checks
- External dependency checks
- Startup tasks completion
Implementation:
// Express.js
app.get('/ready', async (req, res) => {
  try {
    // Check database connection
    await db.ping();
    // Check external API responds successfully (fetch only rejects on network errors)
    const apiRes = await fetch('https://api.example.com/health');
    if (!apiRes.ok) {
      throw new Error(`External API returned ${apiRes.status}`);
    }
    res.status(200).json({
      status: 'ready',
      checks: {
        database: 'ok',
        externalApi: 'ok'
      }
    });
  } catch (err) {
    res.status(503).json({
      status: 'not ready',
      error: err.message
    });
  }
});
Health Check Status
View health check results:
- Healthy: All checks passing
- Unhealthy: Some checks failing
- Unknown: No data or probes not configured
Request Metrics
Request Rate
Track incoming request rate:
- Requests per Second: Current request rate
- Requests per Minute: Aggregated over 1 minute
- Request Count: Total requests over time period
Response Time
Monitor application performance:
- Average Response Time: Mean response time
- P50 (Median): 50% of requests complete faster than this
- P95: 95% of requests complete faster than this
- P99: 99% of requests complete faster than this
Example Metrics:
Average: 45ms
P50: 35ms
P95: 120ms
P99: 250ms
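To make the percentile definitions concrete, the sketch below computes P50/P95/P99 from a set of recorded response times using the nearest-rank method; the sample values are made up:
// Nearest-rank percentile over an array of response times (ms)
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// Hypothetical response times collected over a time window
const responseTimes = [12, 28, 31, 33, 35, 38, 40, 45, 120, 250];

console.log('P50:', percentile(responseTimes, 50), 'ms'); // 50% of requests finished within this time
console.log('P95:', percentile(responseTimes, 95), 'ms');
console.log('P99:', percentile(responseTimes, 99), 'ms');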
Error Rate
Track failed requests:
- Error Count: Total errors in time period
- Error Rate: Percentage of failed requests
- Error Types: Breakdown by status code (4xx, 5xx)
Metrics:
- http_requests_total: Total HTTP requests
- http_requests_errors: Failed HTTP requests
- http_request_duration_seconds: Response time histogram
- http_requests_by_status: Requests grouped by status code
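As a rough illustration of how an error rate is derived, the following sketch keeps simple in-process counters in an Express middleware and reports the percentage of failed requests. The Prometheus metrics described under Custom Metrics below are the more durable approach:
// Minimal in-process request and error counters (illustrative only)
let totalRequests = 0;
let errorRequests = 0;

app.use((req, res, next) => {
  res.on('finish', () => {
    totalRequests += 1;
    if (res.statusCode >= 400) {
      errorRequests += 1; // treat 4xx and 5xx responses as errors
    }
  });
  next();
});

// Error rate as a percentage of all requests seen so far
function currentErrorRate() {
  return totalRequests === 0 ? 0 : (errorRequests / totalRequests) * 100;
}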
Monitoring Dashboard
App Details Page
View comprehensive monitoring data:
- Overview Tab:
  - Current status (Running, Stopped, Failed)
  - Instance count and health
  - Quick metrics summary
- Metrics Tab:
  - CPU usage chart
  - Memory usage chart
  - Network traffic chart
  - Disk usage chart
- Logs Tab:
  - Real-time log streaming
  - Search and filter
  - Download logs
- Scaling Tab (if autoscaling enabled):
  - Current vs desired replicas
  - Scaling events history
  - CPU/memory thresholds
  - Scaling triggers
Custom Metrics
Prometheus Metrics
Expose custom metrics for Prometheus scraping:
Node.js Example:
const promClient = require('prom-client');

// Create custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status']
});

const activeUsers = new promClient.Gauge({
  name: 'active_users_total',
  help: 'Number of active users'
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});

// Instrument requests
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    httpRequestDuration
      .labels(req.method, req.route?.path || req.path, res.statusCode)
      .observe(duration);
  });
  next();
});
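prom-client can also export default Node.js runtime metrics (heap usage, CPU, event loop lag) alongside the custom metrics above:
// Export default Node.js process metrics on the same /metrics endpoint
promClient.collectDefaultMetrics();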
Python Example:
import time
from flask import request, Response
from prometheus_client import Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST

# Assumes an existing Flask app instance named `app`

# Create custom metrics
http_requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)
http_request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint']
)
active_users = Gauge(
    'active_users_total',
    'Number of active users'
)

# Expose metrics endpoint
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

# Instrument requests
@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    duration = time.time() - request.start_time
    http_requests_total.labels(
        method=request.method,
        endpoint=request.endpoint,
        status=response.status_code
    ).inc()
    http_request_duration.labels(
        method=request.method,
        endpoint=request.endpoint
    ).observe(duration)
    return response
Alerting
Set up alerts for critical events:
Alert Types
- Resource Alerts:
  - High CPU usage (> 80%)
  - High memory usage (> 85%)
  - Disk space low (< 10% free)
- Application Alerts:
  - High error rate (> 5%)
  - Slow response time (P95 > 1s)
  - Health check failures
- Scaling Alerts:
  - Scaled to max replicas
  - Frequent scaling events
  - Scaling failures
Alert Configuration
Configure alerts in app settings:
alerts:
  - name: high_cpu_usage
    metric: cpu_usage_percent
    threshold: 80
    duration: 5m
    severity: warning
  - name: high_error_rate
    metric: http_error_rate
    threshold: 5
    duration: 2m
    severity: critical
  - name: health_check_failed
    metric: health_check_failures
    threshold: 3
    duration: 1m
    severity: critical
Troubleshooting with Logs
Common Patterns
Application Crashes:
# Search for error logs
Error: Cannot read property 'id' of undefined
at /app/server.js:45:23
# Check stack trace for root cause
# Fix code and redeploy
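For this particular crash, the usual fix before redeploying is to guard the property access; a minimal sketch (variable names are hypothetical):
// Before: throws when user is undefined
// const id = user.id;

// After: optional chaining returns undefined instead of throwing
const id = user?.id;
if (!id) {
  logger.warn('Request received without a valid user');
}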
High Memory Usage:
# Look for memory-related warnings
FATAL ERROR: Reached heap limit
Allocation failed - JavaScript heap out of memory
# Increase memory limit in manifest
# Or optimize application code
Connection Issues:
# Search for connection errors
Error: connect ECONNREFUSED 10.0.2.15:5432
# Check STRONGLY_SERVICES configuration
# Verify service is running
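If the dependency is simply still starting up, retrying the connection with a short backoff usually avoids a crash loop. A minimal sketch; connectWithRetry and its parameters are illustrative, not a platform API:
// Retry an async connect function a few times before giving up (illustrative)
async function connectWithRetry(connect, retries = 5, delayMs = 2000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      logger.warn('Connection failed, retrying', { attempt, error: err.message });
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw new Error('Could not connect after retries');
}

// Example: wait for the database before starting the server
// await connectWithRetry(() => db.connect());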
Performance Optimization
Identifying Bottlenecks
- High CPU:
  - Check slow endpoints
  - Optimize algorithms
  - Add caching
  - Scale horizontally
- High Memory:
  - Check for memory leaks
  - Optimize data structures
  - Implement pagination
  - Increase memory limit
- Slow Response:
  - Add database indexes
  - Implement caching
  - Optimize queries
  - Use connection pooling (see the pooling sketch below)
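As an example of the pooling point above, a minimal setup with node-postgres (assuming a PostgreSQL service; the DATABASE_URL variable, pool size, and route are illustrative):
const { Pool } = require('pg');

// Reuse a fixed pool of connections instead of opening one per request
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // assumed connection string variable
  max: 10,                  // maximum concurrent connections
  idleTimeoutMillis: 30000  // close idle clients after 30 seconds
});

// Hypothetical route using the pool; clients are checked out and returned automatically
app.get('/users/:id', async (req, res) => {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  res.json(result.rows[0]);
});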
Monitoring Checklist
- ✅ Health check endpoint implemented
- ✅ Structured logging in place
- ✅ Custom metrics exposed
- ✅ Resource limits configured appropriately
- ✅ Alerts set up for critical metrics
- ✅ Log retention policy defined
- ✅ Regular log review process
Best Practices
- Log Everything Important: Request IDs, user actions, errors
- Monitor Proactively: Set up alerts before issues occur
- Review Metrics Regularly: Weekly review of performance trends
- Optimize Based on Data: Use metrics to guide optimization efforts
- Test Health Checks: Ensure health endpoints work correctly
- Rotate Logs: Implement log rotation to manage disk space (see the sketch below)
- Secure Metrics: Don't expose sensitive data in metrics
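For the Rotate Logs practice, one common Node.js option is the winston-daily-rotate-file transport; a minimal sketch (file names and retention are illustrative, and apps that log only to stdout can rely on the platform's log handling instead):
const winston = require('winston');
require('winston-daily-rotate-file'); // adds the DailyRotateFile transport

// Rotate log files daily and keep two weeks of history
const rotatingFile = new winston.transports.DailyRotateFile({
  filename: 'app-%DATE%.log',
  datePattern: 'YYYY-MM-DD',
  maxSize: '20m',
  maxFiles: '14d'
});

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [new winston.transports.Console(), rotatingFile]
});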