FinOps & Cost Management
Track infrastructure costs, manage budgets with threshold enforcement, schedule resource auto-stop/start, organize resources into groups, and analyze cost trends with forecasting and anomaly detection.
Overview
The FinOps module provides comprehensive cost management for the Strongly AI platform. It combines AWS cost data with platform-level resource tracking to give administrators full visibility into infrastructure spending, execution costs, and optimization opportunities.
Core Capabilities
- Cost Dashboard -- Real-time cost metrics with monthly trends, daily breakdowns, service-level analysis, and cost forecasting
- Budgets -- Create budgets at platform, user, or resource group scope with threshold alerts and enforcement actions
- Resource Groups -- Organize platform resources (apps, workflows, add-ons, models) into logical groups for budget and schedule management
- Resource Schedules -- Time-based auto-stop/start schedules to reduce costs during off-hours
- Cost Forecasting -- ML-based predictions of future costs based on historical spending patterns
- Anomaly Detection -- Automatic identification of unusual cost spikes
- Execution Costs -- Track costs associated with workflow executions
- Cache Management -- Backend caching layer for cost data with manual refresh capability
Architecture
FinOps data flows through several layers:
- Backend API -- Fetches cost data from AWS Cost Explorer, calculates execution costs, and generates forecasts
- Cache Layer -- Cost data is cached so it can be served quickly without repeated AWS API calls
- Frontend Methods -- Proxy requests to the backend API or serve cached data
- Budget/Schedule/Group Management -- Managed via the platform's data layer
Backend API Endpoints
The platform communicates with the backend service for cost data:
| Endpoint | Description |
|---|---|
| GET /api/v1/finops/monthly-costs | Monthly cost breakdown by service |
| GET /api/v1/finops/daily-costs | Daily cost breakdown |
| GET /api/v1/finops/cost-forecast | ML-based cost predictions |
| GET /api/v1/finops/service-breakdown | Cost breakdown by AWS service for a date range |
| GET /api/v1/finops/savings-recommendations | AWS optimization recommendations |
| GET /api/v1/finops/cost-anomalies | Detected cost anomalies |
| GET /api/v1/finops/execution-costs | Workflow execution cost tracking |
| GET /api/v1/finops/combined-costs | Combined infrastructure + execution costs |
| POST /api/v1/finops/cache/trigger-refresh | Trigger background cache refresh |
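As a rough illustration of how a client might address these endpoints, the sketch below builds request URLs with the Python standard library. The base URL is a placeholder, and the query parameter shown in the usage note (`days`) is an assumption: the table above lists paths only and does not document parameter names or authentication.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; replace with the address of your backend service.
BASE_URL = "https://backend.example.com"

# Paths copied from the endpoint table above.
FINOPS_PATHS = {
    "monthly_costs": "/api/v1/finops/monthly-costs",
    "daily_costs": "/api/v1/finops/daily-costs",
    "cost_forecast": "/api/v1/finops/cost-forecast",
    "service_breakdown": "/api/v1/finops/service-breakdown",
    "savings_recommendations": "/api/v1/finops/savings-recommendations",
    "cost_anomalies": "/api/v1/finops/cost-anomalies",
    "execution_costs": "/api/v1/finops/execution-costs",
    "combined_costs": "/api/v1/finops/combined-costs",
    "cache_refresh": "/api/v1/finops/cache/trigger-refresh",  # POST, not GET
}


def build_finops_url(name, base_url=BASE_URL, **params):
    """Return the full URL for a named FinOps endpoint.

    Query parameters are passed through unchanged. Their names are not
    documented in this section, so any you supply are assumptions.
    """
    url = urljoin(base_url, FINOPS_PATHS[name])
    if params:
        # Sort for a stable, predictable query string.
        url += "?" + urlencode(sorted(params.items()))
    return url
```

For example, `build_finops_url("daily_costs", days=7)` yields `https://backend.example.com/api/v1/finops/daily-costs?days=7`, assuming the endpoint accepts a `days` parameter.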
Cost Dashboard
Accessing the Dashboard
- Click FinOps in the main navigation
- The dashboard displays key metrics:
- Current Month Spend -- Real-time cost accumulation with trend vs. previous month
- Average Monthly Cost -- Baseline for comparison over the selected time range
- Projected Next Month -- ML-based forecast of upcoming costs
- Potential Savings -- Total from optimization recommendations
Data Sources
The dashboard aggregates data from multiple sources:
- Monthly costs -- Cached from AWS Cost Explorer, broken down by service
- Daily costs -- Granular daily spending for the last 30 days
- Predictions -- ML-based cost forecasts
- Service breakdown -- Cost distribution across AWS services
- Anomalies -- Statistically detected cost spikes
All data is served from cache for fast access. When no cached data exists, the system falls back to legacy cache entries for backward compatibility.
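The fallback behavior described above can be pictured as a two-step lookup. This is a minimal sketch, not the platform's code: the function name, the dictionary-based stores, and the key format are all hypothetical.

```python
from typing import Any, Optional


def read_cost_data(key: str, cache: dict, legacy_cache: dict) -> Optional[Any]:
    """Return cached cost data for `key`, preferring the current cache.

    Both stores are plain dicts here purely for illustration.
    """
    if key in cache:
        return cache[key]
    # No current entry: fall back to the legacy entry, if one exists.
    return legacy_cache.get(key)
```

A caller that receives `None` knows neither store holds the data and could prompt for a cache refresh.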
Cost Explorer Methods
| Method | Parameters | Description |
|---|---|---|
| costExplorer.getAllData | months (default: 12) | Get all cached cost data in a single call |
| costExplorer.getMonthlyCosts | months (default: 12) | Monthly cost breakdown |
| costExplorer.getDailyCosts | days (default: 30) | Daily cost breakdown |
| costExplorer.getCostForecast | monthsAhead (default: 3) | Cost predictions |
| costExplorer.getServiceBreakdown | startDate, endDate | Service-level cost breakdown for a date range |
| costExplorer.getSavingsRecommendations | -- | AWS optimization recommendations |
| costExplorer.getCostAnomalies | -- | Detected cost anomalies |
| costExplorer.getExecutionCosts | months, year | Workflow execution costs |
| costExplorer.getCombinedCosts | months (default: 12) | Infrastructure + execution costs combined |
| costExplorer.triggerCacheRefresh | -- | Trigger background cache refresh |
All cost methods require admin role access.
Cache Refresh
Cost data is cached to minimize AWS API calls. To refresh the cache:
- Navigate to the FinOps dashboard
- Click the Refresh button
- The system triggers a background cache refresh via the backend API
- New data appears on the dashboard once the refresh completes
The refresh is fire-and-forget -- the UI returns immediately while the backend processes the refresh asynchronously.
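The fire-and-forget pattern can be sketched with a background thread: the trigger returns at once while the fetch runs separately. The class and function names below are hypothetical, and the real backend may use a task queue or another mechanism entirely.

```python
import threading


class CostCache:
    """Toy in-memory cache standing in for the real cache layer."""

    def __init__(self):
        self.data = {}
        self._lock = threading.Lock()

    def refresh(self, fetch):
        """Run the (potentially slow) fetch, then swap in the result."""
        fresh = fetch()
        with self._lock:
            self.data = fresh


def trigger_cache_refresh(cache, fetch):
    """Start a refresh in the background and return immediately.

    The thread handle is returned only so that callers (or tests) can
    wait for completion; a fire-and-forget caller simply ignores it.
    """
    worker = threading.Thread(target=cache.refresh, args=(fetch,), daemon=True)
    worker.start()
    return worker
```

Until the background fetch finishes, readers keep seeing the previous contents of `cache.data`, which matches the dashboard showing old figures until the refresh completes.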
Resource Groups
Resource Groups allow you to organize platform resources into logical groupings for budget and schedule management.
Supported Resource Types
| Type | Description |
|---|---|
| app | Deployed applications |
| workflow | Workflow definitions |
| addon | Platform add-ons |
| workspace | Workspaces |
| project | Projects |
| fine_tuning | Fine-tuning jobs |
| self_hosted_model | Self-hosted model deployments |
| ml_model | ML model registry entries |
| automl | AutoML jobs |
Creating a Resource Group
- Navigate to FinOps, then Resource Groups
- Click Create Resource Group
- Enter a name and description
- Optionally add initial resources
- Click Create
Managing Resources in Groups
| Operation | Method | Description |
|---|---|---|
| List groups | resourceGroups.list | List groups with optional status, search, and pagination filters |
| Get group | resourceGroups.get | Get a single group by ID |
| Create group | resourceGroups.create | Create a new group |
| Update group | resourceGroups.update | Update name, description, or status |
| Delete group | resourceGroups.delete | Delete a group |
| Add resource | resourceGroups.addResource | Add a resource (type, ID, name) to a group |
| Remove resource | resourceGroups.removeResource | Remove a resource from a group |
| Find by resource | resourceGroups.findByResource | Find all groups containing a specific resource |
Access Control
- Administrators can see all resource groups
- Non-admin users can only see groups they own (ownerId matches their user ID)
- Only the owner or an admin can modify or delete a group
- Duplicate resources (same type + ID) are prevented within a group
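The access rules and the duplicate check can be expressed compactly. The sketch below is an in-memory model for illustration only; the class, field, and function names are hypothetical and do not reflect the platform's actual data layer.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    owner_id: str
    # Each resource is stored as a (type, id, name) tuple.
    resources: list = field(default_factory=list)

    def add_resource(self, rtype: str, rid: str, rname: str) -> bool:
        """Add a resource unless the same type + ID is already present."""
        if any(t == rtype and i == rid for t, i, _ in self.resources):
            return False  # duplicate: same type and ID
        self.resources.append((rtype, rid, rname))
        return True


def is_owner_or_admin(group: ResourceGroup, user_id: str, is_admin: bool) -> bool:
    """Single rule that governs viewing, modifying, and deleting a group."""
    return is_admin or group.owner_id == user_id


def visible_groups(groups, user_id: str, is_admin: bool):
    """Admins see every group; other users see only the groups they own."""
    return [g for g in groups if is_owner_or_admin(g, user_id, is_admin)]
```

Note that the duplicate check keys on type and ID only, so adding the same resource under a different display name is still rejected.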
Analyzing Cost Trends
Monthly Trends
- Go to the Overview tab
- Review the monthly cost trend chart showing:
- Total cost per month
- Month-over-month percentage change
- Top AWS services contributing to costs
- Hover over any month to see a detailed breakdown
Service Breakdown
- Click the Service Breakdown tab
- View cost distribution across AWS services (EC2, RDS, S3, EKS, etc.)
- Review the detailed table with:
- Service name
- Current month cost
- Previous month cost
- Percentage change
- Percentage of total spend
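The percentage columns in that table follow from simple arithmetic on the current and previous month totals. A minimal sketch, assuming costs arrive as dictionaries keyed by service name (the function and field names are illustrative, not part of the platform):

```python
def service_breakdown_rows(current: dict, previous: dict) -> list:
    """Build table rows sorted by current-month cost, highest first.

    `change_pct` is None when there is no previous-month cost to
    compare against, since the percentage change is then undefined.
    """
    total = sum(current.values())
    rows = []
    for service, cost in sorted(current.items(), key=lambda kv: kv[1], reverse=True):
        prev = previous.get(service, 0.0)
        change_pct = round((cost - prev) / prev * 100, 2) if prev else None
        share_pct = round(cost / total * 100, 2) if total else 0.0
        rows.append({
            "service": service,
            "current": cost,
            "previous": prev,
            "change_pct": change_pct,
            "share_pct": share_pct,
        })
    return rows
```

For instance, a service that cost 500 last month and 600 this month shows a change of 20%, and if total spend is 1000 its share is 60%.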
Cost Forecast
- Navigate to the Forecast tab
- View 3-month cost projections based on historical patterns
- Review confidence intervals (upper and lower bounds)
- Use the forecast to plan budgets and identify potential overruns
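One way to use the confidence bounds when planning is to compare each forecast month against a budget. The sketch below assumes each forecast point carries `month`, `lower`, and `upper` fields; these names and the three labels are hypothetical, not the platform's response format.

```python
def overrun_risk(forecast: list, budget: float) -> dict:
    """Label each forecast month relative to a monthly budget.

    - "likely overrun":   even the lower bound exceeds the budget
    - "possible overrun": only the upper bound exceeds the budget
    - "within budget":    the whole interval is at or below the budget
    """
    labels = {}
    for point in forecast:
        if point["lower"] > budget:
            labels[point["month"]] = "likely overrun"
        elif point["upper"] > budget:
            labels[point["month"]] = "possible overrun"
        else:
            labels[point["month"]] = "within budget"
    return labels
```

Months labeled "possible overrun" are good candidates for a threshold alert, since the projection itself may sit under the budget while the upper bound does not.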
Cost Anomalies
- Click the Anomalies tab
- Review detected cost anomalies:
- Date and duration of the anomaly
- Service that caused the anomaly
- Expected vs. actual cost
- Dollar amount of the anomaly
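This section does not describe the detection method, so the following is only a stand-in to show how expected cost, actual cost, and the dollar amount of an anomaly relate. It flags a day whose cost exceeds the trailing-window mean by more than a chosen number of standard deviations; the window size, threshold, and field names are all assumptions.

```python
from statistics import mean, stdev


def detect_anomalies(daily_costs, window=7, threshold=3.0):
    """Flag upward cost spikes using a trailing-window z-score.

    daily_costs: list of (date_string, cost) tuples in date order.
    Returns one dict per flagged day with expected vs. actual cost.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = [cost for _, cost in daily_costs[i - window:i]]
        expected = mean(history)
        spread = stdev(history)
        date, actual = daily_costs[i]
        # A perfectly flat history has zero spread; skip it rather than
        # divide by zero.
        if spread > 0 and (actual - expected) / spread > threshold:
            anomalies.append({
                "date": date,
                "expected": round(expected, 2),
                "actual": actual,
                "excess": round(actual - expected, 2),
            })
    return anomalies
```

Because a spike also inflates the trailing mean and spread for the following days, a simple detector like this can miss a second anomaly that closely follows the first.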
Best Practices
Organize Resources
- Create resource groups aligned with teams, projects, or environments
- Use consistent naming conventions for groups
- Regularly review group membership as resources are added or removed
Monitor Proactively
- Review the cost dashboard weekly
- Set up budgets with threshold alerts before costs become a problem
- Use resource schedules to automatically control non-production costs
- Track execution costs alongside infrastructure costs for full visibility
Optimize Costs
- Apply resource schedules to stop development resources during off-hours (potential 60-70% savings)
- Review optimization recommendations regularly
- Use resource groups to track costs by team or project
- Set budgets at multiple scope levels for layered cost control
The most impactful quick win is setting up resource schedules for development environments. Stopping resources during nights and weekends can reduce development compute costs by 60-70%.
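The 60-70% range is consistent with simple arithmetic on running hours. The sketch below assumes cost is proportional to the hours a resource is running, which holds for most compute but not for charges such as storage that continue while a resource is stopped. The function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def schedule_savings_pct(hours_per_day: float, days_per_week: int) -> float:
    """Percent of always-on compute cost avoided by a schedule that runs
    resources only `hours_per_day` on `days_per_week` days."""
    running_hours = hours_per_day * days_per_week
    return round((1 - running_hours / HOURS_PER_WEEK) * 100, 1)


print(schedule_savings_pct(10, 5))  # 70.2 (running 50 of 168 hours)
print(schedule_savings_pct(12, 5))  # 64.3 (running 60 of 168 hours)
```

A weekday schedule of 10 to 12 running hours therefore lands in the quoted range; longer daily windows or weekend usage reduce the savings accordingly.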