AI Gateway
The AI Gateway provides a unified platform for deploying and managing AI models across multiple providers. Deploy self-hosted models from Hugging Face, connect third-party providers (OpenAI, Anthropic, Google, Cohere), fine-tune models, apply guardrails, and monitor usage with comprehensive analytics.
Key Features
Self-Hosted Models
Deploy open-source models from Hugging Face on your own infrastructure with full control over:
- Scaling: Configure autoscaling based on CPU and memory usage
- Costs: Optimize spending with on-demand and scheduled deployment options
- Data Privacy: Keep sensitive data within your infrastructure
- Environment Selection: Choose from vLLM, TGI, Ollama, or custom Docker images
Learn more about deploying AI models.
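For illustration, a self-hosted deployment pairs a Hugging Face model with a serving environment and an autoscaling policy. The exact configuration format depends on how you create the deployment; the object below is a hypothetical sketch built from the options listed above (a vLLM environment, CPU- and memory-based autoscaling, on-demand scheduling), not the platform's actual schema.

// Hypothetical deployment configuration; all field names are illustrative
const deployment = {
  name: 'mistralai/Mistral-7B-Instruct-v0.2',  // example Hugging Face model ID
  environment: 'vllm',                         // vLLM, TGI, Ollama, or a custom Docker image
  autoscaling: {
    minReplicas: 1,
    maxReplicas: 4,
    targetCpuUtilization: 70,                  // scale out above 70% CPU
    targetMemoryUtilization: 80                // scale out above 80% memory
  },
  schedule: { mode: 'on-demand' }              // on-demand vs. always-on to control cost
};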
Third-Party Providers
Connect to external AI providers with unified API access and automatic key management:
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic: Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
- Google: Gemini Pro, Gemini Ultra, PaLM 2
- Cohere: Command, Command-Light, Embed
- Azure OpenAI: Enterprise-grade OpenAI models with Azure SLA
Learn more about model providers.
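Because connected providers share one gateway interface, application code stays the same when you switch between them. The snippet below assumes third-party models appear in the same STRONGLY_SERVICES structure described under "Using AI Models in Applications" later on this page (an aiModels map whose entries expose name and endpoint fields); those field names are taken from that example, not from any provider-specific API.

// Enumerate every model the gateway exposes to this application
const services = JSON.parse(process.env.STRONGLY_SERVICES);
for (const [id, model] of Object.entries(services.aiModels)) {
  console.log(`${id}: ${model.name} -> ${model.endpoint}`);
}

Switching from, say, GPT-4 to Claude 3 Sonnet then means selecting a different entry rather than rewriting request code.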
Fine-Tuning
Train models on your custom data to improve performance for domain-specific tasks:
- LoRA: 90% less memory than full fine-tuning
- QLoRA: Fine-tune 70B models on a single 48GB GPU
- Full Fine-Tuning: Maximum customization potential
Fine-tuning creates a specialized version of a base model optimized for your specific use cases.
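A fine-tuning job typically names the base model, the adaptation method, and the training data. The object below is a hypothetical sketch of such a job definition; the actual fields and the way jobs are submitted are defined by the platform's fine-tuning workflow, so treat every name here as illustrative.

// Hypothetical fine-tuning job definition; all field names are illustrative
const fineTuneJob = {
  baseModel: 'llama-3-8b-instruct',            // example base model to specialize
  method: 'lora',                              // 'lora', 'qlora', or 'full'
  dataset: 'datasets/support-tickets.jsonl',   // example path to your custom training data
  hyperparameters: {
    epochs: 3,
    learningRate: 2e-4,
    loraRank: 16                               // only meaningful for LoRA/QLoRA
  }
};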
Guardrails
Apply content moderation, safety filters, and policy enforcement to model inputs and outputs (a configuration sketch follows this list):
- Content Filtering: Toxicity detection, PII detection, prompt injection detection
- Topic Restrictions: Allowed and banned topics
- Rate Limiting: Request and token limits per user
- Output Validation: Hallucination detection, format enforcement
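Conceptually, guardrails are a policy attached to a model's inputs and outputs. The object below sketches what such a policy could cover using the categories above; the field names are illustrative and do not reflect the platform's actual configuration schema.

// Hypothetical guardrail policy; all field names are illustrative
const guardrails = {
  contentFilters: ['toxicity', 'pii', 'prompt-injection'],
  topics: {
    allowed: ['billing', 'product-support'],
    banned: ['medical-advice', 'legal-advice']
  },
  rateLimits: { requestsPerMinute: 60, tokensPerDay: 200000 },  // applied per user
  outputValidation: { hallucinationCheck: true, format: 'json' }
};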
Analytics
Monitor model usage, performance, and costs with real-time analytics:
- Request Metrics: Track request volume and trends
- Token Usage: Monitor token consumption across models
- Performance: View response times and error rates
- Cost Tracking: Analyze spending by model and provider
Learn more about monitoring models.
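Beyond the dashboard, you can also record usage from application code. If the gateway returns OpenAI-compatible response bodies, as the example under "Using AI Models in Applications" assumes, each response carries a usage object; the field names below follow the OpenAI response format, which is an assumption about what the gateway forwards.

// Log per-request token usage, assuming OpenAI-compatible response bodies
function logUsage(modelName, data) {
  const usage = data.usage ?? {};  // { prompt_tokens, completion_tokens, total_tokens }
  console.log(
    `${modelName}: ${usage.prompt_tokens ?? 0} prompt + ` +
    `${usage.completion_tokens ?? 0} completion tokens`
  );
}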
Getting Started
- Deploy a Model: Choose between self-hosted or third-party providers
- Configure Autoscaling: Set up automatic scaling based on demand
- Apply Guardrails: Add content moderation and safety filters
- Monitor Usage: Track performance and costs in the analytics dashboard
- Optimize Costs: Use on-demand deployment and right-size resources
Model Cards
Each deployed model includes a comprehensive model card documenting:
- Model Information: Name, version, base model, architecture, parameter count
- Performance Metrics: Accuracy, latency, throughput, token limits
- Training Details: Dataset, training duration, hardware used, hyperparameters
- Intended Use: Recommended applications, limitations, known biases
- Ethical Considerations: Privacy implications, fairness analysis, safety measures
- License: Usage restrictions, commercial viability, attribution requirements
- Citation: How to cite the model in research or production
Using AI Models in Applications
Access deployed models via the STRONGLY_SERVICES environment variable:
// Access AI models via the STRONGLY_SERVICES environment variable,
// which the platform injects into the application as a JSON string
const services = JSON.parse(process.env.STRONGLY_SERVICES);
const aiModel = services.aiModels['model-id'];

// Send an OpenAI-compatible chat completion request to the model's endpoint
const response = await fetch(aiModel.endpoint, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${aiModel.apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: aiModel.name,
    messages: [{ role: 'user', content: 'Hello!' }],
    max_tokens: 500,
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
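In production code, check the HTTP status before parsing the body so that provider errors and rate-limit responses surface clearly; for example:

// Minimal error handling around the request above
if (!response.ok) {
  throw new Error(`AI Gateway request failed: ${response.status} ${response.statusText}`);
}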