
AI Gateway

The AI Gateway provides a unified platform for deploying and managing AI models across multiple providers. Deploy self-hosted models from Hugging Face, connect third-party providers (OpenAI, Anthropic, Google, Cohere), fine-tune models, apply guardrails, and monitor usage with comprehensive analytics.

Key Features

Self-Hosted Models

Deploy open-source models from Hugging Face on your own infrastructure with full control over:

  • Scaling: Configure autoscaling based on CPU and memory usage
  • Costs: Optimize spending with on-demand and scheduled deployment options
  • Data Privacy: Keep sensitive data within your infrastructure
  • Environment Selection: Choose from vLLM, TGI, Ollama, or custom Docker images

Learn more about deploying AI models.
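
The deployment itself is configured in the AI Gateway; as a rough sketch of how the options above fit together, a self-hosted deployment might be described like this (the object shape and field names are illustrative assumptions, not the gateway's actual schema):

// Hypothetical deployment configuration for a self-hosted model
// (illustrative only; actual settings are made in the AI Gateway)
const deployment = {
  model: 'meta-llama/Llama-3.1-8B-Instruct',  // Hugging Face model ID (placeholder)
  environment: 'vllm',                        // vLLM, TGI, Ollama, or a custom Docker image
  autoscaling: {
    minReplicas: 1,
    maxReplicas: 4,
    targetCpuUtilization: 70,                 // scale out above 70% CPU
    targetMemoryUtilization: 75               // scale out above 75% memory
  },
  schedule: 'on-demand'                       // on-demand or scheduled deployment
};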

Third-Party Providers

Connect to external AI providers with unified API access and automatic key management:

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
  • Anthropic: Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
  • Google: Gemini Pro, Gemini Ultra, PaLM 2
  • Cohere: Command, Command-Light, Embed
  • Azure OpenAI: Enterprise-grade OpenAI models with Azure SLA

Learn more about model providers.
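
Because access is unified, the same request shape works no matter which provider backs a model. The sketch below builds on the STRONGLY_SERVICES pattern shown later on this page; the chat helper and the model IDs are placeholders for illustration:

// Sketch: call any configured model through the gateway with one payload shape
const services = JSON.parse(process.env.STRONGLY_SERVICES);

async function chat(modelId, prompt) {
  const model = services.aiModels[modelId];
  const res = await fetch(model.endpoint, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${model.apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: model.name,
      messages: [{ role: 'user', content: prompt }]
    })
  });
  return (await res.json()).choices[0].message.content;
}

// The same call works whether the model is backed by OpenAI, Anthropic, Google, or Cohere
console.log(await chat('gpt-4', 'Summarize this support ticket.'));
console.log(await chat('claude-3-opus', 'Summarize this support ticket.'));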

Fine-Tuning

Train models on your custom data to improve performance for domain-specific tasks:

  • LoRA: 90% less memory than full fine-tuning
  • QLoRA: Fine-tune 70B models on a single 48GB GPU
  • Full Fine-Tuning: Maximum customization potential

Fine-tuning creates a specialized version of a base model optimized for your specific use cases.
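
Fine-tuning jobs are set up through the gateway; purely as an illustration of the knobs involved, a job definition might look something like this (the structure and field names below are assumptions, not the gateway's actual schema):

// Hypothetical fine-tuning job definition (illustrative field names only)
const fineTuneJob = {
  baseModel: 'mistralai/Mistral-7B-Instruct-v0.2',  // placeholder base model
  method: 'qlora',                                   // 'lora', 'qlora', or 'full'
  dataset: 'datasets/support-conversations.jsonl',   // your custom training data
  hyperparameters: {
    epochs: 3,
    learningRate: 2e-4,
    loraRank: 16,    // adapter rank; lower rank means less memory
    loraAlpha: 32
  }
};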

Guardrails

Apply content moderation, safety filters, and policy enforcement to model inputs and outputs:

  • Content Filtering: Toxicity detection, PII detection, prompt injection detection
  • Topic Restrictions: Allowed and banned topics
  • Rate Limiting: Request and token limits per user
  • Output Validation: Hallucination detection, format enforcement
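
Guardrails are attached to a model as a policy. As a sketch of how the categories above might combine in one configuration (field names are illustrative assumptions, not the gateway's schema):

// Hypothetical guardrail policy combining the categories listed above
const guardrailPolicy = {
  contentFiltering: {
    toxicity: { enabled: true, threshold: 0.8 },
    pii: { enabled: true, action: 'redact' },
    promptInjection: { enabled: true, action: 'block' }
  },
  topics: {
    allowed: ['billing', 'product support'],
    banned: ['medical advice', 'legal advice']
  },
  rateLimits: {
    requestsPerUserPerMinute: 60,
    tokensPerUserPerDay: 200000
  },
  outputValidation: {
    hallucinationDetection: true,
    format: 'json'   // enforce a structured response format
  }
};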

Analytics

Monitor model usage, performance, and costs with real-time analytics:

  • Request Metrics: Track request volume and trends
  • Token Usage: Monitor token consumption across models
  • Performance: View response times and error rates
  • Cost Tracking: Analyze spending by model and provider

Learn more about monitoring models.
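
If you want metrics programmatically rather than through the dashboard, a pull against a metrics endpoint could look roughly like this (the URL, query parameter, and response fields are assumptions for illustration; see the monitoring documentation for the actual API):

// Sketch: fetch usage metrics for a deployed model (hypothetical endpoint and fields)
const metricsRes = await fetch(
  'https://gateway.example.com/api/models/model-id/metrics?window=24h',
  { headers: { 'Authorization': `Bearer ${process.env.GATEWAY_API_KEY}` } }
);
const metrics = await metricsRes.json();
console.log(metrics.requestCount, metrics.tokensUsed, metrics.p95LatencyMs, metrics.costUsd);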

Getting Started

  1. Deploy a Model: Choose between self-hosted or third-party providers
  2. Configure Autoscaling: Set up automatic scaling based on demand
  3. Apply Guardrails: Add content moderation and safety filters
  4. Monitor Usage: Track performance and costs in the analytics dashboard
  5. Optimize Costs: Use on-demand deployment and right-size resources

Model Cards

Each deployed model includes a comprehensive model card documenting:

  • Model Information: Name, version, base model, architecture, parameter count
  • Performance Metrics: Accuracy, latency, throughput, token limits
  • Training Details: Dataset, training duration, hardware used, hyperparameters
  • Intended Use: Recommended applications, limitations, known biases
  • Ethical Considerations: Privacy implications, fairness analysis, safety measures
  • License: Usage restrictions, commercial viability, attribution requirements
  • Citation: How to cite the model in research or production
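
The card's contents map naturally onto a structured record; a sketch of that shape (keys and values are examples chosen for illustration, not the exact model card schema):

// Illustrative shape of a model card (example keys and values only)
const modelCard = {
  model: { name: 'support-assistant', version: '2.0', baseModel: 'Llama-3.1-8B', parameters: '8B' },
  performance: { latencyP95Ms: 850, maxTokens: 8192 },
  training: { dataset: 'internal support conversations', hardware: '4x A100', epochs: 3 },
  intendedUse: { applications: ['customer support'], limitations: ['English only'] },
  ethicalConsiderations: { piiHandling: 'redacted before training', fairnessReviewed: true },
  license: 'llama3.1-community',   // usage restrictions and attribution live here
  citation: 'See the model card page for the recommended citation'
};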

Using AI Models in Applications

Access deployed models via the STRONGLY_SERVICES environment variable:

// Access AI models via STRONGLY_SERVICES
const services = JSON.parse(process.env.STRONGLY_SERVICES);
const aiModel = services.aiModels['model-id'];

const response = await fetch(aiModel.endpoint, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${aiModel.apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: aiModel.name,
    messages: [{ role: 'user', content: 'Hello!' }],
    max_tokens: 500,
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
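
In practice, check the HTTP status before parsing the body; a minimal guard to add right after the fetch call in the example above:

// Minimal error handling for the request above (insert before response.json())
if (!response.ok) {
  const errText = await response.text();
  throw new Error(`AI Gateway request failed (${response.status}): ${errText}`);
}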

Next Steps