AI Models
Manage AI models registered in the Strongly AI Gateway. Models can be third-party provider models (OpenAI, Anthropic, etc.) or self-hosted models deployed on your cluster.
AIModel Object
{
"id": "model_abc123",
"name": "GPT-4 Production",
"type": "third-party",
"provider": "openai",
"vendorModelId": "gpt-4",
"modelType": "chat",
"status": "active",
"description": "GPT-4 for production workloads",
"capabilities": ["chat", "function-calling"],
"maxTokens": 8192,
"contextWindow": 128000,
"owner": "user_abc123",
"organizationId": "org_abc123",
"isShared": true,
"sharedWith": ["user_def456"],
"config": {
"defaultTemperature": 0.7,
"rateLimit": 100
},
"createdAt": "2025-01-15T10:00:00.000Z",
"updatedAt": "2025-02-01T14:30:00.000Z"
}
GET /api/v1/ai/models
List AI models
Returns a paginated list of AI models accessible to the current user.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| search | query | string | No | Search by model name |
| type | query | string | No | Filter by type: third-party, self-hosted |
| status | query | string | No | Filter by status: active, deploying, stopped, failed |
| provider | query | string | No | Filter by provider: openai, anthropic, google, etc. |
| modelType | query | string | No | Filter by model type: chat, completion, embedding |
| limit | query | integer | No | Max results (default: 50, max: 200) |
| offset | query | integer | No | Pagination offset |
| sort | query | string | No | Sort field (default: -createdAt) |
Response: 200 OK (paginated)
{
"data": [
{
"id": "model_abc123",
"name": "GPT-4 Production",
"type": "third-party",
"provider": "openai",
"vendorModelId": "gpt-4",
"modelType": "chat",
"status": "active",
"description": "GPT-4 for production workloads",
"capabilities": ["chat", "function-calling"],
"maxTokens": 8192,
"contextWindow": 128000,
"owner": "user_abc123",
"organizationId": "org_abc123",
"isShared": true,
"createdAt": "2025-01-15T10:00:00.000Z",
"updatedAt": "2025-02-01T14:30:00.000Z"
}
],
"meta": { "total": 12, "limit": 50, "offset": 0, "hasMore": false, "requestId": "req_abc123" }
}
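The `limit`, `offset`, and `meta.hasMore` fields can be combined to walk every page. The following is a minimal sketch, not an official client: `fetch_page` is a hypothetical callable that performs the authenticated GET for a given path and returns the parsed JSON body, and the lower bound on `limit` is an assumption (this section only documents the maximum).

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

LIST_PATH = "/api/v1/ai/models"


def build_list_query(limit: int = 50, offset: int = 0, **filters: str) -> str:
    """Return the path plus query string for one page of results."""
    # The documented maximum is 200; rejecting values below 1 is an assumption
    # that also keeps the pagination loop below from stalling.
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {key: value for key, value in filters.items() if value is not None}
    params.update({"limit": limit, "offset": offset})
    return f"{LIST_PATH}?{urlencode(params)}"


def iter_models(
    fetch_page: Callable[[str], dict], limit: int = 50, **filters: str
) -> Iterator[dict]:
    """Yield every model across pages, stopping when meta.hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(build_list_query(limit=limit, offset=offset, **filters))
        yield from page["data"]
        if not page["meta"].get("hasMore"):
            break
        offset += limit
```

Filters such as `status="active"` or `provider="openai"` are passed through as keyword arguments and become query parameters unchanged.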
GET /api/v1/ai/models/overview
Get model overview statistics
Returns aggregate counts of models by status and type.
Scope: ai-gateway:read
Response: 200 OK
{
"data": {
"total": 12,
"active": 8,
"deploying": 1,
"stopped": 2,
"failed": 1,
"thirdParty": 9,
"selfHosted": 3
},
"meta": { "requestId": "req_abc123" }
}
GET /api/v1/ai/models/certified
List curated third-party models
Returns the catalog of certified third-party models maintained by the AI Gateway (e.g. OpenAI, Anthropic, Mistral). This endpoint drives the mobile add-model picker, so clients should not hardcode vendor or model lists.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| provider | query | string | No | Filter by provider (openai, anthropic, mistral, …) |
| modelType | query | string | No | Filter by model type (chat, embedding, image, …) |
| capability | query | string | No | Filter by capability flag (vision, audio, …) |
Response: 200 OK
{
"data": {
"models": [
{
"model_id": "gpt-4o",
"display_name": "GPT-4o",
"modelType": "chat",
"provider": "openai",
"vendor": "OpenAI",
"capabilities": { "vision": true, "function_calling": true },
"parameters": { "context_window": 128000 },
"api_endpoint": "https://api.openai.com/v1"
}
]
},
"meta": { "requestId": "req_abc123" }
}
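Catalog entries use snake_case keys and a capability map, while the create request described later in this reference uses camelCase keys and a capability list. A picker therefore needs a small translation step. The sketch below shows one plausible mapping; the field correspondence (`model_id` to `vendorModelId`, `display_name` to `name`, `parameters.context_window` to `contextWindow`) and the underscore-to-hyphen conversion of capability names are assumptions inferred from the examples, not documented behavior. `certified_to_create_body` is a hypothetical helper name.

```python
from typing import Optional


def certified_to_create_body(entry: dict, name: Optional[str] = None) -> dict:
    """Build a POST /api/v1/ai/models body from one certified catalog entry."""
    # Keep only capability flags set to true. Converting "function_calling" to
    # "function-calling" is an assumption based on the example payloads.
    enabled = sorted(
        flag.replace("_", "-")
        for flag, is_on in entry.get("capabilities", {}).items()
        if is_on
    )
    body = {
        "name": name or entry["display_name"],
        "type": "third-party",
        "provider": entry["provider"],
        "vendorModelId": entry["model_id"],
        "modelType": entry["modelType"],
        "capabilities": enabled,
    }
    context_window = entry.get("parameters", {}).get("context_window")
    if context_window is not None:
        body["contextWindow"] = context_window
    return body
```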
GET /api/v1/ai/models/providers
List third-party providers
Returns the distinct providers present in the certified models catalog, with a model count for each. Used by the mobile add-model vendor picker.
Scope: ai-gateway:read
Response: 200 OK
{
"data": {
"providers": [
{ "id": "openai", "label": "OpenAI", "count": 14 },
{ "id": "anthropic", "label": "Anthropic", "count": 6 }
]
},
"meta": { "requestId": "req_abc123" }
}
GET /api/v1/ai/models/prebuilt
List self-hosted prebuilt model templates
Returns the catalog of prebuilt model templates that can be deployed as self-hosted models on the cluster. Drives the self-hosted-deploy wizard's vendor/model picker.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| category | query | string | No | Filter by category (llm, image, audio, …) |
| provider | query | string | No | Filter by provider (meta, google, microsoft, …) |
Response: 200 OK
{
"data": {
"templates": [
{
"id": "llama-3.1-8b-instruct",
"name": "Llama 3.1 8B Instruct",
"description": "Meta Llama 3.1 8B instruction-tuned chat model",
"category": "llm",
"provider": "meta",
"defaultPort": 8000,
"defaultResources": { "cpu": "2000m", "memory": "16Gi", "gpu": 1, "disk": "100Gi" },
"recommendedInstance": "g5.xlarge",
"modelSize": "8B",
"tags": ["chat", "instruct"]
}
]
},
"meta": { "requestId": "req_abc123" }
}
POST /api/v1/ai/models
Create a new AI model
Registers a new model in the AI Gateway. Third-party models become active immediately; self-hosted models require deployment.
Scope: ai-gateway:write
Request Body:
{
"name": "GPT-4 Production",
"type": "third-party",
"provider": "openai",
"vendorModelId": "gpt-4",
"modelType": "chat",
"description": "GPT-4 for production workloads",
"capabilities": ["chat", "function-calling"],
"maxTokens": 8192,
"contextWindow": 128000,
"config": {
"defaultTemperature": 0.7,
"rateLimit": 100
}
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable model name |
| type | string | Yes | Model type: third-party or self-hosted |
| provider | string | Yes | Provider: openai, anthropic, google, huggingface, etc. |
| vendorModelId | string | Yes | Provider's model identifier (e.g., gpt-4, claude-3-opus) |
| modelType | string | No | Capability type: chat, completion, embedding |
| description | string | No | Model description |
| capabilities | string[] | No | List of capabilities (e.g., chat, function-calling, vision) |
| maxTokens | integer | No | Maximum output tokens |
| contextWindow | integer | No | Maximum context window size in tokens |
| config | object | No | Additional provider-specific configuration |
Response: 201 Created
{
"data": {
"id": "model_abc123",
"name": "GPT-4 Production",
"type": "third-party",
"provider": "openai",
"vendorModelId": "gpt-4",
"modelType": "chat",
"status": "active",
"owner": "user_abc123",
"organizationId": "org_abc123",
"createdAt": "2025-02-07T10:00:00.000Z",
"updatedAt": "2025-02-07T10:00:00.000Z"
},
"meta": { "requestId": "req_abc123" }
}
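Checking a request body against the field table before sending it avoids a round trip for obvious mistakes. The following is an illustrative client-side check only; `validate_create_body` is a hypothetical helper, and the server's actual validation rules and error format are not described in this section.

```python
from typing import List

REQUIRED_FIELDS = ("name", "type", "provider", "vendorModelId")
MODEL_TYPES = ("third-party", "self-hosted")
INTEGER_FIELDS = ("maxTokens", "contextWindow")


def validate_create_body(body: dict) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = body.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: required non-empty string")
    model_type = body.get("type")
    if isinstance(model_type, str) and model_type.strip() and model_type not in MODEL_TYPES:
        problems.append(f"type: must be one of {', '.join(MODEL_TYPES)}")
    for field in INTEGER_FIELDS:
        if field in body:
            value = body[field]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{field}: must be an integer")
    return problems
```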
GET /api/v1/ai/models/:id
Get an AI model
Returns the full details of a single model.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"id": "model_abc123",
"name": "GPT-4 Production",
"type": "third-party",
"provider": "openai",
"vendorModelId": "gpt-4",
"modelType": "chat",
"status": "active",
"description": "GPT-4 for production workloads",
"capabilities": ["chat", "function-calling"],
"maxTokens": 8192,
"contextWindow": 128000,
"owner": "user_abc123",
"organizationId": "org_abc123",
"isShared": true,
"sharedWith": ["user_def456"],
"config": {
"defaultTemperature": 0.7,
"rateLimit": 100
},
"createdAt": "2025-01-15T10:00:00.000Z",
"updatedAt": "2025-02-01T14:30:00.000Z"
},
"meta": { "requestId": "req_abc123" }
}
GET /api/v1/ai/models/:id/options
Get model-specific options
Returns runtime options exposed by the model (e.g. available voices for TTS, supported languages for STT, sampler choices). Proxies to the AI Gateway, which queries the model pod (self-hosted) or the provider API (third-party).
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
The response shape depends on the model. A typical example:
{
"data": {
"voices": [
{ "id": "alloy", "label": "Alloy", "gender": "neutral" },
{ "id": "verse", "label": "Verse", "gender": "neutral" }
],
"languages": ["en", "es", "fr", "de", "ja"]
},
"meta": { "requestId": "req_abc123" }
}
Errors:
502 gateway-error: AI Gateway unreachable or returned an error.
500 config-error: AI Gateway URL not configured on the server.
PUT /api/v1/ai/models/:id
Update an AI model
Updates model properties. Only provided fields are changed.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Request Body:
{
"name": "GPT-4 Production (Updated)",
"description": "Updated description",
"maxTokens": 4096,
"config": {
"defaultTemperature": 0.5
}
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | Updated model name |
| description | string | No | Updated description |
| capabilities | string[] | No | Updated capabilities list |
| maxTokens | integer | No | Updated max output tokens |
| contextWindow | integer | No | Updated context window size |
| config | object | No | Updated configuration (merged with existing) |
Response: 200 OK
{
"data": {
"id": "model_abc123",
"name": "GPT-4 Production (Updated)",
"updatedAt": "2025-02-07T12:00:00.000Z"
},
"meta": { "requestId": "req_abc123" }
}
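Because `config` is merged with the existing value, sending only `defaultTemperature` should leave `rateLimit` untouched. The sketch below models that as a shallow, top-level merge. Whether the server merges shallowly or recursively is not stated here, so treat the shallow behavior as an assumption; `merge_config` is a hypothetical helper for predicting the result client-side.

```python
def merge_config(existing: dict, update: dict) -> dict:
    """Return a new config where keys in `update` override keys in `existing`."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(update)
    return merged


current = {"defaultTemperature": 0.7, "rateLimit": 100}
patch = {"defaultTemperature": 0.5}
result = merge_config(current, patch)
```

With the values from the examples above, `result` keeps `rateLimit` at 100 and changes `defaultTemperature` to 0.5.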
DELETE /api/v1/ai/models/:id
Delete an AI model
Permanently removes a model from the AI Gateway. Self-hosted models are undeployed first.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 204 No Content
DELETE /api/v1/ai/models/:id/cache
Clear the model's semantic cache
Removes all semantic-cache entries for this model. Future requests will miss the cache and generate fresh responses. Useful after changing the system prompt, model parameters, or cache configuration.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"cleared": true,
"entriesDeleted": 47
},
"meta": { "requestId": "req_abc123" }
}
Errors:
500 cache-error: Failed to clear cache; `message` contains the underlying error.
POST /api/v1/ai/models/:id/deploy
Deploy a self-hosted model
Deploys a self-hosted model to the Kubernetes cluster. Only applicable to models with type: "self-hosted".
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"id": "model_abc123",
"status": "deploying",
"message": "Deployment initiated"
},
"meta": { "requestId": "req_abc123" }
}
POST /api/v1/ai/models/:id/start
Start a stopped model
Starts a previously stopped self-hosted model.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"id": "model_abc123",
"status": "deploying",
"message": "Model starting"
},
"meta": { "requestId": "req_abc123" }
}
POST /api/v1/ai/models/:id/stop
Stop a running model
Stops a running self-hosted model, freeing cluster resources.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"id": "model_abc123",
"status": "stopped",
"message": "Model stopped"
},
"meta": { "requestId": "req_abc123" }
}
GET /api/v1/ai/models/:id/status
Get model deployment status
Returns the current deployment status and replica information for a self-hosted model.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"status": "active",
"replicas": 2,
"readyReplicas": 2,
"conditions": [
{
"type": "Available",
"status": "True",
"lastTransitionTime": "2025-02-07T10:00:00.000Z",
"reason": "MinimumReplicasAvailable",
"message": "Deployment has minimum availability"
}
]
},
"meta": { "requestId": "req_abc123" }
}
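Since deploy and start both return while the model is still `deploying`, a client usually polls this endpoint until the model is ready. The following is a minimal polling sketch under stated assumptions: `get_status` is a hypothetical callable that performs the GET and returns the parsed JSON body, and "ready" is taken to mean `status` is `active` with `readyReplicas` equal to a non-zero `replicas`. The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll the status endpoint until the model is ready, failed, or timed out."""
    deadline = clock() + timeout_s
    while True:
        data = get_status()["data"]
        if data["status"] == "failed":
            raise RuntimeError("model deployment failed")
        replicas = data.get("replicas", 0)
        # Readiness rule is an assumption: active and all replicas ready.
        if data["status"] == "active" and replicas > 0 and data.get("readyReplicas") == replicas:
            return data
        if clock() >= deadline:
            raise TimeoutError("model did not become ready in time")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.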
GET /api/v1/ai/models/:id/metrics
Get model metrics
Returns performance and usage metrics for a model.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"totalRequests": 15420,
"totalTokens": 2500000,
"avgLatencyMs": 450,
"errorRate": 0.02,
"requestsPerMinute": 12.5,
"uptimePercent": 99.8
},
"meta": { "requestId": "req_abc123" }
}
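A couple of derived figures are often more useful on a dashboard than the raw counters. The sketch below assumes `errorRate` is a fraction between 0 and 1 (so 0.02 means 2%); this section does not state the unit, so verify it before relying on the result. `summarize_metrics` is a hypothetical helper.

```python
def summarize_metrics(metrics: dict) -> dict:
    """Derive an estimated failure count and average tokens per request."""
    total = metrics["totalRequests"]
    return {
        # Assumes errorRate is a fraction of totalRequests, not a percentage.
        "estimatedFailedRequests": round(total * metrics["errorRate"]),
        "avgTokensPerRequest": metrics["totalTokens"] / total if total else 0.0,
    }


summary = summarize_metrics(
    {"totalRequests": 15420, "totalTokens": 2500000, "errorRate": 0.02}
)
```

For the example response above, this estimates about 308 failed requests and roughly 162 tokens per request.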
GET /api/v1/ai/models/:id/logs
Get model logs
Returns container logs for a self-hosted model deployment.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
| lines | query | integer | No | Number of log lines to return (default: 100) |
| since | query | string | No | Return logs since this time (ISO 8601 or duration like 1h, 30m) |
| container | query | string | No | Container name (for multi-container pods) |
Response: 200 OK
{
"data": {
"logs": "2025-02-07T10:00:00Z INFO Model loaded successfully\n2025-02-07T10:00:01Z INFO Ready to serve requests on port 8080\n",
"container": "model-server",
"lines": 100
},
"meta": { "requestId": "req_abc123" }
}
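The `logs` field is a single newline-delimited string, so clients typically split it before display or filtering. The sketch below splits each line into timestamp, level, and message, assuming the `<timestamp> <LEVEL> <message>` layout shown in the example. Container output is free-form, so lines that do not fit are returned with empty timestamp and level rather than dropped. `parse_log_lines` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_log_lines(logs: str) -> List[Tuple[str, str, str]]:
    """Split the logs string into (timestamp, level, message) tuples."""
    entries = []
    for line in logs.splitlines():
        if not line.strip():
            continue  # skip blank lines, such as the one after a trailing newline
        parts = line.split(" ", 2)
        if len(parts) == 3:
            entries.append((parts[0], parts[1], parts[2]))
        else:
            # Layout is an assumption; keep unrecognized lines as plain messages.
            entries.append(("", "", line))
    return entries
```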
GET /api/v1/ai/models/:id/permissions
Get model permissions
Returns sharing and ownership information for a model.
Scope: ai-gateway:read
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Response: 200 OK
{
"data": {
"owner": "user_abc123",
"sharedWith": ["user_def456", "user_ghi789"],
"isShared": true,
"organizationId": "org_abc123"
},
"meta": { "requestId": "req_abc123" }
}
PUT /api/v1/ai/models/:id/permissions
Update model permissions
Updates sharing settings for a model. Only the model owner can modify permissions.
Scope: ai-gateway:write
Parameters:
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| id | path | string | Yes | Model ID |
Request Body:
{
"isShared": true,
"sharedWith": ["user_def456", "user_ghi789"]
}
| Field | Type | Required | Description |
|---|---|---|---|
| isShared | boolean | No | Whether the model is shared with other users |
| sharedWith | string[] | No | List of user IDs to share with |
Response: 200 OK
{
"data": {
"owner": "user_abc123",
"sharedWith": ["user_def456", "user_ghi789"],
"isShared": true,
"organizationId": "org_abc123"
},
"meta": { "requestId": "req_abc123" }
}
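Adding one user without dropping the others is a read-modify-write: fetch the current permissions, extend the list, then send the full list back. The sketch below builds the PUT body from a GET response. It assumes `sharedWith` in the request replaces the stored list rather than appending to it, which this section does not confirm; `share_with` is a hypothetical helper.

```python
def share_with(current: dict, user_id: str) -> dict:
    """Build a PUT /permissions body that adds user_id to the shared list."""
    shared = list(current.get("sharedWith", []))  # copy; do not mutate the input
    if user_id not in shared:
        shared.append(user_id)
    # Assumption: the request's sharedWith replaces the stored list wholesale.
    return {"isShared": True, "sharedWith": shared}


permissions = {"owner": "user_abc123", "sharedWith": ["user_def456"], "isShared": True}
body = share_with(permissions, "user_ghi789")
```

With the sample values, `body` matches the request body shown above.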