AI Models

Manage AI models registered in the Strongly AI Gateway. Models can be third-party provider models (OpenAI, Anthropic, etc.) or self-hosted models deployed on your cluster.


AIModel Object

{
  "id": "model_abc123",
  "name": "GPT-4 Production",
  "type": "third-party",
  "provider": "openai",
  "vendorModelId": "gpt-4",
  "modelType": "chat",
  "status": "active",
  "description": "GPT-4 for production workloads",
  "capabilities": ["chat", "function-calling"],
  "maxTokens": 8192,
  "contextWindow": 128000,
  "owner": "user_abc123",
  "organizationId": "org_abc123",
  "isShared": true,
  "sharedWith": ["user_def456"],
  "config": {
    "defaultTemperature": 0.7,
    "rateLimit": 100
  },
  "createdAt": "2025-01-15T10:00:00.000Z",
  "updatedAt": "2025-02-01T14:30:00.000Z"
}

GET /api/v1/ai/models

List AI models

Returns a paginated list of AI models accessible to the current user.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| search | query | string | No | Search by model name |
| type | query | string | No | Filter by type: third-party, self-hosted |
| status | query | string | No | Filter by status: active, deploying, stopped, failed |
| provider | query | string | No | Filter by provider: openai, anthropic, google, etc. |
| modelType | query | string | No | Filter by model type: chat, completion, embedding |
| limit | query | integer | No | Max results (default: 50, max: 200) |
| offset | query | integer | No | Pagination offset |
| sort | query | string | No | Sort field (default: -createdAt) |

Response: 200 OK (paginated)

{
  "data": [
    {
      "id": "model_abc123",
      "name": "GPT-4 Production",
      "type": "third-party",
      "provider": "openai",
      "vendorModelId": "gpt-4",
      "modelType": "chat",
      "status": "active",
      "description": "GPT-4 for production workloads",
      "capabilities": ["chat", "function-calling"],
      "maxTokens": 8192,
      "contextWindow": 128000,
      "owner": "user_abc123",
      "organizationId": "org_abc123",
      "isShared": true,
      "createdAt": "2025-01-15T10:00:00.000Z",
      "updatedAt": "2025-02-01T14:30:00.000Z"
    }
  ],
  "meta": { "total": 12, "limit": 50, "offset": 0, "hasMore": false, "requestId": "req_abc123" }
}
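
The offset pagination described above can be walked with a small loop. The following is a minimal sketch rather than an official client: `fetch_page` is a hypothetical callable standing in for whatever HTTP layer you use (it receives a URL and returns the decoded JSON body), the base URL is a placeholder, and the loop assumes `meta.hasMore` behaves as in the sample response.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

LIST_PATH = "/api/v1/ai/models"
MAX_LIMIT = 200  # documented maximum page size


def build_list_url(base_url: str, limit: int = 50, offset: int = 0,
                   **filters: Optional[str]) -> str:
    """Build the list URL; filters whose value is None are left out."""
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    params: Dict[str, object] = {
        key: value for key, value in filters.items() if value is not None
    }
    params["limit"] = limit
    params["offset"] = offset
    return f"{base_url.rstrip('/')}{LIST_PATH}?{urlencode(params)}"


def iter_models(fetch_page: Callable[[str], Dict], base_url: str,
                limit: int = 50, **filters: Optional[str]) -> Iterator[Dict]:
    """Yield every model, advancing the offset until meta.hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(build_list_url(base_url, limit, offset, **filters))
        yield from page["data"]
        if not page["meta"].get("hasMore"):
            break
        offset += limit
```

Filters are passed as keyword arguments, for example `iter_models(fetch_page, base_url, status="active", provider="openai")`.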

GET /api/v1/ai/models/overview

Get model overview statistics

Returns aggregate counts of models by status and type.

Scope: ai-gateway:read

Response: 200 OK

{
  "data": {
    "total": 12,
    "active": 8,
    "deploying": 1,
    "stopped": 2,
    "failed": 1,
    "thirdParty": 9,
    "selfHosted": 3
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/certified

List curated third-party models

Returns the catalog of certified third-party models maintained by the AI Gateway (e.g. OpenAI, Anthropic, Mistral). This endpoint drives the mobile add-model picker, so clients should not hardcode vendor or model lists.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| provider | query | string | No | Filter by provider (openai, anthropic, mistral, …) |
| modelType | query | string | No | Filter by model type (chat, embedding, image, …) |
| capability | query | string | No | Filter by capability flag (vision, audio, …) |

Response: 200 OK

{
  "data": {
    "models": [
      {
        "model_id": "gpt-4o",
        "display_name": "GPT-4o",
        "modelType": "chat",
        "provider": "openai",
        "vendor": "OpenAI",
        "capabilities": { "vision": true, "function_calling": true },
        "parameters": { "context_window": 128000 },
        "api_endpoint": "https://api.openai.com/v1"
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}
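
Because the catalog is meant to replace hardcoded lists, a client can derive its picker entries directly from this response. The sketch below is illustrative only: `picker_entries` and the entry keys (`value`, `label`, `provider`) are hypothetical names, and the code assumes `capabilities` is an object of boolean flags as in the sample above (unlike the string array on the AIModel object).

```python
from typing import Dict, List


def picker_entries(response: Dict, capability: str = "") -> List[Dict[str, str]]:
    """Map a certified-catalog response to picker entries.

    When `capability` is given, keep only models whose flag is true.
    """
    entries = []
    for model in response["data"]["models"]:
        flags = model.get("capabilities", {})
        if capability and not flags.get(capability, False):
            continue  # model lacks the requested capability flag
        entries.append({
            "value": model["model_id"],
            "label": model["display_name"],
            "provider": model["provider"],
        })
    return entries
```

The same filtering can be done server-side with the `capability` query parameter; the client-side filter is useful when one cached response feeds several pickers.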

GET /api/v1/ai/models/providers

List third-party providers

Returns the distinct providers present in the certified models catalog, with a model count for each. Used by the mobile add-model vendor picker.

Scope: ai-gateway:read

Response: 200 OK

{
  "data": {
    "providers": [
      { "id": "openai", "label": "OpenAI", "count": 14 },
      { "id": "anthropic", "label": "Anthropic", "count": 6 }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/prebuilt

List self-hosted prebuilt model templates

Returns the catalog of prebuilt model templates that can be deployed as self-hosted models on the cluster. Drives the self-hosted-deploy wizard's vendor/model picker.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| category | query | string | No | Filter by category (llm, image, audio, …) |
| provider | query | string | No | Filter by provider (meta, google, microsoft, …) |

Response: 200 OK

{
  "data": {
    "templates": [
      {
        "id": "llama-3.1-8b-instruct",
        "name": "Llama 3.1 8B Instruct",
        "description": "Meta Llama 3.1 8B instruction-tuned chat model",
        "category": "llm",
        "provider": "meta",
        "defaultPort": 8000,
        "defaultResources": { "cpu": "2000m", "memory": "16Gi", "gpu": 1, "disk": "100Gi" },
        "recommendedInstance": "g5.xlarge",
        "modelSize": "8B",
        "tags": ["chat", "instruct"]
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}
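
The `defaultResources` values in the sample look like Kubernetes resource quantities ("2000m" as millicores, "16Gi" as binary gibibytes). That reading is an assumption, not something this page states. Under that assumption, a deploy wizard could check a template against free capacity roughly as follows; `cpu_cores`, `memory_bytes`, and `fits` are hypothetical helper names.

```python
from typing import Dict

# Binary suffixes used by Kubernetes-style memory quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "2000m" (millicores) or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "16Gi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def fits(resources: Dict, free_cpu: float, free_memory: int, free_gpus: int) -> bool:
    """Return True if the template's default resources fit the free capacity."""
    return (cpu_cores(resources["cpu"]) <= free_cpu
            and memory_bytes(resources["memory"]) <= free_memory
            and resources.get("gpu", 0) <= free_gpus)
```

Only the binary suffixes are handled here; decimal suffixes such as "G" or "M" would need an extra table.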

POST /api/v1/ai/models

Create a new AI model

Registers a new model in the AI Gateway. Third-party models become active immediately; self-hosted models require deployment.

Scope: ai-gateway:write

Request Body:

{
  "name": "GPT-4 Production",
  "type": "third-party",
  "provider": "openai",
  "vendorModelId": "gpt-4",
  "modelType": "chat",
  "description": "GPT-4 for production workloads",
  "capabilities": ["chat", "function-calling"],
  "maxTokens": 8192,
  "contextWindow": 128000,
  "config": {
    "defaultTemperature": 0.7,
    "rateLimit": 100
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Human-readable model name |
| type | string | Yes | Model type: third-party or self-hosted |
| provider | string | Yes | Provider: openai, anthropic, google, huggingface, etc. |
| vendorModelId | string | Yes | Provider's model identifier (e.g., gpt-4, claude-3-opus) |
| modelType | string | No | Capability type: chat, completion, embedding |
| description | string | No | Model description |
| capabilities | string[] | No | List of capabilities (e.g., chat, function-calling, vision) |
| maxTokens | integer | No | Maximum output tokens |
| contextWindow | integer | No | Maximum context window size in tokens |
| config | object | No | Additional provider-specific configuration |

Response: 201 Created

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production",
    "type": "third-party",
    "provider": "openai",
    "vendorModelId": "gpt-4",
    "modelType": "chat",
    "status": "active",
    "owner": "user_abc123",
    "organizationId": "org_abc123",
    "createdAt": "2025-02-07T10:00:00.000Z",
    "updatedAt": "2025-02-07T10:00:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}
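
A client may want to check the required fields locally before sending the create request. The sketch below builds, but does not send, the request using Python's standard library. The bearer-token `Authorization` header and the base URL are assumptions, since this page does not describe authentication; `build_create_request` is a hypothetical helper name.

```python
import json
from typing import Dict
from urllib.request import Request

REQUIRED_FIELDS = ("name", "type", "provider", "vendorModelId")
MODEL_TYPES = {"third-party", "self-hosted"}


def build_create_request(base_url: str, token: str, model: Dict) -> Request:
    """Validate the documented required fields and build the POST request."""
    missing = [field for field in REQUIRED_FIELDS if not model.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if model["type"] not in MODEL_TYPES:
        raise ValueError("type must be 'third-party' or 'self-hosted'")
    return Request(
        f"{base_url.rstrip('/')}/api/v1/ai/models",
        data=json.dumps(model).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a 201 response carries the created model under `data`.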

GET /api/v1/ai/models/:id

Get an AI model

Returns the full details of a single model.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production",
    "type": "third-party",
    "provider": "openai",
    "vendorModelId": "gpt-4",
    "modelType": "chat",
    "status": "active",
    "description": "GPT-4 for production workloads",
    "capabilities": ["chat", "function-calling"],
    "maxTokens": 8192,
    "contextWindow": 128000,
    "owner": "user_abc123",
    "organizationId": "org_abc123",
    "isShared": true,
    "sharedWith": ["user_def456"],
    "config": {
      "defaultTemperature": 0.7,
      "rateLimit": 100
    },
    "createdAt": "2025-01-15T10:00:00.000Z",
    "updatedAt": "2025-02-01T14:30:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/:id/options

Get model-specific options

Returns runtime options exposed by the model (e.g. available voices for TTS, supported languages for STT, sampler choices). The request is proxied to the AI Gateway, which queries the model pod (for self-hosted models) or the provider API (for third-party models).

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

The response shape is model-dependent. A typical example:

{
  "data": {
    "voices": [
      { "id": "alloy", "label": "Alloy", "gender": "neutral" },
      { "id": "verse", "label": "Verse", "gender": "neutral" }
    ],
    "languages": ["en", "es", "fr", "de", "ja"]
  },
  "meta": { "requestId": "req_abc123" }
}

Errors:

  • 502 gateway-error — AI Gateway unreachable or returned an error.
  • 500 config-error — AI Gateway URL not configured on the server.
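
Since the shape varies by model, client code is safer reading each key defensively instead of assuming it exists. The following is a sketch under stated assumptions: `read_options` and `should_retry` are hypothetical helpers, and treating 502 as retryable and 500 as not is a judgment based on the error descriptions above, not documented behavior.

```python
from typing import Dict, List


def read_options(payload: Dict) -> Dict[str, List[str]]:
    """Extract voice IDs and languages, tolerating keys a model does not expose."""
    data = payload.get("data", {})
    return {
        "voices": [voice["id"] for voice in data.get("voices", [])],
        "languages": list(data.get("languages", [])),
    }


def should_retry(status_code: int) -> bool:
    """502 (gateway unreachable) may be transient; 500 config-error needs a server fix."""
    return status_code == 502
```

A model that exposes no voices simply yields an empty list, so UI code can hide the corresponding control.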

PUT /api/v1/ai/models/:id

Update an AI model

Updates model properties. Only provided fields are changed.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Request Body:

{
  "name": "GPT-4 Production (Updated)",
  "description": "Updated description",
  "maxTokens": 4096,
  "config": {
    "defaultTemperature": 0.5
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Updated model name |
| description | string | No | Updated description |
| capabilities | string[] | No | Updated capabilities list |
| maxTokens | integer | No | Updated max output tokens |
| contextWindow | integer | No | Updated context window size |
| config | object | No | Updated configuration (merged with existing) |

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production (Updated)",
    "updatedAt": "2025-02-07T12:00:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}
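
The partial-update semantics can be previewed client-side. The sketch below assumes that top-level fields are replaced and that `config` is merged key by key (a shallow merge); the page says only "merged with existing", so how nested objects are handled is an assumption. `preview_update` is a hypothetical helper that does not call the API.

```python
from typing import Dict


def preview_update(current: Dict, changes: Dict) -> Dict:
    """Return the model as it would look after applying `changes`.

    Fields absent from `changes` are kept; `config` keys are merged.
    """
    updated = dict(current)
    for key, value in changes.items():
        if key == "config":
            # Assumed shallow merge: new keys win, untouched keys survive.
            updated["config"] = {**current.get("config", {}), **value}
        else:
            updated[key] = value
    return updated
```

With the sample request body, `defaultTemperature` becomes 0.5 while `rateLimit` stays at 100 under this assumption.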

DELETE /api/v1/ai/models/:id

Delete an AI model

Permanently removes a model from the AI Gateway. Self-hosted models are undeployed first.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 204 No Content


DELETE /api/v1/ai/models/:id/cache

Clear the model's semantic cache

Removes all semantic-cache entries for this model. Future requests will miss the cache and generate fresh responses. Useful after changing the system prompt, model parameters, or cache configuration.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "cleared": true,
    "entriesDeleted": 47
  },
  "meta": { "requestId": "req_abc123" }
}

Errors:

  • 500 cache-error — Failed to clear cache; message contains the underlying error.

POST /api/v1/ai/models/:id/deploy

Deploy a self-hosted model

Deploys a self-hosted model to the Kubernetes cluster. Only applicable to models with type: "self-hosted".

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "deploying",
    "message": "Deployment initiated"
  },
  "meta": { "requestId": "req_abc123" }
}

POST /api/v1/ai/models/:id/start

Start a stopped model

Starts a previously stopped self-hosted model.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "deploying",
    "message": "Model starting"
  },
  "meta": { "requestId": "req_abc123" }
}

POST /api/v1/ai/models/:id/stop

Stop a running model

Stops a running self-hosted model, freeing cluster resources.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "stopped",
    "message": "Model stopped"
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/:id/status

Get model deployment status

Returns the current deployment status and replica information for a self-hosted model.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "status": "active",
    "replicas": 2,
    "readyReplicas": 2,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastTransitionTime": "2025-02-07T10:00:00.000Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability"
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}
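
After a deploy or start call returns "deploying", a client typically polls this endpoint until the model is ready. The sketch below is one possible approach, not prescribed behavior: `get_status` is a hypothetical callable returning the decoded status response, readiness is assumed to mean `status` is "active" with all replicas ready, and the status values are assumed to match the model statuses listed earlier (active, deploying, stopped, failed).

```python
import time
from typing import Callable, Dict


def is_ready(payload: Dict) -> bool:
    """True when the model is active and every replica reports ready."""
    data = payload["data"]
    replicas = data.get("replicas", 0)
    return (data["status"] == "active"
            and replicas > 0
            and data.get("readyReplicas", 0) >= replicas)


def wait_until_ready(get_status: Callable[[], Dict], attempts: int = 30,
                     delay_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until ready; raise on a failed deployment or when attempts run out."""
    for _ in range(attempts):
        payload = get_status()
        if is_ready(payload):
            return payload
        if payload["data"]["status"] == "failed":
            raise RuntimeError("model deployment failed")
        sleep(delay_s)
    raise TimeoutError(f"model not ready after {attempts} attempts")
```

The `sleep` parameter is injectable so the loop can be tested without real delays.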

GET /api/v1/ai/models/:id/metrics

Get model metrics

Returns performance and usage metrics for a model.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "totalRequests": 15420,
    "totalTokens": 2500000,
    "avgLatencyMs": 450,
    "errorRate": 0.02,
    "requestsPerMinute": 12.5,
    "uptimePercent": 99.8
  },
  "meta": { "requestId": "req_abc123" }
}
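
A few derived figures can be computed from these fields. The sketch below assumes `errorRate` is a fraction between 0 and 1 (so 0.02 means 2%), which the page does not state explicitly; `summarize_metrics` and its output keys are hypothetical names.

```python
from typing import Dict


def summarize_metrics(payload: Dict) -> Dict[str, float]:
    """Derive an estimated error count and average tokens per request."""
    data = payload["data"]
    total = data["totalRequests"]
    return {
        # Assumes errorRate is a fraction of totalRequests.
        "estimatedErrors": round(total * data["errorRate"]),
        "avgTokensPerRequest": data["totalTokens"] / total if total else 0.0,
    }
```

For the sample response this gives about 308 failed requests and roughly 162 tokens per request.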

GET /api/v1/ai/models/:id/logs

Get model logs

Returns container logs for a self-hosted model deployment.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |
| lines | query | integer | No | Number of log lines to return (default: 100) |
| since | query | string | No | Return logs since this time (ISO 8601 or duration like 1h, 30m) |
| container | query | string | No | Container name (for multi-container pods) |

Response: 200 OK

{
  "data": {
    "logs": "2025-02-07T10:00:00Z INFO Model loaded successfully\n2025-02-07T10:00:01Z INFO Ready to serve requests on port 8080\n",
    "container": "model-server",
    "lines": 100
  },
  "meta": { "requestId": "req_abc123" }
}
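
The `logs` field is a single newline-delimited string, so clients usually split it before display. The sketch below builds the request URL from the documented query parameters and splits each line into timestamp, level, and message. The "timestamp level message" layout is taken from the sample only; real container output may differ, so lines that do not match are kept raw. Both function names and the base URL are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode


def build_logs_url(base_url: str, model_id: str, lines: int = 100,
                   since: Optional[str] = None,
                   container: Optional[str] = None) -> str:
    """Build the logs URL, omitting optional parameters that are not set."""
    params: Dict[str, object] = {"lines": lines}
    if since is not None:
        params["since"] = since
    if container is not None:
        params["container"] = container
    path = f"/api/v1/ai/models/{quote(model_id, safe='')}/logs"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def parse_log_lines(payload: Dict) -> List[Dict[str, str]]:
    """Split the logs string into entries; unparseable lines keep only `raw`."""
    entries = []
    for line in payload["data"]["logs"].splitlines():
        if not line.strip():
            continue
        parts = line.split(" ", 2)
        if len(parts) == 3:
            entries.append({"timestamp": parts[0], "level": parts[1],
                            "message": parts[2], "raw": line})
        else:
            entries.append({"raw": line})
    return entries
```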

GET /api/v1/ai/models/:id/permissions

Get model permissions

Returns sharing and ownership information for a model.

Scope: ai-gateway:read

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Response: 200 OK

{
  "data": {
    "owner": "user_abc123",
    "sharedWith": ["user_def456", "user_ghi789"],
    "isShared": true,
    "organizationId": "org_abc123"
  },
  "meta": { "requestId": "req_abc123" }
}

PUT /api/v1/ai/models/:id/permissions

Update model permissions

Updates sharing settings for a model. Only the model owner can modify permissions.

Scope: ai-gateway:write

Parameters:

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| id | path | string | Yes | Model ID |

Request Body:

{
  "isShared": true,
  "sharedWith": ["user_def456", "user_ghi789"]
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| isShared | boolean | No | Whether the model is shared with other users |
| sharedWith | string[] | No | List of user IDs to share with |

Response: 200 OK

{
  "data": {
    "owner": "user_abc123",
    "sharedWith": ["user_def456", "user_ghi789"],
    "isShared": true,
    "organizationId": "org_abc123"
  },
  "meta": { "requestId": "req_abc123" }
}
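
To share a model with one more user, a client can read the current permissions and send back the extended list. The sketch below assumes that `sharedWith` in the PUT body replaces the stored list, as the sample request suggests; the page does not say whether the list is replaced or appended to. `share_with` is a hypothetical helper that only builds the request body.

```python
from typing import Dict


def share_with(current: Dict, user_id: str) -> Dict:
    """Build a PUT body that adds `user_id` to the existing share list.

    `current` is the decoded response of the GET permissions endpoint.
    """
    data = current["data"]
    shared = list(data.get("sharedWith", []))
    # Skip duplicates and the owner, who already has access.
    if user_id not in shared and user_id != data.get("owner"):
        shared.append(user_id)
    return {"isShared": bool(shared), "sharedWith": shared}
```

Because only the model owner can modify permissions, a non-owner caller should expect the PUT to be rejected.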