AI Models

Manage AI models registered in the Strongly AI Gateway. Models can be third-party provider models (OpenAI, Anthropic, etc.) or self-hosted models deployed on your cluster.

AIModel Object

{
  "id": "model_abc123",
  "name": "GPT-4 Production",
  "type": "third-party",
  "provider": "openai",
  "vendorModelId": "gpt-4",
  "modelType": "chat",
  "status": "active",
  "description": "GPT-4 for production workloads",
  "capabilities": ["chat", "function-calling"],
  "maxTokens": 8192,
  "contextWindow": 128000,
  "owner": "user_abc123",
  "organizationId": "org_abc123",
  "isShared": true,
  "sharedWith": ["user_def456"],
  "config": {
    "defaultTemperature": 0.7,
    "rateLimit": 100
  },
  "createdAt": "2025-01-15T10:00:00.000Z",
  "updatedAt": "2025-02-01T14:30:00.000Z"
}

GET /api/v1/ai/models

List AI models

Returns a paginated list of AI models accessible to the current user.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`search`	query	string	No	Search by model name
`type`	query	string	No	Filter by type: `third-party`, `self-hosted`
`status`	query	string	No	Filter by status: `active`, `deploying`, `stopped`, `failed`
`provider`	query	string	No	Filter by provider: `openai`, `anthropic`, `google`, etc.
`modelType`	query	string	No	Filter by model type: `chat`, `completion`, `embedding`
`limit`	query	integer	No	Max results (default: 50, max: 200)
`offset`	query	integer	No	Pagination offset
`sort`	query	string	No	Sort field (default: `-createdAt`)

Response: 200 OK (paginated)

{
  "data": [
    {
      "id": "model_abc123",
      "name": "GPT-4 Production",
      "type": "third-party",
      "provider": "openai",
      "vendorModelId": "gpt-4",
      "modelType": "chat",
      "status": "active",
      "description": "GPT-4 for production workloads",
      "capabilities": ["chat", "function-calling"],
      "maxTokens": 8192,
      "contextWindow": 128000,
      "owner": "user_abc123",
      "organizationId": "org_abc123",
      "isShared": true,
      "createdAt": "2025-01-15T10:00:00.000Z",
      "updatedAt": "2025-02-01T14:30:00.000Z"
    }
  ],
  "meta": { "total": 12, "limit": 50, "offset": 0, "hasMore": false, "requestId": "req_abc123" }
}
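
The `limit` and `offset` parameters and the `meta.hasMore` flag can be combined to walk the whole list. The following is a minimal sketch, not an official client: `fetch_page` is a hypothetical callable that performs the authenticated GET for a given path and returns the decoded JSON body, and both helper names are illustrative.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode


def build_list_query(limit: int = 50, offset: int = 0, **filters: Any) -> str:
    """Build the path and query string for GET /api/v1/ai/models."""
    if not 1 <= limit <= 200:  # the documented maximum is 200
        raise ValueError("limit must be between 1 and 200")
    params: Dict[str, Any] = {"limit": limit, "offset": offset}
    # Only include filters that were actually supplied (search, type, status, ...).
    params.update({k: v for k, v in filters.items() if v is not None})
    return "/api/v1/ai/models?" + urlencode(params)


def iter_models(
    fetch_page: Callable[[str], Dict[str, Any]], limit: int = 50, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every model, advancing offset until meta.hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(build_list_query(limit=limit, offset=offset, **filters))
        yield from page["data"]
        if not page["meta"].get("hasMore"):
            break
        offset += limit
```

Passing the fetcher in as a callable keeps the paging logic independent of the HTTP library and the authentication scheme in use.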

GET /api/v1/ai/models/overview

Get model overview statistics

Returns aggregate counts of models by status and type.

Scope: ai-gateway:read

Response: 200 OK

{
  "data": {
    "total": 12,
    "active": 8,
    "deploying": 1,
    "stopped": 2,
    "failed": 1,
    "thirdParty": 9,
    "selfHosted": 3
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/certified

List curated third-party models

Returns the catalog of certified third-party models maintained by the AI Gateway (for example, models from OpenAI, Anthropic, and Mistral). The mobile add-model picker is populated from this endpoint, so clients should not hardcode vendor or model lists.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`provider`	query	string	No	Filter by provider (`openai`, `anthropic`, `mistral`, …)
`modelType`	query	string	No	Filter by model type (`chat`, `embedding`, `image`, …)
`capability`	query	string	No	Filter by capability flag (`vision`, `audio`, …)

Response: 200 OK

{
  "data": {
    "models": [
      {
        "model_id": "gpt-4o",
        "display_name": "GPT-4o",
        "modelType": "chat",
        "provider": "openai",
        "vendor": "OpenAI",
        "capabilities": { "vision": true, "function_calling": true },
        "parameters": { "context_window": 128000 },
        "api_endpoint": "https://api.openai.com/v1"
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}
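
A client that lets the user pick a catalog entry and then registers it might translate the entry into a body for POST /api/v1/ai/models. The sketch below shows one possible mapping. The field correspondence (`model_id` to `vendorModelId`, `display_name` to `name`, capability flags to hyphenated strings) is an assumption made for illustration and is not defined by this reference; `certified_to_create_body` is a hypothetical helper.

```python
from typing import Any, Dict


def certified_to_create_body(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one certified-catalog entry to a create-model request body.

    The mapping is assumed, not documented; adjust it to your deployment.
    """
    # Keep only capability flags that are switched on, and convert
    # snake_case flag names to the hyphenated style used by AIModel.
    enabled = [
        name.replace("_", "-")
        for name, on in entry.get("capabilities", {}).items()
        if on
    ]
    body: Dict[str, Any] = {
        "name": entry["display_name"],
        "type": "third-party",
        "provider": entry["provider"],
        "vendorModelId": entry["model_id"],
        "modelType": entry["modelType"],
        "capabilities": sorted(enabled),
    }
    context_window = entry.get("parameters", {}).get("context_window")
    if context_window is not None:
        body["contextWindow"] = context_window
    return body
```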

GET /api/v1/ai/models/providers

List third-party providers

Returns the distinct providers present in the certified models catalog, with a model count for each. Used by the mobile add-model vendor picker.

Scope: ai-gateway:read

Response: 200 OK

{
  "data": {
    "providers": [
      { "id": "openai", "label": "OpenAI", "count": 14 },
      { "id": "anthropic", "label": "Anthropic", "count": 6 }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/prebuilt

List self-hosted prebuilt model templates

Returns the catalog of prebuilt model templates that can be deployed as self-hosted models on the cluster. Drives the self-hosted-deploy wizard's vendor/model picker.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`category`	query	string	No	Filter by category (`llm`, `image`, `audio`, …)
`provider`	query	string	No	Filter by provider (`meta`, `google`, `microsoft`, …)

Response: 200 OK

{
  "data": {
    "templates": [
      {
        "id": "llama-3.1-8b-instruct",
        "name": "Llama 3.1 8B Instruct",
        "description": "Meta Llama 3.1 8B instruction-tuned chat model",
        "category": "llm",
        "provider": "meta",
        "defaultPort": 8000,
        "defaultResources": { "cpu": "2000m", "memory": "16Gi", "gpu": 1, "disk": "100Gi" },
        "recommendedInstance": "g5.xlarge",
        "modelSize": "8B",
        "tags": ["chat", "instruct"]
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}

POST /api/v1/ai/models

Create a new AI model

Registers a new model in the AI Gateway. Third-party models become active immediately; self-hosted models must be deployed first (see POST /api/v1/ai/models/:id/deploy).

Scope: ai-gateway:write

Request Body:

{
  "name": "GPT-4 Production",
  "type": "third-party",
  "provider": "openai",
  "vendorModelId": "gpt-4",
  "modelType": "chat",
  "description": "GPT-4 for production workloads",
  "capabilities": ["chat", "function-calling"],
  "maxTokens": 8192,
  "contextWindow": 128000,
  "config": {
    "defaultTemperature": 0.7,
    "rateLimit": 100
  }
}

Field	Type	Required	Description
`name`	string	Yes	Human-readable model name
`type`	string	Yes	Model type: `third-party` or `self-hosted`
`provider`	string	Yes	Provider: `openai`, `anthropic`, `google`, `huggingface`, etc.
`vendorModelId`	string	Yes	Provider's model identifier (e.g., `gpt-4`, `claude-3-opus`)
`modelType`	string	No	Model type: `chat`, `completion`, `embedding`
`description`	string	No	Model description
`capabilities`	string[]	No	List of capabilities (e.g., `chat`, `function-calling`, `vision`)
`maxTokens`	integer	No	Maximum output tokens
`contextWindow`	integer	No	Maximum context window size in tokens
`config`	object	No	Additional provider-specific configuration

Response: 201 Created

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production",
    "type": "third-party",
    "provider": "openai",
    "vendorModelId": "gpt-4",
    "modelType": "chat",
    "status": "active",
    "owner": "user_abc123",
    "organizationId": "org_abc123",
    "createdAt": "2025-02-07T10:00:00.000Z",
    "updatedAt": "2025-02-07T10:00:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}
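
As an illustration of the required fields in the table above, the sketch below validates a body locally and builds an unsent request with Python's standard library. The base URL, the Bearer-token header, and the `build_create_request` helper are assumptions; use whatever authentication scheme your deployment requires.

```python
import json
from urllib.request import Request

REQUIRED_FIELDS = ("name", "type", "provider", "vendorModelId")
MODEL_TYPES = ("third-party", "self-hosted")


def build_create_request(base_url: str, token: str, body: dict) -> Request:
    """Validate the body locally and return an unsent POST request."""
    missing = [f for f in REQUIRED_FIELDS if not body.get(f)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if body["type"] not in MODEL_TYPES:
        raise ValueError("type must be 'third-party' or 'self-hosted'")
    return Request(
        url=base_url.rstrip("/") + "/api/v1/ai/models",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer-token auth is an assumption, not taken from this page.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`; a successful call should yield `201 Created` with the new model under `data`.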

GET /api/v1/ai/models/:id

Get an AI model

Returns the full details of a single model.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production",
    "type": "third-party",
    "provider": "openai",
    "vendorModelId": "gpt-4",
    "modelType": "chat",
    "status": "active",
    "description": "GPT-4 for production workloads",
    "capabilities": ["chat", "function-calling"],
    "maxTokens": 8192,
    "contextWindow": 128000,
    "owner": "user_abc123",
    "organizationId": "org_abc123",
    "isShared": true,
    "sharedWith": ["user_def456"],
    "config": {
      "defaultTemperature": 0.7,
      "rateLimit": 100
    },
    "createdAt": "2025-01-15T10:00:00.000Z",
    "updatedAt": "2025-02-01T14:30:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/:id/options

Get model-specific options

Returns runtime options exposed by the model (e.g. available voices for TTS, supported languages for STT, sampler choices). Proxies to the AI Gateway, which queries the model pod (self-hosted) or the provider API (third-party).

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

The response shape is model-dependent. A typical example:

{
  "data": {
    "voices": [
      { "id": "alloy", "label": "Alloy", "gender": "neutral" },
      { "id": "verse", "label": "Verse", "gender": "neutral" }
    ],
    "languages": ["en", "es", "fr", "de", "ja"]
  },
  "meta": { "requestId": "req_abc123" }
}

Errors:

502 gateway-error — AI Gateway unreachable or returned an error.
500 config-error — AI Gateway URL not configured on the server.
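
Because the option keys vary by model, client code may want to read them defensively instead of assuming that `voices` or `languages` is present. A minimal sketch follows; the `option_ids` helper is hypothetical, and the handling of object entries versus plain strings is based only on the example above.

```python
from typing import Any, Dict, List


def option_ids(response: Dict[str, Any], key: str) -> List[str]:
    """Return the identifiers under an options key, or [] if the model does not expose it."""
    values = response.get("data", {}).get(key, [])
    # Entries may be objects with an "id" (voices) or plain strings (languages).
    return [v["id"] if isinstance(v, dict) else v for v in values]
```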

PUT /api/v1/ai/models/:id

Update an AI model

Updates model properties. Only provided fields are changed.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Request Body:

{
  "name": "GPT-4 Production (Updated)",
  "description": "Updated description",
  "maxTokens": 4096,
  "config": {
    "defaultTemperature": 0.5
  }
}

Field	Type	Required	Description
`name`	string	No	Updated model name
`description`	string	No	Updated description
`capabilities`	string[]	No	Updated capabilities list
`maxTokens`	integer	No	Updated max output tokens
`contextWindow`	integer	No	Updated context window size
`config`	object	No	Updated configuration (merged with existing)

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "name": "GPT-4 Production (Updated)",
    "updatedAt": "2025-02-07T12:00:00.000Z"
  },
  "meta": { "requestId": "req_abc123" }
}
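
Since only provided fields are changed, a client can send just the differences between the current model and the desired state. The sketch below computes such a body. `build_update_body` is a hypothetical helper, and it compares only top-level `config` keys because this reference does not say whether the server-side merge is shallow or deep.

```python
from typing import Any, Dict

UPDATABLE_FIELDS = ("name", "description", "capabilities", "maxTokens", "contextWindow", "config")


def build_update_body(current: Dict[str, Any], desired: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the updatable fields whose desired value differs from the current one."""
    body: Dict[str, Any] = {}
    for field in UPDATABLE_FIELDS:
        if field not in desired:
            continue
        if field == "config":
            # config is merged with the existing value, so send only changed keys.
            current_config = current.get("config") or {}
            changed = {k: v for k, v in desired["config"].items() if current_config.get(k) != v}
            if changed:
                body["config"] = changed
        elif current.get(field) != desired[field]:
            body[field] = desired[field]
    return body
```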

DELETE /api/v1/ai/models/:id

Delete an AI model

Permanently removes a model from the AI Gateway. Self-hosted models are undeployed first.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 204 No Content

DELETE /api/v1/ai/models/:id/cache

Clear the model's semantic cache

Removes all semantic-cache entries for this model. Future requests will miss the cache and generate fresh responses. Useful after changing the system prompt, model parameters, or cache configuration.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "cleared": true,
    "entriesDeleted": 47
  },
  "meta": { "requestId": "req_abc123" }
}

Errors:

500 cache-error — Failed to clear cache; message contains the underlying error.

POST /api/v1/ai/models/:id/deploy

Deploy a self-hosted model

Deploys a self-hosted model to the Kubernetes cluster. Only applicable to models whose `type` is `self-hosted`.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID
`instance_type`	body	string	No	EC2 instance type to pin (e.g. `g5.xlarge`). Falls back to the model's `recommendedInstance` when omitted. For GPU models, at least one of the two must be set
`replicas`	body	number	No	Pod count while the model is active. Always-on deployments hold this count; on-demand deployments scale from 0 up to this count when woken. Default `1`
`auto_shutdown_minutes`	body	number	No	Idle minutes before the model scales to zero. A value greater than `0` enables on-demand scale-to-zero
`env_vars`	body	object	No	Environment variables for the model container
`use_spot`	body	boolean	No	Schedule on spot (interruptible) capacity — up to 70% cheaper. Default `false`. See Spot Instance Support
`spot_fallback`	body	boolean	No	With `use_spot`, request spot as a scheduling preference and fall back to on-demand when spot capacity is unavailable. Default `true`; set `false` to pin strictly to spot

Request example:

{
  "instance_type": "g5.xlarge",
  "replicas": 4,
  "auto_shutdown_minutes": 10,
  "use_spot": true,
  "spot_fallback": true
}

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "deploying",
    "message": "Deployment initiated"
  },
  "meta": { "requestId": "req_abc123" }
}
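
The interaction between the deploy parameters can be made explicit when assembling the request body. The sketch below is illustrative only: `build_deploy_body` is a hypothetical helper, and its range checks (at least one replica, no negative idle timeout) are this sketch's own choices, not documented server rules.

```python
from typing import Any, Dict, Optional


def build_deploy_body(
    instance_type: Optional[str] = None,
    replicas: int = 1,
    auto_shutdown_minutes: int = 0,
    use_spot: bool = False,
    spot_fallback: bool = True,
    env_vars: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /api/v1/ai/models/:id/deploy."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if auto_shutdown_minutes < 0:
        raise ValueError("auto_shutdown_minutes cannot be negative")
    body: Dict[str, Any] = {"replicas": replicas}
    if instance_type:
        body["instance_type"] = instance_type
    if auto_shutdown_minutes > 0:
        # A positive value switches the deployment to on-demand scale-to-zero.
        body["auto_shutdown_minutes"] = auto_shutdown_minutes
    if use_spot:
        body["use_spot"] = True
        # spot_fallback only has an effect together with use_spot.
        body["spot_fallback"] = spot_fallback
    if env_vars:
        body["env_vars"] = dict(env_vars)
    return body
```

Calling `build_deploy_body("g5.xlarge", replicas=4, auto_shutdown_minutes=10, use_spot=True)` produces the same body as the request example above.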

POST /api/v1/ai/models/:id/start

Start a stopped model

Starts a previously stopped self-hosted model.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "deploying",
    "message": "Model starting"
  },
  "meta": { "requestId": "req_abc123" }
}

POST /api/v1/ai/models/:id/stop

Stop a running model

Stops a running self-hosted model, freeing cluster resources.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "id": "model_abc123",
    "status": "stopped",
    "message": "Model stopped"
  },
  "meta": { "requestId": "req_abc123" }
}

GET /api/v1/ai/models/:id/status

Get model deployment status

Returns the current deployment status and replica information for a self-hosted model.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "status": "active",
    "replicas": 2,
    "readyReplicas": 2,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastTransitionTime": "2025-02-07T10:00:00.000Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability"
      }
    ]
  },
  "meta": { "requestId": "req_abc123" }
}
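
Deploy and start both return `deploying`, so callers typically poll this endpoint until the model is ready. The sketch below shows one way to do that. `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON, and treating `active` with all replicas ready as the readiness condition is an assumption based on the example response.

```python
import time
from typing import Any, Callable, Dict


def is_ready(status: Dict[str, Any]) -> bool:
    """True when the deployment is active and every replica reports ready."""
    data = status["data"]
    return data["status"] == "active" and data["readyReplicas"] >= data["replicas"]


def wait_until_ready(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the model is ready, the deployment fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if is_ready(status):
            return status
        if status["data"]["status"] == "failed":
            raise RuntimeError("deployment failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("model did not become ready in time")
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.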

GET /api/v1/ai/models/:id/metrics

Get model metrics

Returns performance and usage metrics for a model.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "totalRequests": 15420,
    "totalTokens": 2500000,
    "avgLatencyMs": 450,
    "errorRate": 0.02,
    "requestsPerMinute": 12.5,
    "uptimePercent": 99.8
  },
  "meta": { "requestId": "req_abc123" }
}
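
A few derived figures can be computed from the raw counters. The sketch below assumes that `errorRate` is a fraction (so `0.02` means 2%) and not a percentage; this page does not state the unit. `summarize_metrics` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_metrics(metrics: Dict[str, Any]) -> Dict[str, float]:
    """Derive an estimated failure count and average tokens per request."""
    data = metrics["data"]
    total = data["totalRequests"]
    return {
        # Assumes errorRate is a fraction of totalRequests, not a percentage.
        "estimatedFailedRequests": round(total * data["errorRate"]),
        # Guard against division by zero for models with no traffic yet.
        "avgTokensPerRequest": data["totalTokens"] / total if total else 0.0,
    }
```

Under that assumption, the example response above corresponds to roughly 308 failed requests and about 162 tokens per request.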

GET /api/v1/ai/models/:id/logs

Get model logs

Returns container logs for a self-hosted model deployment.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID
`lines`	query	integer	No	Number of log lines to return (default: 100)
`since`	query	string	No	Return logs since this time (ISO 8601 or duration like `1h`, `30m`)
`container`	query	string	No	Container name (for multi-container pods)

Response: 200 OK

{
  "data": {
    "logs": "2025-02-07T10:00:00Z INFO  Model loaded successfully\n2025-02-07T10:00:01Z INFO  Ready to serve requests on port 8080\n",
    "container": "model-server",
    "lines": 100
  },
  "meta": { "requestId": "req_abc123" }
}
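
The `logs` field is a single newline-delimited string, so clients usually split it before display or filtering. The sketch below assumes each line follows the `timestamp level message` layout seen in the example; real container output may differ, and lines that do not fit are kept as raw messages. `LogEntry` and `parse_logs` are hypothetical names.

```python
from typing import Any, Dict, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


def parse_logs(response: Dict[str, Any]) -> List[LogEntry]:
    """Split data.logs into structured entries, skipping blank lines."""
    entries: List[LogEntry] = []
    for line in response["data"]["logs"].splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 2)  # timestamp, level, rest of the line
        if len(parts) == 3:
            entries.append(LogEntry(*parts))
        else:
            # Lines that do not match the assumed layout are kept as raw messages.
            entries.append(LogEntry("", "", line))
    return entries
```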

GET /api/v1/ai/models/:id/permissions

Get model permissions

Returns sharing and ownership information for a model.

Scope: ai-gateway:read

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Response: 200 OK

{
  "data": {
    "owner": "user_abc123",
    "sharedWith": ["user_def456", "user_ghi789"],
    "isShared": true,
    "organizationId": "org_abc123"
  },
  "meta": { "requestId": "req_abc123" }
}

PUT /api/v1/ai/models/:id/permissions

Update model permissions

Updates sharing settings for a model. Only the model owner can modify permissions.

Scope: ai-gateway:write

Parameters:

Name	In	Type	Required	Description
`id`	path	string	Yes	Model ID

Request Body:

{
  "isShared": true,
  "sharedWith": ["user_def456", "user_ghi789"]
}

Field	Type	Required	Description
`isShared`	boolean	No	Whether the model is shared with other users
`sharedWith`	string[]	No	List of user IDs to share with

Response: 200 OK

{
  "data": {
    "owner": "user_abc123",
    "sharedWith": ["user_def456", "user_ghi789"],
    "isShared": true,
    "organizationId": "org_abc123"
  },
  "meta": { "requestId": "req_abc123" }
}
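
This page does not say whether `sharedWith` in the request replaces the existing list or is merged into it. A cautious client can read the current permissions first and send the complete list it wants. The sketch below does that; `share_with` and `unshare_with` are hypothetical helpers, and setting `isShared` to `false` once the list is empty is an assumption.

```python
from typing import Any, Dict


def share_with(current: Dict[str, Any], user_id: str) -> Dict[str, Any]:
    """Build a PUT body that adds user_id to the existing share list."""
    shared = list(current.get("sharedWith", []))
    if user_id not in shared:
        shared.append(user_id)
    return {"isShared": True, "sharedWith": shared}


def unshare_with(current: Dict[str, Any], user_id: str) -> Dict[str, Any]:
    """Build a PUT body that removes user_id from the share list."""
    remaining = [u for u in current.get("sharedWith", []) if u != user_id]
    # Assumption: a model with nobody left to share with is no longer shared.
    return {"isShared": bool(remaining), "sharedWith": remaining}
```

Here `current` is the `data` object returned by GET /api/v1/ai/models/:id/permissions.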
