Local API Server

Overview

The local API server provides HTTP access to AI models running in the Sapientia application. The server follows the OpenAI API format, so it can be used with existing OpenAI-compatible tools, SDKs, and external applications.


Server Configuration

Activating the Server

  1. Open the API Access menu in the application
  2. Toggle the Server Status switch to activate the server
  3. The server runs at http://localhost:[port]; the default port is 1945

Server Status

The status indicator displays the server condition:

  • Running (green): Server is active and ready to receive requests
  • Stopped (gray): Server is inactive

API Endpoints

GET /v1/models

Retrieves information about the currently loaded model.

curl http://localhost:1945/v1/models
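Assuming the server mirrors the OpenAI-compatible response shape for this endpoint (an object list under a data key), the loaded model's ID can be read out as sketched below. The sample JSON and the model name in it are illustrative, not actual Sapientia output.

```python
import json

# Hypothetical /v1/models response in the OpenAI-compatible shape;
# the exact fields Sapientia returns may differ.
sample_response = """
{
  "object": "list",
  "data": [
    {"id": "llama-3-8b-instruct", "object": "model"}
  ]
}
"""

models = json.loads(sample_response)
model_ids = [m["id"] for m in models["data"]]
print(model_ids)  # -> ['llama-3-8b-instruct']
```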

POST /v1/chat/completions

Generates a chat completion from the supplied messages; streaming responses are supported.

Parameters:

  • messages: Text input or array of chat messages
  • stream: Boolean to enable streaming response (default: false)

Example (Streaming):

curl http://localhost:1945/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Introduce yourself.",
    "stream": true
  }'

Example (Non-streaming):

curl http://localhost:1945/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Introduce yourself.",
    "stream": false
  }'

POST /v1/embeddings

Generates vector embeddings from text input.

Parameters:

  • input: Text to be converted into embeddings

Example:

curl http://localhost:1945/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter..."
  }'

Streaming Mode

Streaming mode delivers tokens in real time as the model generates them. This is useful for:

  • Displaying responses progressively to users
  • Reducing perceived latency
  • Implementing responsive chat interfaces

When stream is false (the default), the server waits until generation finishes and returns the complete response in a single payload.
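OpenAI-compatible servers usually stream as Server-Sent Events: each line is prefixed with "data: ", carries a JSON chunk whose choices[0].delta.content holds the next text fragment, and the stream ends with "data: [DONE]". Assuming Sapientia follows that convention, the text can be accumulated as sketched below; the sample lines are illustrative, not captured output.

```python
import json

def extract_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Hypothetical chunk lines, as a streaming /v1/chat/completions
# response might emit them; the exact fields are assumptions.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: [DONE]',
]
print(extract_stream_text(sample))  # -> Hello!
```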