Local API Server

Overview

The local API server provides HTTP access to AI models running in the Sapientia application. The server follows the OpenAI API format, so it can be used with existing OpenAI-compatible tools, SDKs, and external applications.


Server Configuration

Activating the Server

  1. Open the API Access menu in the application
  2. Toggle the Server Status switch to activate the server
  3. The server runs at http://localhost:[port]; the default port is 1945

Server Status

The status indicator displays the server condition:

  • Running (green): Server is active and ready to receive requests
  • Stopped (gray): Server is inactive

API Endpoints

GET /v1/models

Retrieves information about the currently loaded model.

curl http://localhost:1945/v1/models
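Assuming the server mirrors the OpenAI-compatible response shape for this endpoint (an object list under a data key), the loaded model's ID can be read out as sketched below. The sample JSON and the model name in it are illustrative, not actual Sapientia output.

```python
import json

# Hypothetical /v1/models response in the OpenAI-compatible shape;
# the exact fields Sapientia returns may differ.
sample_response = """
{
  "object": "list",
  "data": [
    {"id": "llama-3-8b-instruct", "object": "model"}
  ]
}
"""

models = json.loads(sample_response)
model_ids = [m["id"] for m in models["data"]]
print(model_ids)  # -> ['llama-3-8b-instruct']
```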

POST /v1/chat/completions

Generates a chat completion from the supplied messages; streaming responses are supported.

Parameters:

  • messages: Text input or array of chat messages
  • stream: Boolean to enable streaming response (default: false)

Example (Streaming):

curl http://localhost:1945/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Introduce yourself.",
    "stream": true
  }'

Example (Non-streaming):

curl http://localhost:1945/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Introduce yourself.",
    "stream": false
  }'

POST /v1/embeddings

Generates vector embeddings from text input.

Parameters:

  • input: Text to be converted into embeddings

Example:

curl http://localhost:1945/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter..."
  }'

Streaming Mode

Streaming mode delivers tokens in real time as the model generates them. This is useful for:

  • Displaying responses progressively to users
  • Reducing perceived latency
  • Implementing responsive chat interfaces

When stream is false (the default), the server waits until generation finishes and returns the complete response in a single payload.
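OpenAI-compatible servers usually stream as Server-Sent Events: each line is prefixed with "data: ", carries a JSON chunk whose choices[0].delta.content holds the next text fragment, and the stream ends with "data: [DONE]". Assuming Sapientia follows that convention, the text can be accumulated as sketched below; the sample lines are illustrative, not captured output.

```python
import json

def extract_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Hypothetical chunk lines, as a streaming /v1/chat/completions
# response might emit them; the exact fields are assumptions.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: [DONE]',
]
print(extract_stream_text(sample))  # -> Hello!
```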