Local API Server
Overview
The local API server provides HTTP access to AI models running in the Sapientia application. The server is compatible with the OpenAI API format, enabling integration with external tools and applications.
Server Configuration
Activating the Server
- Open the API Access menu in the application
- Toggle the Server Status switch to activate the server
- The server will run at http://localhost:[port], with the default port being 1945
Server Status
The status indicator displays the server condition:
- Running (green): Server is active and ready to receive requests
- Stopped (gray): Server is inactive
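You can also check the status programmatically by probing the server from a script. The sketch below, in Python using only the standard library, treats a successful request to /v1/models as Running and a connection failure as Stopped; it assumes the default port 1945.

import urllib.error
import urllib.request

# Probe the local server; a refused connection means it is stopped.
try:
    urllib.request.urlopen("http://localhost:1945/v1/models", timeout=2)
    print("Running")
except (urllib.error.URLError, OSError):
    print("Stopped")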
API Endpoints
GET /v1/models
Retrieves information about the currently loaded model.
curl http://localhost:1945/v1/models
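The same endpoint can be queried from Python. This is a minimal sketch that assumes the response follows the OpenAI-style list shape ({"data": [{"id": ...}, ...]}), which this page does not confirm.

import json
import urllib.request

with urllib.request.urlopen("http://localhost:1945/v1/models") as resp:
    data = json.load(resp)

# Assumption: OpenAI-style list shape {"data": [{"id": ...}, ...]}
for model in data.get("data", []):
    print(model.get("id"))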
POST /v1/chat/completions
Endpoint for chat completions, with streaming support.
Parameters:
messages: Text input or an array of chat messages
stream: Boolean to enable streaming responses (default: false)
Example (Streaming):
curl http://localhost:1945/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": "Introduce yourself.",
"stream": true
}'
Example (Non-streaming):
curl http://localhost:1945/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": "Introduce yourself.",
"stream": false
}'
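The same non-streaming request can be made from Python with the standard library. In this sketch, the response shape ({"choices": [{"message": {"content": ...}}]}) is assumed from the OpenAI format and is not confirmed by this page.

import json
import urllib.request

payload = {"messages": "Introduce yourself.", "stream": False}
req = urllib.request.Request(
    "http://localhost:1945/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Assumption: OpenAI-style shape {"choices": [{"message": {"content": ...}}]}
print(result["choices"][0]["message"]["content"])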
POST /v1/embeddings
Generates vector embeddings from text input.
Parameters:
input: Text to be converted into embeddings
Example:
curl http://localhost:1945/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "The food was delicious and the waiter..."
}'
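As a rough Python equivalent, the sketch below posts the same input and reads the vector back. The response shape ({"data": [{"embedding": [...]}]}) is assumed from the OpenAI embeddings format.

import json
import urllib.request

payload = {"input": "The food was delicious and the waiter..."}
req = urllib.request.Request(
    "http://localhost:1945/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Assumption: OpenAI-style shape {"data": [{"embedding": [...], ...}]}
vector = result["data"][0]["embedding"]
print(len(vector), vector[:5])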
Streaming Mode
Streaming mode delivers tokens in real time as the model generates them. It is useful for:
- Displaying responses progressively to users
- Reducing perceived latency
- Implementing responsive chat interfaces
When stream is set to false, the server returns the complete response in a single payload after the entire output has been generated.
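To consume a streamed response (stream: true), a client reads the response body incrementally instead of waiting for the full payload. The sketch below assumes the server emits OpenAI-style server-sent events (lines prefixed with data:, a [DONE] sentinel, and chunks shaped like {"choices": [{"delta": {"content": ...}}]}); this page does not specify the exact wire format.

import json
import urllib.request

payload = {"messages": "Introduce yourself.", "stream": True}
req = urllib.request.Request(
    "http://localhost:1945/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw_line in resp:                      # lines arrive as the model generates
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":                   # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        # Assumption: OpenAI-style delta chunks
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
    print()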