Tips and Tricks

A practical guide to optimizing performance and user experience in the Sapientia application.

RAM Usage Optimization

Reducing Context Length

Reducing context length is an effective way to conserve RAM and increase inference speed:

  • Low Context Length (2048-4096 tokens): Suitable for short chats, Q&A, or systems with limited RAM (4-6 GB)
  • Medium Context Length (8192-16384 tokens): Ideal for long conversations and document analysis with 8-12 GB RAM
  • High Context Length (32768+ tokens): For large document processing and multi-turn conversations; requires 16 GB RAM or more

A shorter context window shrinks the KV cache the model keeps in memory, which significantly decreases memory consumption and speeds up response time.
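To see why the RAM tiers above line up with context length, note that the KV cache grows linearly with the number of tokens in context. The sketch below estimates its size for a hypothetical Llama-style 7B model at fp16; the layer, head, and precision figures are illustrative assumptions, not Sapientia internals:

  def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=32,
                     head_dim=128, bytes_per_elem=2):
      """Rough KV-cache size for an assumed Llama-2-7B-style model
      at fp16. The factor of 2 accounts for storing both keys and
      values at every layer."""
      return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

  for n_ctx in (2048, 4096, 8192, 16384, 32768):
      gib = kv_cache_bytes(n_ctx) / 2**30
      print(f"context {n_ctx:>6}: ~{gib:.1f} GiB of KV cache")

Under these assumptions the cache doubles with every doubling of context (roughly 1 GiB at 2048 tokens, 16 GiB at 32768), which is why the highest tier calls for 16 GB of RAM before model weights are even counted.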

Choosing the Right AI Model

Based on Use Case

General Purpose & Chat:

  • Choose conversational models like Gemma, Llama, Mistral, or Qwen
  • Prioritize instruction-tuned variants

Coding & Development:

  • Use coding-specific models like CodeLlama, DeepSeek-Coder, or Qwen-Coder
  • Prefer models with a larger context length when working with large codebases

Analysis & Reasoning:

  • Choose models with strong reasoning capabilities
  • Consider larger models for optimal accuracy

Multilingual:

  • Select models that explicitly support your target language
  • Models like Gemma, Qwen, or Aya have better multilingual support

Model Selection Tips

  1. Test various quantizations: Start with Q4 and move up only if RAM allows; a rough sizing formula appears in the first sketch after this list
  2. Monitor performance metrics: Track time to first token (TTFT) and tokens per second (TPS) to judge whether a model suits your hardware; see the measurement sketch below
  3. Consider tradeoffs: Larger models are generally more accurate but slower and require more resources
  4. Update regularly: Newer models often deliver better quality at the same size
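
For tip 1, a back-of-the-envelope way to check whether a quantization fits your RAM is to multiply the parameter count by the effective bits per weight. The bits-per-weight values below are approximate figures for common GGUF quantization levels, used here only as an illustration; exact file sizes vary by quantization variant:

  # Approximate effective bits per weight (illustrative values).
  BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.6, "Q8": 8.5, "F16": 16.0}

  def weight_gib(n_params_billion, quant):
      """Estimated RAM needed for model weights alone, in GiB."""
      bits = BITS_PER_WEIGHT[quant]
      return n_params_billion * 1e9 * bits / 8 / 2**30

  for quant in ("Q4", "Q5", "Q8", "F16"):
      print(f"7B model at {quant}: ~{weight_gib(7, quant):.1f} GiB")

Add the KV-cache estimate from the earlier sketch, plus headroom for the operating system, to judge whether a given model and quantization will fit.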
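For tip 2, TTFT and TPS can be measured against any backend that streams tokens. This sketch is backend-agnostic: generate_stream is a hypothetical stand-in for whatever streaming call your setup exposes, not a Sapientia API:

  import time

  def measure(stream):
      """Measure TTFT and decode-phase TPS over any iterator
      that yields generated tokens one at a time."""
      start = time.perf_counter()
      first = None
      count = 0
      for _token in stream:
          if first is None:
              first = time.perf_counter()  # first token arrived
          count += 1
      end = time.perf_counter()
      ttft = (first - start) if first is not None else float("inf")
      # (count - 1) / elapsed excludes prefill, so this is pure
      # decode throughput rather than end-to-end speed.
      tps = (count - 1) / (end - first) if count > 1 else 0.0
      return ttft, tps

  # Usage (generate_stream is hypothetical):
  # ttft, tps = measure(generate_stream("Explain KV caches in one line."))
  # print(f"TTFT: {ttft:.2f}s, TPS: {tps:.1f}")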