Retrieval Augmented Generation (RAG)

What is RAG?

Retrieval Augmented Generation (RAG) is a technique that combines an information retrieval system with a generative AI model. Instead of relying solely on the knowledge stored in the model's parameters, RAG first retrieves relevant external information and then passes it to the model as context, so the generated response is more accurate and up to date.
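
The retrieve-then-generate flow can be sketched in a few lines of TypeScript. This is a minimal illustration, not Sapientia's code: the `RetrieveFn` and `GenerateFn` types, `answerWithRag`, and the prompt wording are hypothetical, and both the retriever and the model call are injected stubs because the document does not specify them.

```typescript
// Hypothetical function types: a retriever returns passages relevant to a query
// (from the web, a file, or a vector database); a generator calls the AI model.
type RetrieveFn = (query: string) => Promise<string[]>;
type GenerateFn = (prompt: string) => Promise<string>;

// Retrieve first, then generate with the numbered passages as context.
async function answerWithRag(
  question: string,
  retrieve: RetrieveFn,
  generate: GenerateFn,
): Promise<string> {
  const passages = await retrieve(question);
  const context = passages.map((p, i) => `[${i + 1}] ${p}`).join("\n");
  const prompt =
    "Use the following context to answer the question.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}`;
  return generate(prompt);
}

// Usage with stubs: the "model" simply echoes the prompt it receives.
const fakeRetrieve: RetrieveFn = async () => ["Sapientia implements RAG through three data sources."];
const echoGenerate: GenerateFn = async (prompt) => prompt;
answerWithRag("How many RAG data sources does Sapientia use?", fakeRetrieve, echoGenerate).then(console.log);
```

Because retrieval is just a function argument, the same flow works for any of the three data sources described below.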

Benefits:

  • Lets the model answer questions with up-to-date data, without retraining
  • Reduces hallucinations by supplying factual context
  • Gives access to private documents or internal knowledge bases

Three RAG Methods in Sapientia

Sapientia implements RAG through three data sources:

1. Search Internet

Retrieves information from the internet via the Google Custom Search API, then scrapes the content of each result page using Cheerio.

How It Works:

  1. The query is sent to the Google Custom Search API
  2. The system receives a list of search result URLs
  3. Each URL is scraped to extract its content
  4. The extracted content is combined and tagged with source references
  5. The combined content is sent to the AI as additional context
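
Steps 1, 2, and 4 can be sketched against the Google Custom Search JSON API (endpoint `https://www.googleapis.com/customsearch/v1`, with `key`, `cx`, `q`, and `num` query parameters, returning result URLs in `items[].link`). The function names, the five-result default, and the per-page character limit are assumptions. Scraping is left out so the sketch needs no dependencies; with Cheerio it would typically be `cheerio.load(html)` followed by `$("body").text()`.

```typescript
// Fields of the Custom Search JSON API response that this sketch reads.
interface SearchItem {
  title: string;
  link: string;
  snippet?: string;
}
interface SearchResponse {
  items?: SearchItem[]; // omitted by the API when there are no results
}

// Step 1: build the request URL. apiKey and engineId (the "cx" value) are placeholders;
// num = 5 is an assumed default (the API accepts 1 to 10).
function buildSearchUrl(query: string, apiKey: string, engineId: string, num = 5): string {
  const params = new URLSearchParams({ key: apiKey, cx: engineId, q: query, num: String(num) });
  return `https://www.googleapis.com/customsearch/v1?${params.toString()}`;
}

// Step 2: collect the result URLs from the JSON response.
function extractLinks(response: SearchResponse): string[] {
  return (response.items ?? []).map((item) => item.link);
}

// Step 3 (not shown): fetch each URL and extract its text, e.g. with Cheerio.

// Step 4: combine the scraped texts, tagging each one with its source URL.
// maxCharsPerPage is a hypothetical limit to keep the context small.
function combineWithSources(pages: { url: string; text: string }[], maxCharsPerPage = 2000): string {
  return pages
    .map((p, i) => `[Source ${i + 1}: ${p.url}]\n${p.text.slice(0, maxCharsPerPage)}`)
    .join("\n\n");
}

// Usage with a hand-written response object instead of a live API call.
const sample: SearchResponse = {
  items: [
    { title: "Page A", link: "https://example.com/a" },
    { title: "Page B", link: "https://example.com/b" },
  ],
};
console.log(buildSearchUrl("today's gold price", "YOUR_API_KEY", "YOUR_ENGINE_ID"));
const links = extractLinks(sample);
console.log(combineWithSources(links.map((url) => ({ url, text: "Scraped page text." }))));
```

Numbering each source lets the model cite where a fact came from, which is the point of tagging content with source references in step 4.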

Usage Examples:

  • "Find today's gold price"
  • "Get the latest news about AI"
  • "Current weather information for Jakarta"

2. Read File (Local Documents)

Reads the content of PDF or DOCX files on the local file system.

How It Works:

  1. The user provides a file path (e.g., C:\Users\Admin\notes.pdf)
  2. The text is extracted from the file
  3. The extracted text is sent to the AI as context
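
The document does not name the PDF or DOCX parsing libraries Sapientia uses, so in this minimal sketch the actual extractors are hypothetical functions passed in by the caller. What it does show is routing a path (including a Windows path like the one above) to the right extractor by its extension and wrapping the result as context; `detectKind`, `fileToContext`, and the 8,000-character budget are illustrative assumptions.

```typescript
// The two document types the Read File method supports.
type DocKind = "pdf" | "docx";

// Hypothetical extractor: takes a file path and resolves to its plain text.
// A real one would wrap a PDF or DOCX parsing library.
type Extractor = (filePath: string) => Promise<string>;

// Pick the document type from the extension; handles both Windows
// (C:\Users\Admin\notes.pdf) and POSIX path separators.
function detectKind(filePath: string): DocKind {
  const name = filePath.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  if (ext === "pdf" || ext === "docx") return ext as DocKind;
  throw new Error(`Unsupported file type: ${filePath}`);
}

// Steps 2-3: extract the text and wrap it as context for the AI,
// truncated to an assumed character budget.
async function fileToContext(
  filePath: string,
  extractors: Record<DocKind, Extractor>,
  maxChars = 8000,
): Promise<string> {
  const text = await extractors[detectKind(filePath)](filePath);
  const clipped = text.length > maxChars ? `${text.slice(0, maxChars)}\n[...truncated]` : text;
  return `[File: ${filePath}]\n${clipped}`;
}

// Usage with stub extractors in place of real parsers.
const stubExtractors: Record<DocKind, Extractor> = {
  pdf: async () => "Text extracted from a PDF.",
  docx: async () => "Text extracted from a DOCX file.",
};
fileToContext("C:\\Users\\Admin\\notes.pdf", stubExtractors).then(console.log);
```

Truncating is one simple way to keep a large document within the model's context window; whether and how Sapientia limits file size is not stated.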

Usage Examples:

  • "Read the contents of C:\Documents\report.pdf"
  • "Summarize the meeting.docx document"
  • "Find information about X in the data.pdf file"

3. Memories Knowledge (Vector Database)

Retrieves information from a vector database using semantic search based on embedding similarity.
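
The document does not specify the similarity measure; cosine similarity is a common choice for embedding search. Assuming it is used, the score between a query embedding $\mathbf{q}$ and a chunk embedding $\mathbf{c}$, both of dimension $d$, is:

$$
\operatorname{sim}(\mathbf{q}, \mathbf{c}) = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert} = \frac{\sum_{i=1}^{d} q_i c_i}{\sqrt{\sum_{i=1}^{d} q_i^2} \, \sqrt{\sum_{i=1}^{d} c_i^2}}
$$

The score ranges from -1 to 1, and a higher score means the chunk is closer in meaning to the query. For example, $\mathbf{q} = (1, 0)$ and $\mathbf{c} = (1, 1)$ give $1 / (1 \cdot \sqrt{2}) \approx 0.707$.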

How It Works:

  1. Files or text are uploaded and split into chunks
  2. Each chunk is converted into an embedding vector using a local embedding model
  3. The embedding vectors are stored in the vector database
  4. At query time:
    • The query is converted into an embedding vector
    • A vector search finds the top 5 most relevant chunks
    • These chunks are sent to the AI as context
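
The indexing and query steps can be sketched end to end in memory. Assumptions: the chunk sizes are illustrative, `toyEmbed` is a hashed bag-of-words stand-in for the local embedding model (it captures shared words, not meaning), a plain array stands in for the vector database, and cosine similarity is used as the ranking metric.

```typescript
// A stored chunk and its embedding vector.
interface Chunk {
  id: number;
  text: string;
  vector: number[];
}

// Step 1: split text into fixed-size character chunks that overlap slightly,
// so text near a boundary appears in two neighbouring chunks (sizes are illustrative).
function splitIntoChunks(text: string, size = 500, overlap = 50): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Step 2 stand-in: toy hashed bag-of-words vector. A real system would call
// its local embedding model here; this only captures shared words, not meaning.
function toyEmbed(text: string, dims = 64): number[] {
  const v: number[] = [];
  for (let i = 0; i < dims; i++) v.push(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % dims] += 1;
  }
  return v;
}

// Cosine similarity between two vectors of equal length (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Step 4: embed the query and return the k most similar chunks (k = 5, as above).
function topK(query: string, store: Chunk[], k = 5): Chunk[] {
  const q = toyEmbed(query);
  return store
    .map((chunk) => ({ chunk, score: cosine(q, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.chunk);
}

// Usage: step 3 is an in-memory array standing in for the vector database.
const notes = "Project X stores its data in a vector database. Project Y tracks weather in Jakarta.";
const store: Chunk[] = splitIntoChunks(notes, 40, 10).map((text, id) => ({ id, text, vector: toyEmbed(text) }));
console.log(topK("vector database for project X", store, 2).map((c) => c.text));
```

A real embedding model places semantically related text (for example "car" and "vehicle") near each other even without shared words, which the toy hash cannot do; the ranking logic in `topK` stays the same either way. A production vector database usually replaces this linear scan with an approximate nearest-neighbour index.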

Usage Examples:

  • "What have I saved about project X?"
  • "Find information about topic Y in my memories"
  • "Summarize existing knowledge about Z"