Retrieval-Augmented Generation (RAG)

What is Retrieval-Augmented Generation

Optimizes the output of an LLM
References a trusted knowledge base outside the LLM's training data
Combines a pre-trained LLM with an internal data source

Benefits

Accurate and up-to-date information
Cost-effective way to use an LLM for your organization
More developer control over the information sources the LLM uses

RAG Workflow

flowchart TD
    A[User Query] --> B[Query Processing]
    B --> C[Retrieve Relevant Documents]
    C --> D[Knowledge Base/Vector Database]
    D --> E[Retrieved Context]
    E --> F[Augment Prompt]
    A --> F
    F --> G[LLM Processing]
    G --> H[Generated Response]
    H --> I[Final Answer to User]

    style A fill:#4a90e2,color:#fff
    style D fill:#f39c12,color:#fff
    style G fill:#9b59b6,color:#fff
    style I fill:#27ae60,color:#fff

How RAG Works

User Query: The user submits a question or request
Query Processing: The query is processed and converted into a searchable format (often vector embeddings)
Retrieve Relevant Documents: The knowledge base is searched for information relevant to the query
Knowledge Base: Internal documents, databases, or vector stores containing organizational data
Retrieved Context: Relevant documents/snippets are extracted
Augment Prompt: The user query is combined with the retrieved context
LLM Processing: The augmented prompt is sent to the LLM
Generated Response: The LLM generates an answer based on both its training data and the retrieved context
Final Answer: The user receives an accurate, context-aware response
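
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the knowledge base is a hypothetical in-memory list, `embed` is a toy bag-of-words stand-in for a real embedding model, and `call_llm` is a placeholder for whatever LLM client you would actually use. All function names here are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: in practice this would be a vector database.
KNOWLEDGE_BASE = [
    "Employees accrue 1.5 vacation days per month of service.",
    "The VPN client must be updated every quarter.",
    "Expense reports are due by the fifth business day of each month.",
]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Query processing + retrieval: return up to top_k documents with a nonzero score."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def augment_prompt(query, context_docs):
    """Augment prompt: combine the user query with the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


def call_llm(prompt):
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


def answer(query):
    """End-to-end flow: retrieve, augment, generate."""
    context_docs = retrieve(query, KNOWLEDGE_BASE)
    prompt = augment_prompt(query, context_docs)
    return call_llm(prompt)


if __name__ == "__main__":
    print(answer("How many vacation days do employees accrue?"))
```

Swapping `embed` for a real embedding model and `KNOWLEDGE_BASE` for a vector store changes the quality of retrieval, but the shape of the flow (retrieve, augment, generate) stays the same.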

Source: AWS - RAG