Retrieval-Augmented Generation (RAG)
What is Retrieval-Augmented Generation
- optimizing the output of a LLM
- referencing trusted knowledge base outside of the LLM training data
- combining pre-trained LLM with internal data source
Benefits
- Accurate and up-to-date information
- Cost effective way to use LLM for your organization
- More developer control
RAG Workflow
flowchart TD
A[User Query] --> B[Query Processing]
B --> C[Retrieve Relevant Documents]
C --> D[Knowledge Base/Vector Database]
D --> E[Retrieved Context]
E --> F[Augment Prompt]
A --> F
F --> G[LLM Processing]
G --> H[Generated Response]
H --> I[Final Answer to User]
style A fill:#4a90e2,color:#fff
style D fill:#f39c12,color:#fff
style G fill:#9b59b6,color:#fff
style I fill:#27ae60,color:#fff
How RAG Works
- User Query: User submits a question or request
- Query Processing: The query is processed and converted into a searchable format (often vector embeddings)
- Retrieve Relevant Documents: Search the knowledge base for relevant information
- Knowledge Base: Internal documents, databases, or vector stores containing organizational data
- Retrieved Context: Relevant documents/snippets are extracted
- Augment Prompt: Combine user query with retrieved context
- LLM Processing: The augmented prompt is sent to the LLM
- Generated Response: LLM generates an answer based on both its training and the retrieved context
- Final Answer: User receives an accurate, context-aware response
Source: AWS - RAG