RAG Technology — How AI Buddha Zen Cites 10,000+ Scripture Verses

1 What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that improves AI responses by first retrieving relevant information from a curated database, then using that information to generate grounded answers. Unlike standard AI chatbots that rely solely on training data, RAG ties each response to source material that can be checked.

Why RAG matters for religious AI

Religious AI faces a unique challenge: hallucination in sacred texts. A standard AI might fabricate scripture quotes that sound authentic but don't actually exist. This is unacceptable in a religious context. RAG solves this by constraining the AI to cite only from verified, real scripture verses in our database.

	Standard AI Chatbot	AI Buddha Zen (RAG)
Knowledge source	Training data (static)	10,023 verified scripture verses
Citation accuracy	May fabricate quotes	Every quote traceable to source
Verifiability	Difficult to verify	Scripture name, chapter, verse #
Repetition control	None	Last 30 quotes excluded

2 5-Step RAG Pipeline

When you send a message to AI Buddha Zen, it goes through a 5-step pipeline before generating a response. This process takes about 3-5 seconds.

Your message
↓
Step 1: Theme Detection (20 themes + 44 bridge tags)
↓
Step 2: Recently-Seen Exclusion (last 30 quotes)
↓
Step 3: 5-Candidate Retrieval (confidence + priority + randomization)
↓
Step 4: AI Selection (Claude picks the best 1-2 from 5 candidates)
↓
Step 5: Response Generation (empathy + scripture quote + practical advice)

Step 1: Theme Detection

Your message is analyzed against 20 theme categories and 44 "bridge tags" — trigger phrases that help map everyday language to Buddhist concepts.

Example: "I can't sleep because of work stress" → Themes detected: anxiety, work, mindfulness

📋 All 20 themes

suffering · impermanence · anger · attachment · compassion · wisdom · emptiness · karma · mindfulness · relationship · death · anxiety · work · happiness · self · loneliness · craving · gratitude · aging · family
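
The theme detection step described above can be sketched roughly as follows. This is a minimal illustration, not the production matcher: the three entries in `BRIDGE_TAGS` are hypothetical examples (the real list has 44 bridge tags), and plain substring matching is an assumption about how phrases are detected.

```python
# The 20 theme categories listed above.
THEMES = [
    "suffering", "impermanence", "anger", "attachment", "compassion",
    "wisdom", "emptiness", "karma", "mindfulness", "relationship",
    "death", "anxiety", "work", "happiness", "self", "loneliness",
    "craving", "gratitude", "aging", "family",
]

# Hypothetical bridge tags: everyday trigger phrase -> Buddhist themes.
# These three entries are invented for illustration only.
BRIDGE_TAGS = {
    "can't sleep": ["anxiety", "mindfulness"],
    "work stress": ["work", "anxiety"],
    "broke up": ["relationship", "attachment"],
}


def detect_themes(message: str) -> list:
    """Return the themes triggered by a message, without duplicates."""
    text = message.lower()
    found = []
    # 1) Bridge tags map everyday phrases to themes.
    for phrase, themes in BRIDGE_TAGS.items():
        if phrase in text:
            for theme in themes:
                if theme not in found:
                    found.append(theme)
    # 2) A theme name mentioned directly also counts.
    #    Naive substring match: "self" would also match "myself".
    for theme in THEMES:
        if theme in text and theme not in found:
            found.append(theme)
    return found
```

With these sample tags, `detect_themes("I can't sleep because of work stress")` yields the same three themes as the example above: anxiety, mindfulness, and work.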

Step 2: Recently-Seen Exclusion

To prevent the same verse from being shown repeatedly, the system checks the last 30 quotes shown to you (stored in rag_usage_log). These are excluded from the candidate pool, ensuring you encounter a wide variety of the 10,000+ verses in our database.
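
A minimal sketch of this exclusion step, assuming the usage log can be read as a list of verse IDs ordered oldest to newest. The function name and the in-memory list are illustrative assumptions; the source only states that the log is called rag_usage_log.

```python
def exclude_recent(candidate_ids, usage_log, window=30):
    """Drop any candidate whose ID is among the last `window` verses shown.

    usage_log: verse IDs already shown to this user, oldest first
    (a stand-in for rows read from rag_usage_log).
    """
    # Guard window == 0: usage_log[-0:] would return the whole list.
    recent = set(usage_log[-window:]) if window > 0 else set()
    return [vid for vid in candidate_ids if vid not in recent]
```

For a user who has seen verses 1 through 40, only verses 11 through 40 are excluded, so verse 5 becomes eligible again.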

Step 3: 5-Candidate Retrieval

From the remaining verses matching the detected themes, 5 candidates are selected using a priority system:

direct: Directly cited from the Pali Canon with a verified source (highest priority)

aligned: Aligned with scripture teaching, paraphrased or summarized

reference: Reference material, cited with cautious language ("it is said that...")

A randomization factor (rand_order shift) ensures that even within the same theme and confidence level, different verses appear each time.
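
One way to picture this retrieval step is a shuffle followed by a stable sort on confidence level. This is a sketch under assumptions: the source does not describe how the rand_order shift works, so a plain shuffle stands in for it here, and strict ordering by confidence is a guess at what the priority system does. Field names follow the per-verse record shown in the Scripture Database section.

```python
import random

# Lower rank = higher priority.
CONFIDENCE_RANK = {"direct": 0, "aligned": 1, "reference": 2}


def pick_candidates(verses, themes, k=5, rng=None):
    """Pick up to k theme-matching verses, preferring higher confidence."""
    if rng is None:
        rng = random.Random()
    wanted = set(themes)
    matching = [
        v for v in verses
        if v["theme"] in wanted or wanted & set(v["sub_themes"].split(","))
    ]
    # Shuffle first, then stable-sort by confidence: verses within the
    # same confidence level keep their shuffled (random) order.
    rng.shuffle(matching)
    matching.sort(key=lambda v: CONFIDENCE_RANK[v["confidence_type"]])
    return matching[:k]
```

Passing a seeded `random.Random` makes the selection reproducible for testing, while the default gives a different order on each call.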

Step 4: AI Selection

The 5 candidates, complete with Pali text, Japanese/English translation, source information, and confidence level, are injected into the AI's prompt. Claude (Anthropic) then selects the 1-2 verses that best resonate with your specific concern.

━━━ Scripture Reference Data (RAG) ━━━
Below are 5 candidate verses. Select the 1-2 most relevant.
【STRICT】Quote the English translation directly.
【STRICT】Copy the source name exactly for the Reference line.
【STRICT】Do NOT fabricate verses not listed below.
【STRICT】You MUST cite at least one verse.

【Scripture 1】Dhammapada, Chapter 1, Twin Verses (Verse 1)
Confidence: direct
Pāli: Manopubbaṅgamā dhammā...
English translation (QUOTE THIS): "All things are preceded by mind..."
...
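
A prompt block like the one above could be assembled from the candidate records along these lines. This is an illustrative sketch: the function name is hypothetical, and it assumes that the `source_ja` field holds the source label and the `original` field holds the English translation, as in the sample record in the Scripture Database section.

```python
HEADER = (
    "━━━ Scripture Reference Data (RAG) ━━━\n"
    "Below are {n} candidate verses. Select the 1-2 most relevant.\n"
    "【STRICT】Quote the English translation directly.\n"
    "【STRICT】Copy the source name exactly for the Reference line.\n"
    "【STRICT】Do NOT fabricate verses not listed below.\n"
    "【STRICT】You MUST cite at least one verse.\n"
)


def build_rag_block(candidates):
    """Render candidate verses into the text injected into the AI's prompt."""
    lines = [HEADER.format(n=len(candidates))]
    for i, v in enumerate(candidates, start=1):
        lines.append(f"【Scripture {i}】{v['source_ja']}")
        lines.append(f"Confidence: {v['confidence_type']}")
        lines.append(f"Pāli: {v['pali']}")
        lines.append(f'English translation (QUOTE THIS): "{v["original"]}"')
        lines.append("")  # blank line between candidates
    return "\n".join(lines)
```

Because the model only sees the verses rendered here, the candidate list doubles as the full set of quotes it is allowed to use.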

Step 5: Response Generation

The AI generates a response following a structured format:

① Empathy — Acknowledge the seeker's feelings (1-2 sentences)

② Wisdom — Scripture quote + explanation (3-5 sentences)

③ Practice — Concrete suggestion (1-2 sentences)

④ Reference — Scripture name, chapter, verse number

3 Scripture Database

AI Buddha Zen's database contains 10,023 verses from 18 Buddhist scriptures, primarily from the Pali Canon (among the oldest surviving Buddhist texts).

Scripture	Verses	Source
Dhammapada	423	Khuddaka Nikāya
Theragāthā	1,279	Khuddaka Nikāya
Therīgāthā	494	Khuddaka Nikāya
Sutta Nipāta	1,149	Khuddaka Nikāya
Aṅguttara Nikāya	855	Sutta Piṭaka
Saṃyutta Nikāya	1,132	Sutta Piṭaka
+ 12 more scriptures...
Total	10,023

Data Structure Per Verse

{
  "id": 1,
  "canon": "dhammapada",
  "source_ja": "Dhammapada Ch.1 Twin Verses (Verse 1)",
  "pali": "Manopubbaṅgamā dhammā, manoseṭṭhā manomayā...",
  "original": "All things are preceded by mind, led by mind, created by mind...",
  "japanese": "(Japanese translation)",
  "theme": "wisdom",
  "sub_themes": "self,mindfulness",
  "confidence_type": "direct",
  "keywords": "mind,heart,thought,creation"
}

Source: SuttaCentral (CC0 license) — Bhikkhu Sujato English translations + Mahāsaṅgīti Pāli text.
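
For readers who want to work with records in this shape, the per-verse structure above might be loaded into a typed object as sketched below. The class and function names are hypothetical. Splitting the comma-separated `sub_themes` and `keywords` strings into tuples is a convenience chosen for this sketch, not something the source specifies.

```python
import json
from dataclasses import dataclass
from typing import Tuple

VALID_CONFIDENCE = {"direct", "aligned", "reference"}


@dataclass(frozen=True)
class Verse:
    id: int
    canon: str
    source_ja: str
    pali: str
    original: str
    japanese: str
    theme: str
    sub_themes: Tuple[str, ...]
    confidence_type: str
    keywords: Tuple[str, ...]


def _split(csv: str) -> Tuple[str, ...]:
    """Turn "a,b,c" into ("a", "b", "c"), ignoring empty parts."""
    return tuple(part.strip() for part in csv.split(",") if part.strip())


def parse_verse(raw: str) -> Verse:
    """Parse one JSON verse record and validate its confidence label."""
    data = json.loads(raw)
    if data["confidence_type"] not in VALID_CONFIDENCE:
        raise ValueError(f"unknown confidence_type: {data['confidence_type']!r}")
    data["sub_themes"] = _split(data["sub_themes"])
    data["keywords"] = _split(data["keywords"])
    return Verse(**data)
```

Rejecting unknown confidence labels at load time keeps a mistyped label from silently bypassing the priority and hedging rules that depend on it.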

4 Hallucination Prevention

AI Buddha Zen employs 5 layers of hallucination prevention, designed to keep fabricated scripture quotes from reaching users:

🔒 Closed Canon RAG

The AI can only cite from the 10,023 verified verses in our database. It cannot search the internet or generate quotes from training data.

📋 Verbatim Quoting

The prompt instructs: "Quote the Japanese/English translation EXACTLY as provided. Do NOT paraphrase or re-translate."

⚠️ Confidence Labels

Each verse is tagged as "direct", "aligned", or "reference". Lower-confidence verses are cited with hedging language ("it is said that...").

🚫 Fabrication Block

The prompt explicitly states: "Do NOT fabricate verses not listed in the 5 candidates below."

✅ Mandatory Citation

The AI is required to cite at least 1 verse from the candidates. If it cannot find a relevant verse, it says "this is not in our database" rather than inventing one.
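
The Confidence Labels layer above can be illustrated with a small formatting rule. This is only a sketch of the idea: in the actual system the AI writes the citation itself, the function name is hypothetical, and hedging only the "reference" level by default is an assumption (the source does not say whether "aligned" verses are hedged too).

```python
def cite(verse, hedged_levels=frozenset({"reference"})):
    """Format a quote, adding cautious wording for lower-confidence verses."""
    quote = f'"{verse["original"]}"'
    ref = verse["source_ja"]
    if verse["confidence_type"] in hedged_levels:
        # Lower-confidence material is introduced with hedging language.
        return f"It is said that {quote} ({ref})"
    return f"{quote} ({ref})"
```

Passing `hedged_levels={"aligned", "reference"}` would extend the cautious wording to paraphrased verses as well.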

5 Safety Framework (CAP-SRP v2.0)

Beyond RAG, AI Buddha Zen implements a multi-layered safety framework based on clinical psychology and religious AI ethics research.

Risk Category	Detection	Action	Reference
🚨 Suicide Risk (3-tier)	C-SSRS Tier 1-3	Tier 2-3: Crisis intervention; Tier 1: Empathetic response	Posner et al. (2011)
🧘 Spiritual Bypassing	Distress + avoidance co-occurrence	Suppress superficial spiritual comfort	SBS-13 (Fox et al. 2017)
🔗 AI Dependency	Message + usage pattern	Encourage human connections	AMDF (2026)
🔐 Privacy	SHA-256 + HMAC	Message text never stored	CAP-SRP Spec
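
The Privacy row pairs SHA-256 with HMAC and states that message text is never stored. One plausible reading, sketched here as an assumption rather than the documented CAP-SRP design, is that only a keyed digest of each message is logged. The function name and key handling are hypothetical.

```python
import hashlib
import hmac


def message_fingerprint(message: str, secret_key: bytes) -> str:
    """Return an HMAC-SHA256 hex digest to store in place of the message.

    The digest lets identical messages be recognized later, but the text
    cannot be read back from it, and without the secret key an outsider
    cannot test guesses against stored digests.
    """
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

When comparing a stored digest with a freshly computed one, `hmac.compare_digest` from the standard library avoids timing side channels.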


6 Comparison with BuddhaBot-Plus (Kyoto University)

Kyoto University's BuddhaBot-Plus, led by Professor Seiji Kumagai, pioneered the "source-first architecture" for Buddhist AI. AI Buddha Zen shares this RAG-based approach while adding consumer-facing features and safety layers.

	BuddhaBot-Plus	AI Buddha Zen
Developer	Kyoto University	VeritasChain Inc.
Architecture	Source-first RAG	Source-first RAG
Scripture DB	~3,000 verses	10,023 verses
Platform	Research prototype	LINE Bot + iOS App
Safety framework	ELSI	CAP-SRP v2.0 (C-SSRS + SBS-13 + AMDF)
Access	Not public	Free, public

Note: BuddhaBot-Plus is an academic research project with different goals. This comparison is for technical context, not competitive positioning.

🪷 Experience RAG-Powered Buddhist Wisdom

Try AI Buddha Zen for free. Every response cites the actual scripture name, chapter, and verse number.
