Search Engine
Late November 2025
My very first project: a complete search engine built end to end. Keyword search over a Typesense index, RAG-based answer generation, and a clean frontend.
What is it?
A Wikipedia RAG search engine — users submit a query, the backend searches ~100K Wikipedia articles stored in Typesense, pulls context, and streams a natural language answer from Gemini 2.5 Flash or Groq Llama 3.3-70B. Two modes: instant keyword search (no AI) and full RAG ask mode.
How it works
FastAPI exposes two endpoints: GET /search returns raw Typesense hits instantly for autocomplete-style search. GET /ask retrieves the top 3 context snippets from Typesense (each truncated to 1200 bytes), builds a RAG prompt, and calls Gemini first, falling back to Groq if Gemini is unavailable. Deployed as a systemd service on a GCP VM — not Cloud Run.
The dataset: from 1M docs to 100K
The original Wikipedia dump had ~1 million articles. Indexing all of them into Typesense on a standard GCP VM hit memory limits — Typesense couldn't hold the full index in RAM. The solution: rebuild the collection using only the top 100K most-linked articles.
This involved: wiki_indexer.py to ingest, reduce_typesense_export.py and reduce_to_100k.sh to filter, run_typesense_lowmem.sh to start Typesense with constrained memory, and a rebuild from a v1 collection (wiki_search) to a new v2 collection (wiki_docs_v2). The old indexer.log is still in the repo from the original failed full-index run.
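The core of the reduction step can be sketched as a top-N filter over the exported dump. This is a hypothetical illustration of what `reduce_typesense_export.py` does, assuming a JSONL export and an `inlinks` count field (both assumptions, not confirmed by the repo):

```python
# Hypothetical sketch: keep the top_n most-linked articles from a
# JSONL export. The "inlinks" field name is an assumption.
import heapq
import json

def reduce_export(lines, top_n=100_000):
    """Return the top_n docs by inbound link count."""
    docs = (json.loads(line) for line in lines)
    # heapq.nlargest consumes the generator incrementally, so the full
    # 1M-doc dump never has to sit in memory at once.
    return heapq.nlargest(top_n, docs, key=lambda d: d.get("inlinks", 0))

dump = [
    '{"title": "A", "inlinks": 5}',
    '{"title": "B", "inlinks": 90}',
    '{"title": "C", "inlinks": 12}',
]
top2 = reduce_export(dump, top_n=2)
print([d["title"] for d in top2])  # → ['B', 'C']
```

Streaming the export line by line matters here for the same reason the full index didn't fit in Typesense: the VM's RAM is the bottleneck.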
RAG architecture
RAG (Retrieval-Augmented Generation) grounds the LLM's answer in real documents instead of training data alone. The flow: search Typesense for the query → take top 3 results → inject their content into the prompt as context → ask the LLM to answer using only that context.
Context is limited to 1200 bytes per document (3 docs = ~3600 bytes of context total). This keeps prompts short and cheap while covering the most relevant content. Gemini 2.5 Flash is the primary model — large context window, fast, low cost. Groq (llama-3.3-70b-versatile) is the fallback.
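The prompt assembly step can be sketched as follows. The template wording is an assumption (the repo's exact prompt may differ); the 1200-byte-per-document truncation matches the description above:

```python
# Sketch of RAG prompt assembly: 3 snippets, each capped at 1200 bytes.
# The prompt template text is an assumption.
MAX_SNIPPET_BYTES = 1200

def truncate_utf8(text: str, limit: int = MAX_SNIPPET_BYTES) -> str:
    # Cut on the byte budget, then drop any multibyte character that
    # was split in half at the boundary.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")

def build_rag_prompt(query: str, docs: list[dict]) -> str:
    context = "\n\n".join(truncate_utf8(d["content"]) for d in docs[:3])
    return (
        "Answer the question using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Truncating on bytes rather than characters keeps the budget exact even for non-ASCII article text, at the cost of needing the boundary cleanup shown in `truncate_utf8`.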
Why systemd instead of Cloud Run?
The Typesense instance runs on the same GCP VM as the FastAPI app. Cloud Run is stateless, so it can't host a local database, and moving Typesense out to a separate managed service would add cost and latency. A GCP VM with systemd gives persistent storage, a local Typesense connection, and always-on availability with Restart=always. The tradeoff: manual scaling and no zero-cost idle.
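A unit file for this setup looks roughly like the sketch below. Every path, name, and port here is a placeholder (the repo's actual unit file may differ); the key lines are `Restart=always` and starting after the Typesense service:

```ini
# /etc/systemd/system/wiki-search.service — hypothetical sketch
[Unit]
Description=Wikipedia RAG search API
After=network.target typesense.service

[Service]
WorkingDirectory=/opt/wiki-search
ExecStart=/opt/wiki-search/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

`Restart=always` restarts the process on any exit, and `RestartSec` adds a short backoff so a crash loop doesn't spin at full speed.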
Key takeaways
- RAG pipeline: how to combine keyword retrieval with LLM generation for grounded answers
- Typesense collection management: schema definition, memory constraints, rebuilding collections
- Dataset size vs infrastructure limits — reducing 1M Wikipedia docs to fit in VM RAM
- Systemd service deployment: unit files, Restart=always, running on GCP VMs
- Gemini + Groq failover pattern for LLM availability