A small retrieval-augmented-generation demo. Twelve paragraphs about Cloudflare's edge platform sit in a D1 table, each with a 768-dimensional embedding stored as a BLOB. Your question is embedded in-flight, matched against the corpus by cosine similarity, and the top three hits are handed to Llama 3.1 as grounding context.
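The similarity step can be sketched roughly as follows: each row's BLOB is reinterpreted as a `Float32Array` and scored against the query embedding with cosine similarity. The function names (`cosine`, `topK`) and row shape are illustrative, not the demo's actual code.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored embedding against the query and keep the best k.
// `embedding` stands in for the BLOB column as returned from D1.
function topK(
  query: Float32Array,
  rows: { text: string; embedding: ArrayBuffer }[],
  k = 3
): { text: string; score: number }[] {
  return rows
    .map((r) => ({
      text: r.text,
      score: cosine(query, new Float32Array(r.embedding)),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

With only twelve rows, a brute-force scan like this is cheaper and simpler than a vector index.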
Try: "What's the difference between KV and D1?" or "How do Pages Functions route requests?"
Stack: Workers AI bge-base-en-v1.5 for 768-d embeddings, D1 for corpus and vector storage (embeddings serialized as Float32Array BLOBs), cosine similarity computed in the Pages Function, Llama 3.1 8B for synthesis. The response streams over SSE; the first event is the sources list, and every subsequent event is a token.
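A minimal sketch of consuming that stream on the client, assuming each SSE event is a `data: <json>` block, the first carrying `{sources: [...]}` and the rest `{token: "..."}`. The exact payload shape is an assumption, not the demo's documented wire format.

```typescript
// One parsed SSE event: either the up-front sources list or a token.
type SSEEvent = { sources?: string[]; token?: string };

// Split a buffered chunk of SSE text into events and parse each
// `data:` payload as JSON. Multi-line data fields are not handled;
// this is a sketch, not a full SSE parser.
function parseSSE(chunk: string): SSEEvent[] {
  return chunk
    .split("\n\n")
    .filter((block) => block.startsWith("data: "))
    .map((block) => JSON.parse(block.slice("data: ".length)));
}
```

In the browser, `EventSource` or a `ReadableStream` reader over `fetch` would do this buffering and splitting for you.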