[ §005 — path ii · the machine ]

The machine
runs here.

AI is no longer a distant API call. Cloudflare Workers AI puts Llama, Mistral, ResNet, and embedding models on GPUs across Cloudflare's global network. Your prompts run on the same edge that serves this page.

> model: @cf/meta/llama-3.1-8b-instruct
> inference: on-edge GPU
> openai key: not required
> monthly cost: ~$0 at this scale

Four works. Each one puts the edge models to a different use.

> awaiting input
work i · 001
The Oracle
Streaming chat with Llama 3.1 8B. Tokens arrive as the model generates them. No external API — the whole conversation lives on the edge.
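Token-by-token arrival means the client reads a server-sent-event stream and peels `data:` lines off each chunk. A minimal sketch of that client-side step, assuming the common Workers AI streaming shape of `data: {"response":"…"}` events ending in `data: [DONE]` (the function name is this sketch's own):

```typescript
// Pull plain-text tokens out of one chunk of a server-sent-event stream.
// Assumed wire shape: `data: {"response":"..."}` lines, `data: [DONE]` at end.
function parseSseTokens(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // model finished
    try {
      const event = JSON.parse(payload) as { response?: string };
      if (event.response) tokens.push(event.response);
    } catch {
      // JSON split across chunk boundaries; a real client buffers the tail.
    }
  }
  return tokens;
}
```

Appending each returned token to the page as it lands is what makes the conversation feel live.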
streaming · llama 3.1
work ii · 002
Sight
Drop an image, get classification tags and a caption from edge vision models. Files land in R2, never leave the Cloudflare network.
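Edge image classifiers return a list of `{ label, score }` pairs; turning those into display tags is just a confidence filter. A sketch under that assumption (the 0.5 cutoff and the function name are arbitrary choices here, not part of any model contract):

```typescript
interface Classification {
  label: string;
  score: number; // model confidence, 0..1
}

// Keep only confident labels, highest-scoring first.
function confidentTags(results: Classification[], cutoff = 0.5): string[] {
  return results
    .filter((r) => r.score >= cutoff)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.label);
}
```

Low-confidence labels are noise on a gallery page, so they are dropped rather than shown with a score.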
vision · live
work iii · 003
Ghost in the corpus
Semantic search over a knowledge base using edge embeddings, D1-stored vectors, and Llama synthesis. Ask anything; the model finds relevant passages and answers.
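The retrieval half of that pipeline is cosine similarity between a query embedding and the stored passage vectors. A self-contained sketch (row shape and function names are illustrative; in the live work the vectors come from an edge embedding model and sit in D1):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored passages against the query embedding, keep the best k.
function topK(
  query: number[],
  rows: { id: number; vec: number[] }[],
  k = 3,
): { id: number; score: number }[] {
  return rows
    .map((r) => ({ id: r.id, score: cosine(query, r.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The top passages are then handed to Llama as context, which is where the synthesis step happens.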
rag · live
work iv · 004
The sentinel
AI contact triage. Llama classifies every message (sales / support / partnership / spam), rates urgency, drafts a reply. D1-backed. The inbox that reads itself.
ai + d1 · live
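Classify-rate-draft triage only works if the model's JSON is validated before it touches D1, because prompted models drift. A defensive-parsing sketch (the field names `category`, `urgency`, `draft` and the fallbacks are this sketch's assumptions, not a fixed schema):

```typescript
type Category = "sales" | "support" | "partnership" | "spam";
const CATEGORIES: Category[] = ["sales", "support", "partnership", "spam"];

interface Triage {
  category: Category;
  urgency: number; // clamped to 1..5
  draft: string;   // suggested reply
}

// Never trust raw model output: unknown categories fall back to
// "support", urgency is clamped, missing drafts become empty strings.
function parseTriage(raw: string): Triage {
  let parsed: any = {};
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Non-JSON output: fall through to safe defaults.
  }
  const category: Category = CATEGORIES.includes(parsed.category)
    ? parsed.category
    : "support";
  const urgency = Math.min(5, Math.max(1, Number(parsed.urgency) || 1));
  const draft = typeof parsed.draft === "string" ? parsed.draft : "";
  return { category, urgency, draft };
}
```

Only the validated record is written to D1, so one malformed completion cannot corrupt the inbox.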