[ §005 — path ii · the machine ]

The machine
runs here.

AI is no longer a distant API call. Cloudflare Workers AI puts Llama, Mistral, ResNet, and embedding models on GPUs across Cloudflare's global network. Your prompts run on the same edge that serves this page.

> model: @cf/meta/llama-3.1-8b-instruct
> inference: on-edge GPU
> openai key: not required
> monthly cost: ~$0 at this scale

Four works. Each one puts the edge models to a different use.

> awaiting input
work i · 001
The Oracle
Streaming chat with Llama 3.1 8B. Tokens arrive as the model generates them. No external API — the whole conversation lives on the edge.
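Token-by-token arrival means the client reads a server-sent-event stream and peels `data:` lines off each chunk. A minimal sketch of that client-side step, assuming the common Workers AI streaming shape of `data: {"response":"…"}` events ending in `data: [DONE]` (the function name is this sketch's own):

```typescript
// Pull plain-text tokens out of one chunk of a server-sent-event stream.
// Assumed wire shape: `data: {"response":"..."}` lines, `data: [DONE]` at end.
function parseSseTokens(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // model finished
    try {
      const event = JSON.parse(payload) as { response?: string };
      if (event.response) tokens.push(event.response);
    } catch {
      // JSON split across chunk boundaries; a real client buffers the tail.
    }
  }
  return tokens;
}
```

Appending each returned token to the page as it lands is what makes the conversation feel live.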
streaming · llama 3.1
work ii · 002
Sight
Drop an image, get classification tags and a caption from edge vision models. Files land in R2, never leave the Cloudflare network.
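Edge image classifiers return a list of `{ label, score }` pairs; turning those into display tags is just a confidence filter. A sketch under that assumption (the 0.5 cutoff and the function name are arbitrary choices here, not part of any model contract):

```typescript
interface Classification {
  label: string;
  score: number; // model confidence, 0..1
}

// Keep only confident labels, highest-scoring first.
function confidentTags(results: Classification[], cutoff = 0.5): string[] {
  return results
    .filter((r) => r.score >= cutoff)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.label);
}
```

Low-confidence labels are noise on a gallery page, so they are dropped rather than shown with a score.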
vision · live
work iii · 003
Ghost in the corpus
Semantic search over a knowledge base using edge embeddings, D1-stored vectors, and Llama synthesis. Ask anything; the model finds relevant passages and answers.
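The retrieval half of that pipeline is cosine similarity between a query embedding and the stored passage vectors. A self-contained sketch (row shape and function names are illustrative; in the live work the vectors come from an edge embedding model and sit in D1):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored passages against the query embedding, keep the best k.
function topK(
  query: number[],
  rows: { id: number; vec: number[] }[],
  k = 3,
): { id: number; score: number }[] {
  return rows
    .map((r) => ({ id: r.id, score: cosine(query, r.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The top passages are then handed to Llama as context, which is where the synthesis step happens.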
rag · live
work iv · 004
The sentinel
AI contact triage. Llama classifies every message (sales / support / partnership / spam), rates urgency, drafts a reply. D1-backed. The inbox that reads itself.
ai + d1 · live
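Classify-rate-draft triage only works if the model's JSON is validated before it touches D1, because prompted models drift. A defensive-parsing sketch (the field names `category`, `urgency`, `draft` and the fallbacks are this sketch's assumptions, not a fixed schema):

```typescript
type Category = "sales" | "support" | "partnership" | "spam";
const CATEGORIES: Category[] = ["sales", "support", "partnership", "spam"];

interface Triage {
  category: Category;
  urgency: number; // clamped to 1..5
  draft: string;   // suggested reply
}

// Never trust raw model output: unknown categories fall back to
// "support", urgency is clamped, missing drafts become empty strings.
function parseTriage(raw: string): Triage {
  let parsed: any = {};
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Non-JSON output: fall through to safe defaults.
  }
  const category: Category = CATEGORIES.includes(parsed.category)
    ? parsed.category
    : "support";
  const urgency = Math.min(5, Math.max(1, Number(parsed.urgency) || 1));
  const draft = typeof parsed.draft === "string" ? parsed.draft : "";
  return { category, urgency, draft };
}
```

Only the validated record is written to D1, so one malformed completion cannot corrupt the inbox.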