
From Zero to Knowledge Pipeline: OpenViking on AWS Lightsail
- Stephen Jones
- AWS, AI
- March 18, 2026
Most people building with AI agents hit the same wall eventually. Your agent forgets things between sessions. RAG retrieval surfaces the wrong chunks. You cannot tell why the agent picked what it picked. And every time you start a new session, you are re-explaining context that should already exist.
OpenViking is ByteDance’s open-source answer to this problem. It is not another vector database. It is a context database that treats everything (your documents, memories, agent skills) as a hierarchical filesystem under a viking:// protocol. Think of it as giving your agent a persistent, structured brain rather than a flat pile of text chunks.
The key ideas:
- Everything gets a URI: viking://resources/, viking://user/memories/, viking://agent/skills/
- Every resource is automatically summarised into three tiers: L0 (~100-token abstract), L1 (~2k-token overview), L2 (full content)
- Agents navigate by structure first, then search semantically within that structure
- Sessions commit memories back, so the corpus gets smarter over time
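The data model behind those ideas can be sketched as a plain Python structure. This is an illustration only, not OpenViking's actual internal types:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """Illustrative model of one node in the viking:// hierarchy."""
    uri: str   # e.g. "viking://resources/some_post/section.md"
    l0: str    # ~100-token abstract: loaded first, cheap to scan
    l1: str    # ~2k-token overview: loaded when the abstract looks relevant
    l2: str    # full content: loaded only when the agent commits to reading

doc = Resource(
    uri="viking://resources/example_post/section.md",
    l0="One-paragraph abstract of the section.",
    l1="Longer overview covering key points and structure.",
    l2="The complete section text.",
)
print(doc.uri)
```

The point of the three tiers is that an agent can scan many L0 abstracts for the price of reading one L2 document.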
The comparison that clicked for me: traditional RAG gives you a snapshot. OpenViking gives you a film.
Why Lightsail and Not the Full AWS Stack
For a POC, I did not want the complexity of ECS Fargate or the overhead of a full EC2 setup with IAM instance profiles and CodePipeline. Lightsail gives you a flat-rate instance ($20/month for 4GB RAM), a static IP, and enough headroom for OpenViking’s in-memory vector index.
The 4GB RAM requirement is not negotiable. OpenViking holds its vector index in memory, and a 2GB instance gets tight fast once the corpus grows.
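A back-of-envelope check makes that floor concrete. Assumptions here: float32 storage, 3072-dimension embeddings, and a hypothetical 100k-chunk corpus; real usage adds graph and metadata overhead on top of the raw vectors:

```python
# Rough memory cost of the in-memory dense index alone.
dim = 3072                              # embedding dimension (text-embedding-3-large)
bytes_per_vector = dim * 4              # 4 bytes per float32 component
chunks = 100_000                        # hypothetical corpus size
index_gb = chunks * bytes_per_vector / 1024**3
print(f"~{index_gb:.2f} GB for raw vectors")   # ~1.14 GB before any overhead
```

On a 2GB box that leaves very little for the OS, Docker, and OpenViking itself, which is why 4GB is the practical minimum.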
Docker Compose makes the setup reproducible. The official compose file mounts two things into the container:
volumes:
- /var/lib/openviking/ov.conf:/app/ov.conf
- /var/lib/openviking/data:/app/data
Config goes to /app/ov.conf. Corpus persists at /app/data. Everything else is ephemeral.
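For reference, a minimal docker-compose.yml consistent with those mounts might look like the sketch below. The image name and tag are assumptions; use whatever the official compose file specifies:

```yaml
services:
  openviking:
    image: openviking/openviking:latest   # assumption: substitute the official image
    restart: unless-stopped
    ports:
      - "1933:1933"                       # matches server.port in ov.conf
    volumes:
      - /var/lib/openviking/ov.conf:/app/ov.conf
      - /var/lib/openviking/data:/app/data
```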
What Actually Goes in ov.conf
OpenViking needs two model providers:
Embedding model: converts text to vectors for semantic search. text-embedding-3-large from OpenAI works well (3072 dimensions). This runs on every document you ingest.
VLM: generates the L0/L1 summaries. Use gpt-4o-mini here, not GPT-4. The VLM runs on ingestion, not retrieval. You want it cheap and fast, not premium. It is summarising your content, not reasoning about it.
{
"server": {
"host": "0.0.0.0",
"port": 1933,
"root_api_key": "your-secret-key"
},
"storage": {
"workspace": "/app/data"
},
"embedding": {
"dense": {
"provider": "openai",
"api_key": "sk-...",
"model": "text-embedding-3-large",
"dimension": 3072
}
},
"vlm": {
"provider": "openai",
"api_key": "sk-...",
"model": "gpt-4o-mini",
"temperature": 0.1
}
}
The storage.workspace must be /app/data (the container-side path, not the host path). Docker handles the mapping.
The Two Config Files That Will Trip You Up
There are two separate config files, and this is where I wasted an hour. ov.conf is for the server. ovcli.conf is for the CLI client on your local machine. They live in different places and serve different purposes.
The CLI looks for ~/.openviking/ovcli.conf by default:
{
"url": "http://your-server-ip:1933",
"api_key": "your-secret-key"
}
The api_key here must match root_api_key in ov.conf exactly. If your server has no root_api_key set, do not include api_key in ovcli.conf at all. Having one when the server expects none causes a Missing API Key error even though you supplied one. Yes, that error message is backwards. Yes, I stared at it for twenty minutes.
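A small sanity check saves that twenty minutes. The helper below encodes the matching rule; the file paths in `main` are assumptions, so point them at wherever your copies of the two configs live:

```python
import json
from pathlib import Path

def check_keys(server_conf: dict, cli_conf: dict):
    """Return an error message if ov.conf and ovcli.conf disagree, else None."""
    server_key = server_conf.get("server", {}).get("root_api_key")
    if server_key is None:
        # Server expects no key: ovcli.conf must not contain api_key at all,
        # or the server answers with the misleading "Missing API Key" error.
        return "remove api_key from ovcli.conf" if "api_key" in cli_conf else None
    if cli_conf.get("api_key") != server_key:
        return "api_key must match root_api_key exactly"
    return None

def main(server_path="/var/lib/openviking/ov.conf",
         cli_path=Path.home() / ".openviking" / "ovcli.conf"):
    # Paths are assumptions; adjust to your setup before running.
    server = json.loads(Path(server_path).read_text())
    cli = json.loads(Path(cli_path).read_text())
    print(check_keys(server, cli) or "configs agree")
```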
Ingestion: The Path Problem Nobody Warns You About
This is where the documentation falls short. When you run ov add-resource /path/to/file.md, the CLI sends that path string to the server. The server tries to open it on its filesystem, inside the container. Your local path does not exist there.
The fix is a two-step upload flow that the API supports but the CLI has not fully exposed yet:
Step 1: Upload the file bytes to server temp storage
curl -X POST http://your-server:1933/api/v1/resources/temp_upload \
-H "x-api-key: your-key" \
-F "file=@/local/path/to/post.md"
Returns: {"temp_path": "/app/data/temp/upload/upload_abc123.md"}
Step 2: Ingest from the temp path
curl -X POST http://your-server:1933/api/v1/resources \
-H "x-api-key: your-key" \
-H "Content-Type: application/json" \
-d '{
"temp_path": "/app/data/temp/upload/upload_abc123.md",
"reason": "sjramblings.io blog post",
"instruction": "Focus on technical opinions, AWS services mentioned, and key conclusions",
"wait": true
}'
The instruction field is genuinely useful. It tells the VLM how to frame the L0/L1 summaries, so your abstracts surface what you actually want to query for. For batch ingestion of all blog posts, a small Python script handles the two-step flow cleanly: upload each file, capture the temp_path, pass it to the ingest endpoint.
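That script is short enough to sketch in full. This version uses the requests library and the two endpoints shown above; the base URL, API key, directory layout, and instruction text are assumptions to adapt:

```python
import requests
from pathlib import Path

BASE_URL = "http://your-server:1933"   # assumption: your Lightsail static IP
API_KEY = "your-secret-key"

def build_ingest_payload(temp_path, reason, instruction, wait=True):
    return {"temp_path": temp_path, "reason": reason,
            "instruction": instruction, "wait": wait}

def ingest_file(path: Path):
    # Step 1: push the file bytes to server-side temp storage
    with path.open("rb") as f:
        resp = requests.post(
            f"{BASE_URL}/api/v1/resources/temp_upload",
            headers={"x-api-key": API_KEY},
            files={"file": f},
        )
    resp.raise_for_status()
    temp_path = resp.json()["temp_path"]

    # Step 2: ingest from that temp path, with a summarisation instruction
    resp = requests.post(
        f"{BASE_URL}/api/v1/resources",
        headers={"x-api-key": API_KEY},
        json=build_ingest_payload(
            temp_path,
            reason="blog post",
            instruction="Focus on technical opinions and key conclusions",
        ),
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for post in sorted(Path("posts").glob("*.md")):   # assumed local layout
        print(post.name, "->", ingest_file(post))
```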
What Good Retrieval Actually Looks Like
After ingesting one post, the query results look like this:
ov find "personal AI infrastructure" --uri viking://resources/
context_type uri level score abstract
resource viking://resources/Building_Your_PAI/.abstract 0 0.61 This directory contains resources focused on
developing a Personal AI Infrastructure...
resource viking://resources/Building_Your_PAI/section.md 2 0.59 The document serves as a guide to building
a PAI that enhances human capabilities...
A few things worth noting. The level field tells you which tier matched: 0 is L0 abstract, 1 is L1 overview, 2 is L2 full content. The agent loads L0 first (cheap), decides if it is relevant, then drills to L2 only when needed. Scores in the 0.5-0.6 range are reasonable for a single-post corpus. As more posts are ingested and the corpus builds cross-document structure, scores for genuinely relevant results climb.
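That L0-first pattern is simple enough to express directly. A conceptual sketch, not OpenViking's code; the result dicts mirror the find output columns above, and the threshold and budget values are illustrative:

```python
def plan_loads(results, threshold=0.55, budget=2):
    """Decide which hits are worth loading full L2 content for.
    Each result mirrors a find row: {"uri": ..., "level": ..., "score": ...}.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    drill = [r["uri"] for r in ranked[:budget] if r["score"] >= threshold]
    skim = [r["uri"] for r in ranked if r["uri"] not in drill]
    return drill, skim

hits = [
    {"uri": "viking://resources/Building_Your_PAI/.abstract", "level": 0, "score": 0.61},
    {"uri": "viking://resources/Building_Your_PAI/section.md", "level": 2, "score": 0.59},
    {"uri": "viking://resources/other/.abstract", "level": 0, "score": 0.31},
]
drill, skim = plan_loads(hits)
print(drill)   # the two hits above the threshold, best first
```

Everything in `skim` stays at the cheap L0 abstract; only `drill` pays the token cost of full content.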
OpenViking splits your markdown post into sections by heading, each becoming a separately addressable node. A query for “human centred approach” can surface just that section of a post rather than the whole document. That is the token efficiency story in practice.
What Comes Next
This was purely the ingestion and retrieval validation layer. The interesting part starts when you wire this into a Claude Code skill:
- A sjramblings-blog skill that tells Claude how to query the corpus before drafting new posts, ensuring consistency with existing content and voice
- Session commits that extract what was useful after each writing session, feeding back into viking://user/memories/
- The same infrastructure backing a broader signal pipeline: structured ingestion of external sources feeding into the same retrieval layer
The POC question was simple: is the signal real? After one post ingested and queried, the answer is yes. The retrieval finds the right content, the summaries are accurate, and the hierarchical structure means you are not burning tokens loading full documents to answer simple questions.
The full blog pipeline (ingesting all posts, building the skill, connecting it to Claude Code) is the next build.
Stack: AWS Lightsail $20/month, Docker Compose, OpenViking v0.2.x, OpenAI text-embedding-3-large + gpt-4o-mini, Python requests for batch ingestion


