From Zero to Knowledge Pipeline: OpenViking on AWS Lightsail

Most people building with AI agents hit the same wall eventually. Your agent forgets things between sessions. RAG retrieval surfaces the wrong chunks. You cannot tell why the agent picked what it picked. And every time you start a new session, you are re-explaining context that should already exist.

OpenViking is ByteDance’s open-source answer to this problem. It is not another vector database. It is a context database that treats everything (your documents, memories, agent skills) as a hierarchical filesystem under a viking:// protocol. Think of it as giving your agent a persistent, structured brain rather than a flat pile of text chunks.

The key ideas:

  • Everything gets a URI: viking://resources/, viking://user/memories/, viking://agent/skills/
  • Every resource is automatically summarised into three tiers: L0 (~100 token abstract), L1 (~2k token overview), L2 (full content)
  • Agents navigate by structure first, then search semantically within that structure
  • Sessions commit memories back, so the corpus gets smarter over time

The comparison that clicked for me: traditional RAG gives you a snapshot. OpenViking gives you a film.

Why Lightsail and Not the Full AWS Stack

For a POC, I did not want the complexity of ECS Fargate or the overhead of a full EC2 setup with IAM instance profiles and CodePipeline. Lightsail gives you a flat-rate instance ($20/month for 4GB RAM), a static IP, and enough headroom for OpenViking’s in-memory vector index.

The 4GB RAM requirement is not negotiable. OpenViking holds its vector index in memory, and a 2GB instance gets tight fast once the corpus grows.

Docker Compose makes the setup reproducible. The official compose file mounts two things into the container:

volumes:
  - /var/lib/openviking/ov.conf:/app/ov.conf
  - /var/lib/openviking/data:/app/data

Config goes to /app/ov.conf. Corpus persists at /app/data. Everything else is ephemeral.
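Put together, a minimal compose sketch looks like the following. The image reference here is a placeholder; copy the real image name and tag from the official compose file:

```yaml
services:
  openviking:
    image: openviking/openviking:latest  # placeholder -- use the official image reference
    ports:
      - "1933:1933"                      # matches server.port in ov.conf
    volumes:
      - /var/lib/openviking/ov.conf:/app/ov.conf
      - /var/lib/openviking/data:/app/data
    restart: unless-stopped
```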

What Actually Goes in ov.conf

OpenViking needs two model providers:

Embedding model: converts text to vectors for semantic search. text-embedding-3-large from OpenAI works well (3072 dimensions). This runs on every document you ingest.

VLM: generates the L0/L1 summaries. Use gpt-4o-mini here, not GPT-4. The VLM runs on ingestion, not retrieval. You want it cheap and fast, not premium. It is summarising your content, not reasoning about it.

{
  "server": {
    "host": "0.0.0.0",
    "port": 1933,
    "root_api_key": "your-secret-key"
  },
  "storage": {
    "workspace": "/app/data"
  },
  "embedding": {
    "dense": {
      "provider": "openai",
      "api_key": "sk-...",
      "model": "text-embedding-3-large",
      "dimension": 3072
    }
  },
  "vlm": {
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4o-mini",
    "temperature": 0.1
  }
}

The storage.workspace must be /app/data (the container-side path, not the host path). Docker handles the mapping.

The Two Config Files That Will Trip You Up

There are two separate config files, and this is where I wasted an hour. ov.conf configures the server. ovcli.conf configures the CLI client on your local machine. They live in different places and serve different purposes.

The CLI looks for ~/.openviking/ovcli.conf by default:

{
  "url": "http://your-server-ip:1933",
  "api_key": "your-secret-key"
}

The api_key here must match root_api_key in ov.conf exactly. If your server has no root_api_key set, do not include api_key in ovcli.conf at all. Having one when the server expects none causes a Missing API Key error even though you supplied one. Yes, that error message is backwards. Yes, I stared at it for twenty minutes.

Ingestion: The Path Problem Nobody Warns You About

This is where the documentation falls short. When you run ov add-resource /path/to/file.md, the CLI sends that path string to the server. The server tries to open it on its filesystem, inside the container. Your local path does not exist there.

The fix is a two-step upload flow that the API supports but the CLI has not fully exposed yet:

Step 1: Upload the file bytes to server temp storage

curl -X POST http://your-server:1933/api/v1/resources/temp_upload \
  -H "x-api-key: your-key" \
  -F "file=@/local/path/to/post.md"

Returns: {"temp_path": "/app/data/temp/upload/upload_abc123.md"}

Step 2: Ingest from the temp path

curl -X POST http://your-server:1933/api/v1/resources \
  -H "x-api-key: your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "temp_path": "/app/data/temp/upload/upload_abc123.md",
    "reason": "sjramblings.io blog post",
    "instruction": "Focus on technical opinions, AWS services mentioned, and key conclusions",
    "wait": true
  }'

The instruction field is genuinely useful. It tells the VLM how to frame the L0/L1 summaries, so your abstracts surface what you actually want to query for. For batch ingestion of all blog posts, a small Python script handles the two-step flow cleanly: upload each file, capture the temp_path, pass it to the ingest endpoint.
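That batch script is short enough to show in full. A minimal sketch using requests against the two endpoints above; SERVER, API_KEY, and the posts/ directory are placeholders for your own values:

```python
import json
from pathlib import Path

import requests

SERVER = "http://your-server:1933"  # placeholder: your Lightsail static IP
API_KEY = "your-secret-key"         # placeholder: must match root_api_key in ov.conf


def build_ingest_payload(temp_path: str, reason: str, instruction: str) -> dict:
    """Assemble the JSON body for the ingest endpoint (step 2)."""
    return {
        "temp_path": temp_path,
        "reason": reason,
        "instruction": instruction,
        "wait": True,
    }


def ingest_file(path: Path) -> None:
    """Run the two-step flow for a single file: upload bytes, then ingest."""
    headers = {"x-api-key": API_KEY}

    # Step 1: upload the raw bytes to server temp storage.
    with path.open("rb") as f:
        r = requests.post(
            f"{SERVER}/api/v1/resources/temp_upload",
            headers=headers,
            files={"file": f},
        )
    r.raise_for_status()
    temp_path = r.json()["temp_path"]

    # Step 2: ingest from the server-side temp path.
    payload = build_ingest_payload(
        temp_path,
        reason=f"blog post: {path.name}",
        instruction="Focus on technical opinions, AWS services mentioned, "
                    "and key conclusions",
    )
    r = requests.post(
        f"{SERVER}/api/v1/resources",
        headers={**headers, "Content-Type": "application/json"},
        data=json.dumps(payload),
    )
    r.raise_for_status()
```

Batch ingestion is then a loop: `for md in sorted(Path("posts").glob("*.md")): ingest_file(md)`.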

What Good Retrieval Actually Looks Like

After ingesting one post, the query results look like this:

ov find "personal AI infrastructure" --uri viking://resources/

context_type  uri                                          level  score   abstract
resource      viking://resources/Building_Your_PAI/.abstract  0   0.61    This directory contains resources focused on
                                                                           developing a Personal AI Infrastructure...
resource      viking://resources/Building_Your_PAI/section.md 2   0.59    The document serves as a guide to building
                                                                           a PAI that enhances human capabilities...

A few things worth noting. The level field tells you which tier matched: 0 is L0 abstract, 1 is L1 overview, 2 is L2 full content. The agent loads L0 first (cheap), decides if it is relevant, then drills to L2 only when needed. Scores in the 0.5-0.6 range are reasonable for a single-post corpus. As more posts are ingested and the corpus builds cross-document structure, scores for genuinely relevant results climb.
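The L0-first drill-down can be sketched as a simple filter: scan the cheap abstracts, keep those above a relevance threshold, and only fetch L2 content for those. The result shape and threshold here are illustrative, not OpenViking's actual API:

```python
def select_for_drilldown(results: list[dict], threshold: float = 0.55) -> list[str]:
    """From mixed-level search hits, return the URIs worth loading in full.

    Each hit is assumed to look like a row of the table above:
    {"uri": str, "level": 0 | 1 | 2, "score": float}.
    """
    # Cheap pass: consider only the L0 abstracts first.
    abstracts = [r for r in results if r["level"] == 0]
    # Only URIs that clear the threshold earn a full L2 fetch.
    return [r["uri"] for r in abstracts if r["score"] >= threshold]


hits = [
    {"uri": "viking://resources/Building_Your_PAI/.abstract", "level": 0, "score": 0.61},
    {"uri": "viking://resources/Building_Your_PAI/section.md", "level": 2, "score": 0.59},
]
print(select_for_drilldown(hits))  # only the L0 abstract clears the cheap pass
```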

OpenViking splits your markdown post into sections by heading, each becoming a separately addressable node. A query for “human centred approach” can surface just that section of a post rather than the whole document. That is the token efficiency story in practice.
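Heading-based splitting is easy to picture. A rough sketch of the idea, not OpenViking's actual splitter:

```python
import re


def split_by_heading(markdown: str) -> dict[str, str]:
    """Split a markdown document into sections keyed by heading text."""
    sections: dict[str, str] = {}
    current = "_preamble"  # content before the first heading
    buf: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"#{1,6}\s+(.*)", line)
        if m:
            if buf:  # flush the previous section
                sections[current] = "\n".join(buf).strip()
            current = m.group(1).strip()
            buf = []
        else:
            buf.append(line)
    if buf:  # flush the final section
        sections[current] = "\n".join(buf).strip()
    return sections
```

Each section then gets its own URI, embedding, and abstract, which is why a query can land on one section of a post instead of the whole document.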

What Comes Next

This was purely the ingestion and retrieval validation layer. The interesting part starts when you wire this into a Claude Code skill:

  • A sjramblings-blog skill that tells Claude how to query the corpus before drafting new posts, ensuring consistency with existing content and voice
  • Session commits that extract what was useful after each writing session, feeding back into viking://user/memories/
  • The same infrastructure backing a broader signal pipeline: structured ingestion of external sources feeding into the same retrieval layer

The POC question was simple: is the signal real? After one post ingested and queried, the answer is yes. The retrieval finds the right content, the summaries are accurate, and the hierarchical structure means you are not burning tokens loading full documents to answer simple questions.

The full blog pipeline (ingesting all posts, building the skill, connecting it to Claude Code) is the next build.


Stack: AWS Lightsail ($20/month), Docker Compose, OpenViking v0.2.x, OpenAI text-embedding-3-large + gpt-4o-mini, Python requests for batch ingestion
