From Zero to Knowledge Pipeline: OpenViking on AWS Lightsail


Most people building with AI agents hit the same wall eventually. Your agent forgets things between sessions. RAG retrieval surfaces the wrong chunks. You cannot tell why the agent picked what it picked. And every time you start a new session, you are re-explaining context that should already exist.

OpenViking is ByteDance’s open-source answer to this problem. It is not another vector database. It is a context database that treats everything (your documents, memories, agent skills) as a hierarchical filesystem under a viking:// protocol. Think of it as giving your agent a persistent, structured brain rather than a flat pile of text chunks.

The key ideas:

  • Everything gets a URI: viking://resources/, viking://user/memories/, viking://agent/skills/
  • Every resource is automatically summarised into three tiers: L0 (~100 token abstract), L1 (~2k token overview), L2 (full content)
  • Agents navigate by structure first, then search semantically within that structure
  • Sessions commit memories back, so the corpus gets smarter over time

The comparison that clicked for me: traditional RAG gives you a snapshot. OpenViking gives you a film.

Why Lightsail and Not the Full AWS Stack

For a POC, I did not want the complexity of ECS Fargate or the overhead of a full EC2 setup with IAM instance profiles and CodePipeline. Lightsail gives you a flat-rate instance ($20/month for 4GB RAM), a static IP, and enough headroom for OpenViking’s in-memory vector index.

The 4GB RAM requirement is not negotiable. OpenViking holds its vector index in memory, and a 2GB instance gets tight fast once the corpus grows.

Docker Compose makes the setup reproducible. The official compose file mounts two things into the container:

volumes:
  - /var/lib/openviking/ov.conf:/app/ov.conf
  - /var/lib/openviking/data:/app/data

Config goes to /app/ov.conf. Corpus persists at /app/data. Everything else is ephemeral.
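Pulling that together, a minimal docker-compose.yml sketch looks like the following. The service and image names here are placeholders — take them from the official compose file; only the volume mappings and the port (1933, from ov.conf) come from this post.

```yaml
# Sketch only: service/image names are assumptions; the volume
# mappings and port match the config described in this post.
services:
  openviking:
    image: openviking/openviking:latest   # assumption: check the official compose file
    restart: unless-stopped
    ports:
      - "1933:1933"
    volumes:
      - /var/lib/openviking/ov.conf:/app/ov.conf
      - /var/lib/openviking/data:/app/data
```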

What Actually Goes in ov.conf

OpenViking needs two model providers:

Embedding model: converts text to vectors for semantic search. text-embedding-3-large from OpenAI works well (3072 dimensions). This runs on every document you ingest.

VLM: generates the L0/L1 summaries. Use gpt-4o-mini here, not GPT-4. The VLM runs on ingestion, not retrieval. You want it cheap and fast, not premium. It is summarising your content, not reasoning about it.

{
  "server": {
    "host": "0.0.0.0",
    "port": 1933,
    "root_api_key": "your-secret-key"
  },
  "storage": {
    "workspace": "/app/data"
  },
  "embedding": {
    "dense": {
      "provider": "openai",
      "api_key": "sk-...",
      "model": "text-embedding-3-large",
      "dimension": 3072
    }
  },
  "vlm": {
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4o-mini",
    "temperature": 0.1
  }
}

The storage.workspace must be /app/data (the container-side path, not the host path). Docker handles the mapping.

The Two Config Files That Will Trip You Up

There are two separate config files, and this is where I wasted an hour. ov.conf is for the server; ovcli.conf is for the CLI client on your local machine. They live in different places and serve different purposes.

The CLI looks for ~/.openviking/ovcli.conf by default:

{
  "url": "http://your-server-ip:1933",
  "api_key": "your-secret-key"
}

The api_key here must match root_api_key in ov.conf exactly. If your server has no root_api_key set, do not include api_key in ovcli.conf at all. Having one when the server expects none causes a Missing API Key error even though you supplied one. Yes, that error message is backwards. Yes, I stared at it for twenty minutes.

Ingestion: The Path Problem Nobody Warns You About

This is where the documentation falls short. When you run ov add-resource /path/to/file.md, the CLI sends that path string to the server. The server tries to open it on its filesystem, inside the container. Your local path does not exist there.

The fix is a two-step upload flow that the API supports but the CLI has not fully exposed yet:

Step 1: Upload the file bytes to server temp storage

curl -X POST http://your-server:1933/api/v1/resources/temp_upload \
  -H "x-api-key: your-key" \
  -F "file=@/local/path/to/post.md"

Returns: {"temp_path": "/app/data/temp/upload/upload_abc123.md"}

Step 2: Ingest from the temp path

curl -X POST http://your-server:1933/api/v1/resources \
  -H "x-api-key: your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "temp_path": "/app/data/temp/upload/upload_abc123.md",
    "reason": "sjramblings.io blog post",
    "instruction": "Focus on technical opinions, AWS services mentioned, and key conclusions",
    "wait": true
  }'

The instruction field is genuinely useful. It tells the VLM how to frame the L0/L1 summaries, so your abstracts surface what you actually want to query for. For batch ingestion of all blog posts, a small Python script handles the two-step flow cleanly: upload each file, capture the temp_path, pass it to the ingest endpoint.
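That batch script can be sketched with nothing but the standard library (the requests version mentioned later is terser, but this avoids a dependency). The endpoints, headers, and field names are the ones from the curl calls above; SERVER, API_KEY, and the posts/ directory are placeholders you would replace.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

SERVER = "http://your-server:1933"   # placeholder: your Lightsail static IP
API_KEY = "your-secret-key"          # placeholder: must match root_api_key

def multipart_body(path: Path):
    """Build a multipart/form-data body with a single field named 'file'."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + path.read_bytes() + tail, f"multipart/form-data; boundary={boundary}"

def post(url, data, content_type):
    """POST raw bytes with the x-api-key header and decode the JSON reply."""
    req = urllib.request.Request(
        url, data=data, method="POST",
        headers={"x-api-key": API_KEY, "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def ingest_payload(temp_path, reason, instruction):
    """Second-step JSON body; field names from the curl example above."""
    return {"temp_path": temp_path, "reason": reason,
            "instruction": instruction, "wait": True}

def ingest_post(path: Path):
    """Step 1: upload bytes to temp storage. Step 2: ingest from temp_path."""
    body, ctype = multipart_body(path)
    temp = post(f"{SERVER}/api/v1/resources/temp_upload", body, ctype)
    payload = ingest_payload(temp["temp_path"],
                             f"blog post: {path.name}",
                             "Focus on technical opinions and key conclusions")
    return post(f"{SERVER}/api/v1/resources",
                json.dumps(payload).encode(), "application/json")

def ingest_all(directory="posts"):
    for md in sorted(Path(directory).glob("*.md")):
        print(md.name, "->", ingest_post(md))

# ingest_all()  # uncomment once SERVER and API_KEY are set
```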

What Good Retrieval Actually Looks Like

After ingesting one post, the query results look like this:

ov find "personal AI infrastructure" --uri viking://resources/

context_type  uri                                          level  score   abstract
resource      viking://resources/Building_Your_PAI/.abstract  0   0.61    This directory contains resources focused on
                                                                           developing a Personal AI Infrastructure...
resource      viking://resources/Building_Your_PAI/section.md 2   0.59    The document serves as a guide to building
                                                                           a PAI that enhances human capabilities...

A few things worth noting. The level field tells you which tier matched: 0 is L0 abstract, 1 is L1 overview, 2 is L2 full content. The agent loads L0 first (cheap), decides if it is relevant, then drills to L2 only when needed. Scores in the 0.5-0.6 range are reasonable for a single-post corpus. As more posts are ingested and the corpus builds cross-document structure, scores for genuinely relevant results climb.
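To make the token-efficiency claim concrete, here is a back-of-envelope sketch. Only the ~100-token L0 size comes from the tier description earlier; the per-post token counts are made up for illustration.

```python
# Illustrative arithmetic only. L0_TOKENS comes from the "~100 token
# abstract" tier description; the corpus sizes below are hypothetical.
L0_TOKENS = 100

def flat_rag_cost(full_doc_tokens):
    """Naive approach: load every candidate document in full."""
    return sum(full_doc_tokens)

def hierarchical_cost(full_doc_tokens, relevant):
    """L0-first: read every abstract, drill to L2 only for relevant docs."""
    return L0_TOKENS * len(full_doc_tokens) + sum(full_doc_tokens[i] for i in relevant)

corpus = [4000, 3500, 5200, 2800]     # hypothetical per-post token counts
print(flat_rag_cost(corpus))          # 15500
print(hierarchical_cost(corpus, [2])) # 5600: four abstracts plus one full doc
```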

OpenViking splits your markdown post into sections by heading, each becoming a separately addressable node. A query for “human centred approach” can surface just that section of a post rather than the whole document. That is the token efficiency story in practice.

What Comes Next

This was purely the ingestion and retrieval validation layer. The interesting part starts when you wire this into a Claude Code skill:

  • A sjramblings-blog skill that tells Claude how to query the corpus before drafting new posts, ensuring consistency with existing content and voice
  • Session commits that extract what was useful after each writing session, feeding back into viking://user/memories/
  • The same infrastructure backing a broader signal pipeline: structured ingestion of external sources feeding into the same retrieval layer

The POC question was simple: is the signal real? After one post ingested and queried, the answer is yes. The retrieval finds the right content, the summaries are accurate, and the hierarchical structure means you are not burning tokens loading full documents to answer simple questions.

The full blog pipeline (ingesting all posts, building the skill, connecting it to Claude Code) is the next build.


Stack: AWS Lightsail $20/month, Docker Compose, OpenViking v0.2.x, OpenAI text-embedding-3-large + gpt-4o-mini, Python requests for batch ingestion
