Vespa cluster - Primary components

 In a Vespa cluster, the primary components you define in services.xml are:

  1. <admin> – for management and control.

  2. <container> – for serving HTTP (queries/feeds/APIs) and stateless services.

  3. <content> – for stateful services like document storage, indexing, and search.

Let’s take a deep dive into each one: its purpose, internal structure, and advanced configuration.


✅ 1. <admin> Component – Management and Monitoring

🔷 Purpose:

The admin component configures the admin server, metrics, and the cluster management services (e.g., config servers and cluster controllers).

🔷 Basic Example:

<admin version="2.0">
  <adminserver hostalias="node1"/>
</admin>
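
The hostalias values used in these examples are not hostnames. In a self-hosted deployment they are usually resolved through a hosts.xml file that sits next to services.xml. A minimal sketch, assuming placeholder hostnames under example.com (they are illustrative, not taken from any real setup):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- hosts.xml: maps each hostalias referenced in services.xml to a hostname.
     The example.com names are placeholders (an assumption for illustration). -->
<hosts>
  <host name="node1.example.com">
    <alias>node1</alias>
  </host>
  <host name="node2.example.com">
    <alias>node2</alias>
  </host>
  <host name="node3.example.com">
    <alias>node3</alias>
  </host>
</hosts>
```

With this mapping in place, hostalias="node1" in any component refers to node1.example.com.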


🔷 Advanced Configuration:

<admin version="2.0">
  <adminserver hostalias="node1"/>
 
  <configservers>
    <configserver hostalias="node1"/>
    <configserver hostalias="node2"/>
    <configserver hostalias="node3"/>
  </configservers>

  <cluster-controllers>
    <cluster-controller hostalias="node1"/>
    <cluster-controller hostalias="node2"/>
  </cluster-controllers>

  <metrics>
    <consumer id="my-prometheus">
      <!-- Metrics are selected by id; the two ids below are examples -->
      <metric id="memory.usage"/>
      <metric id="cpu.util"/>
    </consumer>
  </metrics>
</admin>


🔷 Internal Roles:

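Sub-component       | Purpose
--------------------|--------
adminserver         | Node that hosts the administrative services (such as the log server)
configservers       | Config servers that distribute configuration to all nodes (required in multi-node setups)
cluster-controllers | Supervise content nodes (health and cluster state management)
metrics             | Defines consumers that select which metrics are exposed to systems such as Prometheus or Telegraf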


✅ 2. <container> Component – Stateless HTTP/Query/Feed/Processing Engine

🔷 Purpose:

Runs:

  • Query services (REST/JSON)

  • Feed ingestion

  • Document processing

  • Custom HTTP applications

  • Search and ranking logic

🔷 Basic Example:

<container id="default" version="1.0">
  <search/>
  <document-api/>
  <nodes>
    <node hostalias="node1"/>
  </nodes>
</container>


🔷 Advanced Configuration:

<container id="query-container" version="1.0">
  <search/>
  <document-api/>
  <processing/>
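  <!-- Enables document processing chains in this container. A content
       cluster selects this container by id from its own <documents> element. -->
  <document-processing/>
 
  <!-- Custom endpoints are bound through a handler, which replaces the older
       <rest-api> element. com.example.CustomHandler and my-bundle are
       hypothetical names for your own handler class and bundle. -->
  <handler id="com.example.CustomHandler" bundle="my-bundle">
    <binding>http://*/custom-endpoint</binding>
  </handler>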
 
  <nodes>
    <node hostalias="query-node-1"/>
    <node hostalias="query-node-2"/>
  </nodes>
</container>


🔷 Key Elements and Their Roles:

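Element                          | Purpose
---------------------------------|--------
<search/>                        | Enables query handling (the Search API and searcher chains)
<document-api/>                  | Enables the document API for feed operations (PUT, UPDATE, REMOVE, GET)
<processing/>                    | Enables request/response processing chains
<document-processing/>           | Enables document processing chains in this container; a content cluster selects it with <document-processing cluster="..."/> under <documents>
<rest-api> (legacy) or <handler> | Adds custom REST endpoints; in current Vespa releases this is done with a <handler> and a <binding>
<nodes>                          | Lists the container nodes (can be scaled independently of content nodes)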

You can define multiple container clusters, each with its own id: one for feeding, one for querying, one for admin APIs, etc.
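
As a rough sketch of that split, the fragment below declares one container cluster for feeding and one for queries. The cluster ids and hostaliases are illustrative assumptions, and the fragment leaves out the admin and content components that a full services.xml would also contain:

```xml
<?xml version="1.0" encoding="utf-8"?>
<services version="1.0">
  <!-- Feed cluster: accepts document operations and runs document processing.
       The id "feed" and the hostalias are illustrative. -->
  <container id="feed" version="1.0">
    <document-api/>
    <document-processing/>
    <nodes>
      <node hostalias="feed-node-1"/>
    </nodes>
  </container>

  <!-- Query cluster: serves searches only, so it can be scaled with query load -->
  <container id="query" version="1.0">
    <search/>
    <nodes>
      <node hostalias="query-node-1"/>
      <node hostalias="query-node-2"/>
    </nodes>
  </container>
</services>
```

Keeping the two clusters separate means a burst of feed traffic does not compete for threads and memory with query serving.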


✅ 3. <content> Component – Stateful Document Storage, Indexing, Search

🔷 Purpose:

  • Stores and indexes Vespa documents.

  • Performs vector search, full-text search, filtering, etc.

  • Manages replication, distribution, and persistence.

🔷 Basic Example:

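<content id="my-content" version="1.0">
  <!-- Number of copies kept of each document -->
  <redundancy>2</redundancy>
  <engine>
    <proton/>
  </engine>
  <documents>
    <document type="mydoc" mode="index"/>
  </documents>
  <nodes>
    <!-- Each content node needs a unique distribution-key -->
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
  </nodes>
</content>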


🔷 Advanced Configuration:

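<content id="my-content" version="1.0">
  <!-- Number of copies kept of each document -->
  <redundancy>2</redundancy>

  <engine>
    <proton/>
  </engine>

  <documents>
    <document type="mydoc" mode="index"/>
    <document type="myvecdoc" mode="index"/>
  </documents>

  <search>
    <!-- Seconds before a write becomes visible to queries; check the
         allowed range for your Vespa version -->
    <visibility-delay>2</visibility-delay>
  </search>

  <!-- Documents are distributed across the nodes automatically;
       no separate sharding element is needed -->

  <tuning>
    <!-- Fractions of disk and memory above which feeding is blocked -->
    <resource-limits>
      <disk>0.7</disk>
      <memory>0.8</memory>
    </resource-limits>
  </tuning>

  <nodes>
    <!-- Each content node needs a unique distribution-key -->
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
    <node hostalias="node3" distribution-key="2"/>
  </nodes>
</content>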


🔷 Key Elements and Roles:

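Element                     | Purpose
----------------------------|--------
<engine><proton/></engine>  | Core search engine (always proton)
<documents>                 | Declares the document types stored in the cluster and their mode
(document distribution)     | Automatic: documents are hashed into buckets that are spread across the nodes; no element is required
<redundancy>                | Replication factor (how many copies of each document are kept)
<search><visibility-delay>  | Delay, in seconds, before a written document becomes visible to queries
<tuning>                    | Fine-grained performance and resource tuning (for example <resource-limits>)
<nodes>                     | Storage and search nodes, each with a unique distribution-key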


🔄 Communication Flow Summary:

[User HTTP Request]
        ↓
  [Container Node]
    - REST API
    - Feed / Search
        ↓
  [Content Node(s)]
    - Stores/indexes docs
    - Runs queries/searches
        ↑
  [Admin/Config Server]
    - Cluster state
    - Orchestration
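
To make the flow above concrete, here is a minimal single-node services.xml sketch that wires the three components together. It assumes one host aliased as node1 and a document type named mydoc, as in the earlier examples; it is a starting point under those assumptions, not a production layout:

```xml
<?xml version="1.0" encoding="utf-8"?>
<services version="1.0">
  <!-- Admin: management services for the whole application -->
  <admin version="2.0">
    <adminserver hostalias="node1"/>
  </admin>

  <!-- Container: receives the user's HTTP requests (queries and feed) -->
  <container id="default" version="1.0">
    <search/>
    <document-api/>
    <nodes>
      <node hostalias="node1"/>
    </nodes>
  </container>

  <!-- Content: stores and indexes documents, executes the searches -->
  <content id="my-content" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="mydoc" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0"/>
    </nodes>
  </content>
</services>
```

A request enters through the container cluster, which forwards queries and document operations to the content cluster, while the admin services track cluster state in the background.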



🔍 Use Case-Based Deployment Design:

Use Case               | Component Setup
-----------------------|----------------
High QPS Read          | Scale <container> (with <search/>) horizontally
High Feed Ingestion    | Separate feed <container> with <document-api/> and <document-processing>
Document Vector Search | Use <content> with mode="index" and tensor fields that have an HNSW index for approximate nearest neighbor (ANN) search
Monitoring             | Add a <metrics> consumer inside <admin> for scraping by Prometheus
Blue/Green Deployment  | Run parallel <content> clusters and switch container endpoints
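
For the high feed ingestion case, the link between a dedicated feed container and the content cluster is declared on the content side. A hedged sketch, assuming a container cluster with the illustrative id "feed" and placeholder hostaliases:

```xml
<?xml version="1.0" encoding="utf-8"?>
<services version="1.0">
  <!-- Dedicated feed cluster; the id "feed" is an illustrative choice -->
  <container id="feed" version="1.0">
    <document-api/>
    <document-processing/>
    <nodes>
      <node hostalias="feed-node-1"/>
    </nodes>
  </container>

  <content id="my-content" version="1.0">
    <redundancy>2</redundancy>
    <documents>
      <document type="mydoc" mode="index"/>
      <!-- Run indexing document processing in the "feed" container cluster -->
      <document-processing cluster="feed"/>
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0"/>
      <node hostalias="node2" distribution-key="1"/>
    </nodes>
  </content>
</services>
```

The cluster attribute on the content side names the container cluster by its id, so the feed cluster can be resized without touching the content nodes.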


