Vector search & local RAG guide

This guide covers the "vector-native edge" features introduced in 0.3.0: inline embeddings, quantized DiskANN indexes, and hybrid (vector + keyword) retrieval. It explains why each exists, how to use it, and gives runnable examples.

All of these are ordinary SQL over a normal SQLite-compatible database. There is no separate vector service to run and no special file format — a database that doesn't use these features stays byte-compatible with stock SQLite.

Why
1. Storing and searching vectors
2. Embedding text inline with embed()
3. Quantization for the edge
4. Hybrid search (vector + keyword)
5. Putting it together: local RAG
Reference

Why

Retrieval-augmented generation (RAG) and semantic features usually mean running a separate vector database next to your primary store, keeping the two in sync, and paying a network hop on every query. On the edge — mobile apps, CLIs, desktop apps, embedded devices — that is often impractical.

SQL Anywhere already ships a DiskANN vector index and edge replication inside a single embedded engine. 0.3.0 makes that combination practical for real workloads:

embed() removes the "compute embeddings elsewhere first" step for prototyping and lexical use cases.
Quantization makes the index small enough to ship on constrained devices.
Hybrid search gives production-grade retrieval quality by fusing semantic and keyword signals — in one SQL query, against local data.

The result: retrieval with no extra infrastructure and no network round-trip.

1. Storing and searching vectors

Why. Similarity search ("find the rows most like this one") powers recommendations, semantic search and RAG. SQL Anywhere stores embeddings as a native column type and indexes them with DiskANN for approximate nearest-neighbour (ANN) search.

How. Declare a FLOATn(dim) column, build an index with sqlanywhere_vector_idx(...), and query with vector_top_k(...).

CREATE TABLE movies (
  id    INTEGER PRIMARY KEY,
  title TEXT,
  emb   FLOAT32(4)          -- 4-dimensional f32 embedding
);

CREATE INDEX movies_idx ON movies(sqlanywhere_vector_idx(emb));

INSERT INTO movies VALUES
  (1, 'Action',   vector32('[1, 0, 0, 0]')),
  (2, 'Romance',  vector32('[0, 1, 0, 0]')),
  (3, 'Thriller', vector32('[0.8, 0.2, 0, 0]'));

-- 2 nearest neighbours of a query vector
SELECT movies.title
FROM   vector_top_k('movies_idx', vector32('[1, 0, 0, 0]'), 2) AS k
JOIN   movies ON movies.id = k.id;
-- => 'Action', 'Thriller'

Distances are available directly too: vector_distance_l2(a, b) and vector_distance_cos(a, b). Vectors come in several precisions — vector32, vector64, vector16, vector8, vector1bit, vectorb16 — and vector_extract(v) reads a vector back as text.

Runnable, tested examples: sqlanywhere/tests/vector.rs.

2. Embedding text inline with `embed()`

Why. Building a vector column normally means running an embedding model before you touch the database. For prototyping, lexical search, and zero-dependency setups that friction is unnecessary.

How. sqlanywhere::embed(text, dims) returns a normalized vector literal you pass straight to vector32:

use sqlanywhere::{embed, params};

conn.execute(
    "INSERT INTO docs (body, emb) VALUES (?, vector32(?))",
    params![text, embed(text, 128)],
).await?;

What it is (and isn't). embed() uses the hashing trick: text is tokenized into words, each word is hashed (stable FNV-1a) into one of dims buckets with a signed contribution, and the result is L2-normalized. Documents that share vocabulary get similar vectors, so cosine similarity works as a lexical signal. It is deterministic and dependency-free — perfect as a default and for tests.

It is not a neural/semantic embedding: no synonyms, no context. For production semantic search, compute embeddings with a real model (local ONNX or a hosted API) and store the resulting numbers the same way — the index and vector_top_k behave identically no matter how the vectors were produced.

Tests: sqlanywhere/tests/embed.rs.

3. Quantization for the edge

Why. A DiskANN graph stores neighbour vectors at every node. At full f32 precision that dominates index size — a problem when shipping to phones or embedded devices.

How. Pass compress_neighbors= when creating the index:

CREATE INDEX movies_idx ON movies(
  sqlanywhere_vector_idx(emb, 'metric=cosine', 'compress_neighbors=float1bit')
);

Measured impact (800 × 32-dim vectors, cosine; search still returns results):

`compress_neighbors`	Index size	vs `float32`
`float32` (default)	3352 KB	1×
`float16`	1744 KB	1.9× smaller
`float8`	1212 KB	2.8× smaller
`float1bit`	604 KB	5.5× smaller

Lower precision trades a little recall for a lot of space. float8 is a good default for edge; float1bit is extreme compression for very large corpora. Other index knobs: metric=cosine|l2, max_neighbors, alpha, search_l, insert_l.

4. Hybrid search (vector + keyword)

Why. Pure vector search misses exact terms (names, codes, rare words); pure keyword search misses paraphrases. Production retrieval systems combine both.

How. SQLite's FTS5 full-text index and the vector index live in the same database, so you can fuse their rankings with Reciprocal Rank Fusion (RRF) — score = Σ 1/(k + rank) across rankers — in one query:

WITH v AS (                              -- semantic (vector) ranking
  SELECT k.id, ROW_NUMBER() OVER () AS vrank
  FROM vector_top_k('docs_vec', vector32('[1,0,0,0]'), 20) k
),
f AS (                                   -- keyword (FTS5) ranking
  SELECT docs_fts.rowid AS id, ROW_NUMBER() OVER (ORDER BY rank) AS frank
  FROM docs_fts WHERE docs_fts MATCH 'ownership'
)
SELECT d.id, d.title
FROM docs d
LEFT JOIN v ON v.id = d.id
LEFT JOIN f ON f.id = d.id
WHERE v.id IS NOT NULL OR f.id IS NOT NULL
ORDER BY COALESCE(1.0/(60+v.vrank),0) + COALESCE(1.0/(60+f.frank),0) DESC;

Documents strong in both rankers surface first. k = 60 is the usual RRF constant; raise it to flatten the contribution of top ranks.

Tests: sqlanywhere/tests/hybrid_search.rs.

5. Putting it together: local RAG

The capstone example wires all of the above into one retrieval pipeline — inline embedding, a quantized index, an FTS5 index, and hybrid RRF — in a single embedded database:

cargo run -p sqlanywhere --example local_rag

Q: how does rust handle memory and ownership
  1. [Ownership] Rust enforces memory safety through ownership and borrowing.
Q: similarity search over embeddings
  1. [Vectors] A vector database indexes embeddings for similarity search.

Source: sqlanywhere/examples/local_rag.rs.

Because it is just an embedded database, the same pipeline runs inside an embedded replica (see the Replication section of the README): sync documents from a primary, then embed, index and retrieve entirely on-device.

Reference

Function	Purpose
`vector32(t)` / `vector64(t)` / `vector16(t)` / `vector8(t)` / `vector1bit(t)` / `vectorb16(t)`	Build a typed vector from a `'[...]'` literal
`vector_extract(v)`	Read a vector back as text
`vector_distance_l2(a, b)`	Euclidean (L2) distance
`vector_distance_cos(a, b)`	Cosine distance (0 = identical, 2 = opposite)
`sqlanywhere_vector_idx(col, 'metric=...', 'compress_neighbors=...', ...)`	Declare a DiskANN index in `CREATE INDEX`
`vector_top_k('index_name', query_vec, k)`	ANN search returning the `k` nearest row ids
`sqlanywhere::embed(text, dims)` (Rust)	Reference lexical embedder → vector literal

¶Vector search & local RAG guide

¶Contents

¶Why

¶1. Storing and searching vectors

¶2. Embedding text inline with embed()

¶3. Quantization for the edge

¶4. Hybrid search (vector + keyword)

¶5. Putting it together: local RAG

¶Reference