AI Prediction in the Browser using Vector Embeddings

14th Jun 2025

The Movement

There is a big movement going on in Local-First circles to give users greater control over their data, along with greater performance and resilience. The aim is to build applications that need less access to external servers, and at times no access to the network at all.

One of the major breakthroughs of recent years, which is making this goal accessible to many more applications, is the ability to use the power of well-established databases within the browser. Way back, we saw WebSQL appear in browsers, but it never gained traction because of licensing issues and a lack of compatibility across browsers. This was a real shame.

The appearance of IndexedDB was a welcome addition to the space and has served us well. It was designed as a low-level implementation but, again, I don't think it has given us the boost we expected, due to the ergonomics of using it. It also seems to lack some of the power many expect from working with longstanding relational databases. Having said that, there has been a significant rise in its use over the last few years as storage for in-browser data managers and in-memory query engines.

All this background brings us to a major change happening in the browser data-storage landscape. With the introduction of WebAssembly we are seeing a number of new options to consider. These include well-loved relational databases like SQLite, and now Postgres in the form of PGLite.

Vector Databases

What is a vector database?

A vector database is a specialised database, or database feature, which allows efficient storage and indexing of long series of numbers, called vectors. A vector in this context refers to a point in a vector space: a multi-dimensional space in which numbers represent data. The position of each number within the series represents a property, or feature, of the object the vector has been derived from. The database then allows us to run exact and approximate nearest-neighbour searches to find records similar to a search term.

This is a lot to wrap your head around at first. It often helps to consider a simplified vector representing a set of images. Let's take the following few images and, next to each, store a 4-dimensional vector, each dimension representing a potential property of the image. The higher the number stored in a dimension, the more relevant that property is to the contents of the image.

| Image | Mountains | Sky | Sea | Sand |
| --- | --- | --- | --- | --- |
| Beach near Inverness | 0.8 | 0.4 | 0.9 | 0.8 |
| Looking up at Mt Fuji | 1 | 0.7 | 0 | 0 |
| Croyde Beach Devon | 0.1 | 0.9 | 1 | 1 |

It's interesting that by considering these numbers, you can probably build up a picture of the original image in your mind.

Now consider that these vectors might have thousands or even millions of dimensions. This allows us to use a vector database to store the semantic meaning of an object, as well as compare it against others, in an efficient manner.
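To make the idea of nearest-neighbour search concrete, here is a small Python sketch using the image vectors from the table above. The query vector is an invented example describing a sandy seaside scene; the ranking shows the two beach images landing closer to it than the mountain shot.

```python
import math

# 4-dimensional vectors from the table above:
# [Mountains, Sky, Sea, Sand]
images = {
    "Beach near Inverness": [0.8, 0.4, 0.9, 0.8],
    "Looking up at Mt Fuji": [1, 0.7, 0, 0],
    "Croyde Beach Devon": [0.1, 0.9, 1, 1],
}

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# An invented query vector describing a sandy seaside scene
query = [0, 0.5, 1, 1]

# Rank the images by distance to the query, closest first
for name in sorted(images, key=lambda n: l2_distance(query, images[n])):
    print(f"{name}: {l2_distance(query, images[name]):.3f}")
```

Running this ranks Croyde Beach closest and Mt Fuji furthest, which is exactly the "find records similar to the search term" behaviour a vector database gives us at scale.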

What options are available with SQLite and PGLite?

Let's consider two common options for using a relational database with nearest neighbour search on vector embeddings.

PGLite with pgvector

The pgvector project has been going since 2021; it is well supported and works on the server as well as in the browser via WebAssembly.

The following shows a simple example of how to query the embeddings in the database.

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
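To run the same queries in the browser, the sketch below uses PGlite's JavaScript API. It assumes the @electric-sql/pglite npm package and its bundled vector extension, so treat it as an illustration of the shape of the code rather than a drop-in snippet.

```javascript
// Sketch: pgvector queries in the browser via PGlite (WebAssembly Postgres).
// Assumes the @electric-sql/pglite package and its bundled vector extension.
import { PGlite } from '@electric-sql/pglite'
import { vector } from '@electric-sql/pglite/vector'

// An in-memory Postgres instance running in the browser
const pg = new PGlite({ extensions: { vector } })

await pg.exec(`
  CREATE EXTENSION IF NOT EXISTS vector;
  CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
  INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
`)

// Nearest-neighbour search, ordered by L2 distance to the query vector
const { rows } = await pg.query(
  `SELECT id, embedding FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;`
)
console.log(rows)
```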

SQLite with sqlite-vec

The sqlite-vec project has been going since 2024; it is relatively new but has a good set of companies backing it, including Mozilla. It is pre-v1 at the time of writing, so one to play with and watch.

Here is a query similar to the above, based on the example in the GitHub repo.

create virtual table items using vec0(
  embedding float[3]
);

-- KNN style query
select
  rowid,
  distance
from items
where
  embedding match '[0.890, 0.544, 0.825]'
order by distance
limit 2;
/*
┌───────┬──────────────────┐
│ rowid │ distance │
├───────┼──────────────────┤
│ 2 │ 2.38687372207642 │
│ 1 │ 2.38978505134583 │
└───────┴──────────────────┘
*/

Opportunity

Take a simple example: an embeddings model which generates 7-dimensional vectors, one dimension for each day of the week. The model calculates these vectors from past data, resulting in each dimension being a number between 0 and 1, where the closer a value is to 1, the more relevant that day is for a user on the system.

select * from lessons

| user | embedding |
| --- | --- |
| 27021dfb | [0,0,0,0,0,0,0] |
| 9a944aeb | [0,0,0,0.33333334,0,0,0.33333334] |
| c267e4b7 | [0,0,0,0,0,1,0] |
| bfa22fd5 | [0,0,1,0,0,0,0] |
| 230f118c | [0,0.6666667,0,0,0,0,0] |
| 043c0b43 | [0,0,0,1,0,0,0] |
| 64a9508a | [0,0,0,0,0,0,0] |
| d8aa039c | [0,0,0.5,0.5,0,0,0] |
| e479c275 | [0,1,0,0,0,0,0] |
| dea159d5 | [0,0.5,0,0,0.5,0,0] |
| 8acb2fe9 | [0,0,0,0.4,0,0,0] |
| 4dbaa764 | [0,0,0,0.4,0,0.2,0.2] |
| b17c4b6a | [0,0,0,0.5,0,0,0] |
| 14eb0eeb | [0,0,0,0,1,0,0] |
| 7e4445e3 | [0,0,0.2857143,0,0.2857143,0.14285715,0] |
| ... | |
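The post doesn't specify how embeddings like those above are produced, but one plausible scheme (an assumption for illustration, not the actual model) is to store, for each weekday, the share of a user's past events that fell on that day:

```python
from collections import Counter

def weekday_embedding(weekdays):
    """Build a 7-dimensional embedding from a list of attended weekdays
    (0 = Monday ... 6 = Sunday). Each dimension holds the share of events
    that fell on that day. Illustrative assumption only -- not the exact
    model behind the table above."""
    counts = Counter(weekdays)
    total = len(weekdays)
    return [counts.get(day, 0) / total for day in range(7)]

# A user whose three recorded events were Wed, Wed and Thu
print(weekday_embedding([2, 2, 3]))  # [0.0, 0.0, 0.666..., 0.333..., 0.0, 0.0, 0.0]
```

Each dimension stays between 0 and 1, matching the shape of the rows in the table: a user who only ever attends on Saturdays ends up with [0,0,0,0,0,1,0].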

Searching for the user best suited for an event on Wednesday might be done in one of two ways.

  1. Selecting the rows ordered by the distance
  2. Selecting the distance itself

Of course we could mix these two together, but let's consider them separately for the moment. To get the five closest results for a vector which matches Wednesday the most, we can use [0,0,0,1,0,0,0]. pgvector adds a few distance operators; the <-> operator gives us the L2 (Euclidean) distance on this multi-dimensional array of numbers.

SELECT * FROM lessons ORDER BY embedding <-> '[0,0,0,1,0,0,0]' LIMIT 5;

| user | embedding |
| --- | --- |
| 1a4a943f | [0,0,0,1,0,0,0] |
| 043c0b43 | [0,0,0,1,0,0,0] |
| b17c4b6a | [0,0,0,0.5,0,0,0] |
| 8acb2fe9 | [0,0,0,0.4,0,0,0] |
| 4dbaa764 | [0,0,0,0.4,0,0.2,0.2] |

Selecting the distance itself allows us to see the result of the calculation. This is useful if we want to filter out results that are not close enough.

SELECT l.user, l.embedding <-> '[0,0,0,1,0,0,0]' AS distance FROM lessons l ORDER BY distance LIMIT 5;

| user | distance |
| --- | --- |
| 1a4a943f | 0 |
| 043c0b43 | 0 |
| b17c4b6a | 0.5 |
| 8acb2fe9 | 0.6000000119209289 |
| 4dbaa764 | 0.6633249562739323 |
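These distances are plain Euclidean (L2) distances, so they can be sanity-checked outside the database. The Python sketch below reproduces them from the embeddings above (the tiny discrepancies in the table come from pgvector storing single-precision floats) and shows how an arbitrary cut-off could filter out weak matches:

```python
import math

query = [0, 0, 0, 1, 0, 0, 0]  # the "Wednesday" query vector

lessons = {
    "b17c4b6a": [0, 0, 0, 0.5, 0, 0, 0],
    "8acb2fe9": [0, 0, 0, 0.4, 0, 0, 0],
    "4dbaa764": [0, 0, 0, 0.4, 0, 0.2, 0.2],
}

def l2(a, b):
    """Euclidean distance, the same calculation as pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for user, emb in lessons.items():
    print(user, round(l2(query, emb), 4))

# Keep only users within a cut-off (0.6 is an arbitrary example threshold)
close_enough = [u for u, e in lessons.items() if l2(query, e) <= 0.6]
print(close_enough)
```

In SQL the same cut-off would simply become a condition on the computed distance column.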

References