Notes from migrating an aggregation- and search-heavy app's data layer from Firestore to Postgres (Neon)

This article was co-authored with a generative AI. Facts have been cross-checked against official documentation where possible, but errors may remain. Please verify against primary sources before relying on this for important decisions.

We are building an experimental archive app that places pins on a map and links literary works — waka poems, noh chants, tales — to locations. We initially used Firebase Firestore as the data store, but as we ran it, we kept hitting situations where it did not fit this app's workload well, so we migrated the data layer to Postgres (Neon) + Drizzle ORM (Object-Relational Mapping). This article goes through how that happened and the workload characteristics that informed the decision.

Rather than framing it as "Firestore versus Postgres", the conclusion is that fit seems to depend on whether the most common operation in your app is "look up by key" or "count and search across records".

What we are building

The data model is roughly as follows.

projects — projects
pins — pins on the map (latitude/longitude, tags, material references)
materials — source materials
yorei (usage examples) — a work's text (original, reading, translation, commentary) linked to a pin or material
activity — operation logs

Together with login (Google account), everything was handled within Firebase, and it ran serverless with little operational overhead, so it looked like a reasonable choice for a small experimental tool. The data itself was only on the order of a few thousand records, so volume was never the issue.

A list page stalled on read-based billing

While running it, the usage-example list page (/yorei) would sometimes stop rendering. As far as we could tell, the cause was reaching Firestore's free-tier daily read limit (50,000 reads, per what is published).

Why would reads run out with only a few thousand records? That was our initial question. Firestore bills, in the basic case, by the number of documents read (strictly speaking, a query that returns zero results still costs at least one read, and one read is added for each batch of up to 1,000 index entries scanned). The things this app wanted to do on its list pages, however, did not fit per-document billing.

Show the number of usage examples per material alongside the list
Show the number of usage examples per pin
Enumerate category candidates for filtering
Substring search over the body text

Firestore does offer count() / sum() / avg() aggregation queries (count() added in 2023, sum() / avg() in 2024), which compute server-side and return only the result, so they are cheaper than reading every document. But each returns a single value, so producing per-group facets (per material, per pin) means one query per group, and a GROUP BY equivalent was not part of the standard query API at the time (grouped aggregation later arrived via Enterprise-edition Pipelines). Aggregation queries also cannot be combined with real-time listeners. In the end this app read the target documents and counted them in the app, so every time a list opened it read several thousand documents, and that accumulated across users and operations until it reached the limit.

In other words, what we ran into was not data volume but the shape of the workload.

The operations we needed

To summarize, the high-frequency operations in this app were these three.

Aggregation and faceting — always showing "how many usage examples for this material" and "how many for this pin" beside the list
Full-text search — substring matching over the body (original, reading, translation, commentary)
Composite sort — splitting a string like "巻3-12" (volume 3, item 12) into two numeric axes and ordering by them

These are areas relational databases handle well, and areas where document databases require extra effort. Firestore is easy to work with for looking up by key, narrowing by equality, and subscribing to changes in real time, but counting, searching, and joining across records appear to fall outside its primary design focus.

On top of that, this app made little use of Firestore's real-time sync. Multiple people edit, but not as concurrent co-editing that contends over the same pin — more like dividing the work and touching it asynchronously — so there was limited benefit from the subscription mechanism.

What became straightforward to write in Postgres

For the destination we chose Neon (serverless Postgres). The reasons were that it has scale-to-zero, which suspends compute while idle and lets us run a small experiment within the free tier, and that it connects cleanly to our deploy target, Vercel. For the ORM we used Drizzle, which reads close to SQL.

Each of the three operations above collapsed into a single query.

Per-material usage-example counts are one GROUP BY, and the app receives only the counts.

select material_id, count(*)::int as n
from yorei
where project_id = $1
group by material_id;

Because a usage example links to multiple pins, it holds pin_ids as an array, but per-pin counts can be obtained by unnest-ing the array and counting.

select pid as pin_id, count(*)::int as n
from yorei, unnest(pin_ids) as pid
where project_id = $1
group by pid;

For substring matching over the body we used a GIN (Generalized Inverted Index) index from the pg_trgm extension. We index an expression that concatenates title, author, original, reading, translation, and commentary, so that ILIKE substring matching works.

create extension if not exists pg_trgm;

create index yorei_search_trgm_idx on yorei using gin (
  (
    coalesce(title, '') || ' ' ||
    coalesce(author, '') || ' ' ||
    coalesce(body_original, '') || ' ' ||
    coalesce(body_reading, '') || ' ' ||
    coalesce(body_translation, '') || ' ' ||
    coalesce(context, '')
  ) gin_trgm_ops
);

Ordering by strings like "巻3-12" could also be written declaratively by splitting them into two axes with regular expressions.

order by
  nullif(substring(location from '巻(\d+)-'), '')::int asc nulls last,
  nullif(substring(location from '巻\d+-(\d+)'), '')::int asc nulls last;

What matters here is that these are treated as a single "query execution" regardless of how many records they scan. Since billing is not proportional to the number of documents read, with appropriate indexes it becomes easy to offer aggregation and faceting as standard features of the list.

On the schema side, we can now declare constraints such as foreign keys with ON DELETE CASCADE, enum types, and NOT NULL. Firestore's flexible schemaless structure conversely meant integrity had to be maintained in the app, and that burden decreased.

Design changes that came with the migration

It is not all upside; some parts were rebuilt, and there are trade-offs.

Real-time sync is now something we handle ourselves. There is no equivalent of Firestore's subscriptions (onSnapshot), so we would implement it separately if needed. We could drop it here because it was not a requirement, but for a co-editing-centric use case the decision would be reversed.
There are cold starts. Neon's free tier suspends compute after a period of idleness, so the first access after a while incurs a startup wait. If constant responsiveness is required, the paid usage-based plan is an option.
Connection management became a new concern. Connecting to Postgres from serverless functions tends to exhaust connections, so we consolidated the place that opens a connection into a single file (src/lib/db/client.ts) with lazy initialization.
The browser can no longer touch the database directly. Firestore allowed direct access via the client SDK and security rules, but Postgres does not, so we moved everything through API routes and consolidated authorization logic (the union of owner and editor) into a single file (authz.ts). It is more work, but access control became easier to follow.
Login stays on Firebase Auth. There is no need to move the entire data layer; authentication is handled by Firebase, and we use the Admin SDK only to verify ID tokens. We also did not change the vector-search backend on the RAG (Retrieval-Augmented Generation) chat side. We kept the parts that were a good fit and replaced only the parts that were not.

Fit by workload

Generalizing from this experience, the picture seems to be as follows.

When the central operations are looking up by key and syncing in real time, Firestore is easy to work with; when they are counting, searching, and ordering across records, a relational database is easier.

The dividing line appears to be the shape of the workload, not the volume of data. This app had a relational-leaning workload even when it held only a few thousand records. We think that simply surfaced as the free-tier read limit.

Rather than Firestore having a problem, our reading is that there was a mismatch in applying a document database to an app centered on aggregation, faceting, and full-text search. For real-time co-editing, or a key-lookup-centric mobile app, continuing with Firestore would have been the natural choice. Looking back, if we had been able to state "what is the most common operation in this app" in a single sentence at the technology-selection stage, we might have chosen Postgres from the start.

🗺️Notes from migrating an aggregation- and search-heavy app's data layer from Firestore to Postgres (Neon)

What we are building

A list page stalled on read-based billing

The operations we needed

What became straightforward to write in Postgres

Design changes that came with the migration

Fit by workload

🙆Developed a Firestore Adapter for the Mirador 3 mirador-annotations Plugin

Comments

What we are building

A list page stalled on read-based billing

The operations we needed

What became straightforward to write in Postgres

Design changes that came with the migration

Fit by workload

Related Articles

🙆Developed a Firestore Adapter for the Mirador 3 mirador-annotations Plugin

Comments