Grapes don't have a table

If you visit /grapes you see a list of 80-ish wine grape varieties, each with a count of how many regions grow it, a list of countries, and three sample regions. It looks like a join across a grapes table.

There is no grapes table.

Every row in regions has a grapes column of type text[] — a Postgres array of strings, free-form, populated during the seed scripts from whatever Wikipedia / WSET-style sources we had handy:

SELECT name, grapes FROM regions WHERE name = 'Tuscany';
--
-- Tuscany | {"Sangiovese","Vernaccia","Trebbiano","Vermentino","Canaiolo"}

When you load /grapes, the page reads every region's grapes array, flattens them into a map keyed by slug of the grape name, and counts distinct regions per slug. That's the whole "grapes table" — built fresh on each request from the regions table.

Why no table?

We considered it. The relational version would be:

CREATE TABLE grapes (
  id        SERIAL PRIMARY KEY,
  name      TEXT NOT NULL,
  slug      TEXT NOT NULL UNIQUE
);

CREATE TABLE region_grapes (
  region_id INT REFERENCES regions(id),
  grape_id  INT REFERENCES grapes(id),
  PRIMARY KEY (region_id, grape_id)
);

This is correct, normalized, and the wrong tradeoff for the actual problem. Three reasons:

1. The source data is dirty. A region's grape list comes from Wikipedia infoboxes and WSET regional notes. The same grape shows up as Sangiovese, Sangiovese Grosso, Brunello (which is a clone of Sangiovese), and Prugnolo Gentile (another local name for the same grape). With a normalized table we'd need a curated canonical list and a synonyms table to import into. With a string array we shrug, normalize on read, and ship.

2. The query is the join. The /grapes page reads ~600 region rows, each averaging maybe 5 grapes in its array. That's 3,000 strings to bucket. JavaScript chews through that in roughly 12 ms. A SQL GROUP BY grape_id would be faster, but neither is the bottleneck — both finish before the network round-trip to Supabase does.

3. Editing is easier. Adding a grape to a region is one update to a text[] column. Adding a grape variety we don't currently track is zero schema changes — just include the new name in any region's array and it appears in /grapes.

The aggregation

The page itself is the schema:

const map = new Map<string, {
  display: string,
  regions: { name: string; country?: string }[],
  countries: Set<string>
}>()

for (const r of rows) {
  if (!Array.isArray(r.grapes)) continue
  for (const raw of r.grapes) {
    const slug = nameToSlug(raw)
    if (!slug) continue
    const entry = map.get(slug) || { display: raw, regions: [], countries: new Set() }
    entry.regions.push({ name: r.name, country: r.countries?.name })
    if (r.countries?.name) entry.countries.add(r.countries.name)
    map.set(slug, entry)
  }
}

The key is nameToSlug(raw), the same one-line normalizer from the dedup post. So Sangiovese and sangiovese collapse, but Sangiovese and Sangiovese Grosso stay distinct. That's the trade-off in plain sight: cheap dedup, no semantic synonyms.

What we give up

Sorting grapes by global vineyard area would require actual hectares-planted data, which neither Wikipedia nor OSM has in machine-readable form. With a real grapes table we could at least store it once someone hand-curated the numbers. As-is, we sort by region count, which is a reasonable proxy and demands zero data entry.

We also can't show "grapes that share a parent variety" without a hand-built synonyms table. Which means Brunello and Sangiovese appear as separate grape pages. The wine purists won't love it. Most visitors won't notice.

The trick this post is actually about

The thing worth taking away: when your data is dirty and your edits are infrequent, a denormalized array beats a normalized table. The schema cost of a real grapes table isn't just the two CREATE TABLE statements — it's the curation pipeline, the synonym table, the import-time validation, and the dashboard for adding new varieties. None of that is free. A text[] column is free.

The day we need to ship region-by-region hectares-planted data is the day we add the table. Until then, the array is the schema.

#data-model#grapes#supabase