Thirteen scripts, eleven weeks, one database

In scripts/ there are thirteen files named seed-wNN.js, where NN is the ISO week number when the script was written. The first is seed-w21.js from week 21 of this year; the last is seed-w31.js from a few weeks ago. Two of them (-w22b-regions, -w26b-regions) are mid-week patches.

Together they're how the districts table got populated.

Why not one big seed?

The obvious design — a single seed.js that knows every district — falls apart for two reasons.

First, the data we want isn't static. New regions get added (Etna's contrade), old ones get re-classified (Brouilly graduated from a district to a region), and dozens of small AVAs only became relevant once we'd built the UI to display them. A single seed file would need a sophisticated diff/upsert to remain idempotent.

Second, we genuinely didn't know in week 21 what we'd want by week 31. The script-per-week pattern means each week's additions are self-contained: a flat array of districts plus the same boilerplate insert loop at the bottom.

const D = [
  { region: 'Etna (Sicilia)', name: 'Linguaglossa',
    lat: 37.84, lng: 15.14, zoom: 12, type: 'DOC',
    desc: 'Norra Etnas vinhuvudstad — Nerello Mascalese.' },
  // ...130 more
]
;(async () => {
  // load existing, dedup by (region_id, name), insert in batches of 100
})()

Each file is between 100 and 150 lines, of which the data is roughly 80% and the insert harness 20%.
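
The skeleton above only names the batching step in a comment. As a minimal sketch of what "insert in batches of 100" could look like, assuming a supabase-js style client is passed in as `client` (the names `chunk` and `insertInBatches` are hypothetical, not taken from the scripts):

```javascript
// Hypothetical helper: split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// Hypothetical helper: insert rows batch by batch and stop at the first error.
// Assumes client.from(table).insert(rows) resolves to an object with an `error`
// field, which is the shape supabase-js returns.
async function insertInBatches(client, table, rows, size = 100) {
  let inserted = 0
  for (const batch of chunk(rows, size)) {
    const { error } = await client.from(table).insert(batch)
    if (error) throw new Error(`insert failed after ${inserted} rows: ${error.message}`)
    inserted += batch.length
  }
  return inserted
}
```

Passing the client in as a parameter keeps the helper testable with a stub, without a roundtrip to the database.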

The shared harness

Every seed script ends with essentially the same code:

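// `s` is the Supabase client and `D` the district array, both defined earlier in the script (not shown).
// This runs inside the async wrapper, hence the bare `await`.
const { data: regions } = await s.from('regions').select('id, name')
const idByName = new Map((regions || []).map(r => [r.name.toLowerCase(), r.id]))
const { data: existing } = await s.from('districts').select('region_id, name')
// keys of rows already in the table: "region_id:lowercased name"
const ek = new Set((existing || []).map(d => `${d.region_id}:${d.name.toLowerCase()}`))
const ti = []        // rows to insert
let mr = 0, dup = 0  // missing-region and duplicate counters
for (const d of D) {
  const rid = idByName.get(d.region.toLowerCase())
  if (!rid) { console.warn(`missing: ${d.region}`); mr++; continue }
  const k = `${rid}:${d.name.toLowerCase()}`
  if (ek.has(k)) { dup++; continue }
  ek.add(k) // a repeat within the same file now counts as a duplicate too
  ti.push({ region_id: rid, name: d.name, /* ... */ manually_edited: true })
}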

Note the manually_edited: true on every insert — these are hand-curated districts that the OSM importer is forbidden from overwriting (see the import-seams post).

Note also the two counters, mr (missing region) and dup (duplicate skipped), which the loop increments and the script reports at the end. The output of every seed run looks like:

total: 134, ins: 91, mr: 7, dup: 36
inserted 91
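
The summary line reconciles by construction: every candidate lands in exactly one of three buckets, so total = ins + mr + dup (here 134 = 91 + 7 + 36). A small sketch of the same loop as a pure function makes that easy to check without a database. The function name `planInserts` and the sample rows are made up for illustration:

```javascript
// Hypothetical pure version of the harness loop: no database access, just the
// classification of each candidate as insert, missing region, or duplicate.
function planInserts(candidates, regions, existing) {
  const idByName = new Map(regions.map(r => [r.name.toLowerCase(), r.id]))
  const seen = new Set(existing.map(d => `${d.region_id}:${d.name.toLowerCase()}`))
  const rows = []
  let mr = 0
  let dup = 0
  for (const d of candidates) {
    const rid = idByName.get(d.region.toLowerCase())
    if (rid === undefined) { mr++; continue }
    const key = `${rid}:${d.name.toLowerCase()}`
    if (seen.has(key)) { dup++; continue }
    seen.add(key) // later repeats in the same list count as duplicates
    rows.push({ region_id: rid, name: d.name, manually_edited: true })
  }
  return { total: candidates.length, ins: rows.length, mr, dup, rows }
}

// Made-up sample data: one new district, one already present, one with an unknown region.
const plan = planInserts(
  [
    { region: 'Etna (Sicilia)', name: 'Linguaglossa' },
    { region: 'Etna (Sicilia)', name: 'Milo' },
    { region: 'Nowhere', name: 'Ghost' },
  ],
  [{ id: 1, name: 'Etna (Sicilia)' }],
  [{ region_id: 1, name: 'milo' }]
)
console.log(`total: ${plan.total}, ins: ${plan.ins}, mr: ${plan.mr}, dup: ${plan.dup}`)
// prints "total: 3, ins: 1, mr: 1, dup: 1"
```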

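The 36 duplicates are districts that were already in the database from a previous week's run. The 7 missing regions are districts whose region name matched nothing in the regions table; they are warned about and skipped. We could refactor the dedup away (INSERT … ON CONFLICT DO NOTHING would do it, given a unique constraint on the region and name columns), but the explicit dedup is useful as a sanity check: the counts show at a glance how much of a file was already in place.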
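
For comparison, here is roughly what the ON CONFLICT route might look like through supabase-js, whose `upsert` accepts `onConflict` and `ignoreDuplicates` options. This is a sketch under assumptions: `insertSkippingConflicts` is a hypothetical name, and the unique constraint on (region_id, name) is assumed rather than confirmed by anything above.

```javascript
// Sketch only. Assumes `client` is a supabase-js client and that the districts
// table has a unique constraint on (region_id, name).
async function insertSkippingConflicts(client, rows) {
  const { error } = await client
    .from('districts')
    .upsert(rows, { onConflict: 'region_id,name', ignoreDuplicates: true })
  if (error) throw new Error(error.message)
}
```

One behavioural difference to keep in mind: the JavaScript dedup lowercases names before comparing, while a plain unique constraint on a text column would typically treat 'Milo' and 'milo' as different values.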

What the scripts encode beyond data

Reading the seed files in order tells you what the project was focused on each week:

| Week | Focus |
|---|---|
| 21 | Spain (extra DOs), Portugal (Douro/Alentejo), Switzerland |
| 22 | New regions split off, including Brouilly/Bolgheri |
| 23-24 | German VDPs, Austrian DACs, Hungarian |
| 25-26 | New World: Australia GIs, NZ sub-regions |
| 27 | Slovenia, Croatia, Georgia |
| 28-29 | South America (Mendoza sub-regions, Chile zones) |
| 30 | South Africa (wards), Lebanon, Turkey |
| 31 | Italy: Etna contrade, Franciacorta, Lambrusco subzones |

You can tell from the descriptions, which are written in Swedish, which weeks the author was deep into wine reading and which weeks were structural. The Etna contrade in w31 have unusually specific descriptions ("Etna Bianco Superiore — endast från Milo", that is, "only from Milo"). The Australian GIs in w25-26 are terser ("Klassiska Mt Barker-trakter", roughly "classic Mt Barker areas").

When does this pattern stop scaling?

Probably around seed-w52.js. The friction is that re-running an old seed script to check what it did takes a roundtrip to Supabase even if everything's already inserted (it loads existing districts to dedup). Thirteen scripts is fine. Fifty would be ugly.

The endgame is to flatten all the seeds into a single districts.json exported from the live database, ship that as canonical reference data, and retire the weekly scripts. We're not there yet because the database is still moving — most weeks something gets restructured, and a flat JSON would have to be re-exported every time.
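
If the flat file does happen, one way to keep frequent re-exports tolerable is to write rows in a fixed order so each export produces a small git diff. A sketch, assuming the exported rows carry the `region_id` and `name` columns seen in the harness (`toCanonicalJson` is a hypothetical name):

```javascript
// Generic three-way comparison; works for numbers and for strings (code-unit order).
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0)

// Hypothetical: serialize exported district rows sorted by region_id, then name,
// so two exports of the same data produce byte-identical files.
function toCanonicalJson(rows) {
  const sorted = [...rows].sort(
    (a, b) => cmp(a.region_id, b.region_id) || cmp(a.name, b.name)
  )
  return JSON.stringify(sorted, null, 2) + '\n'
}
```

The input array is copied before sorting, so the caller's row order is left untouched.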

For now, seed-wNN.js is both a script and an audit trail. Git already tells you what was committed each week. The seed scripts tell you what got inserted into Supabase each week, which is the data that actually shipped.

#seed #history #process