May 11, 2026 · Wine World Map
Two sources for a winery description, in order
Most wineries in the database land with description = NULL. The OSM
importer pulls coordinates, opening hours, contact info, and the
accepts_visitors flag, but it doesn't try to be clever about prose.
enrich-wineries.js is the follow-up pass that fills in the
description column from whatever sources are available.
It runs in two phases, in this order:
- OSM tags: read description, description:en, or note directly from the osm_tags JSON we stored at import time.
- Wikipedia: for wineries whose OSM data includes a wikipedia=en:Title tag, fetch the article summary and use the first paragraph.
Phase 1 is synchronous, runs in seconds, and fills in maybe 30% of
the rows. Phase 2 is rate-limited (120 ms per request, polite
User-Agent) and fills in another 5-8%. The rest stay NULL, which
the UI handles gracefully — the description block is just omitted.
Phase 1: OSM tags
A surprising number of OSM contributors write a sentence or two in
the description tag when they map a winery:
const t = (w.osm_tags || {})
const candidate = t.description || t['description:en'] || t.note
if (typeof candidate === 'string' && candidate.trim().length > 10) {
  osmUpdates.push({ id: w.id, description: candidate.trim() })
}
Three things to notice:
- Fallback order: the localized description:en only wins if the base description is missing. In practice contributors tend to populate one or the other, not both.
- note is the last resort. It's officially OSM's "private comment to other mappers" field, but in winery-land it's routinely used for visitor-facing prose because contributors don't always remember the tag conventions.
- Length gate of 10 characters. At or below that you typically get things like private, which we don't want to display. The gate is coarse: a slightly longer mapper note such as gate code 1234 (14 characters) still passes, so it filters out fragments, not every non-description.
These updates fan out in batches of 100, with no rate limiting, straight into Supabase. Roughly 4,500 wineries gain a description this way on a fresh database.
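The selection rule and the batching can be sketched as two small pure functions. This is a minimal sketch, not the script's actual code: pickOsmDescription and chunk are hypothetical names, and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: mirrors the fallback order and length gate from the snippet above.
function pickOsmDescription(osmTags, minLength = 10) {
  const t = osmTags || {}
  const candidate = t.description || t['description:en'] || t.note
  if (typeof candidate !== 'string') return null
  const trimmed = candidate.trim()
  // Strictly greater than the gate, same as the original check.
  return trimmed.length > minLength ? trimmed : null
}

// Hypothetical helper: split an array into fixed-size batches (100 per the post).
function chunk(items, size = 100) {
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Made-up sample rows: only the first one survives the gate.
const rows = [
  { id: 1, osm_tags: { description: 'Family estate with a tasting room.' } },
  { id: 2, osm_tags: { note: 'private' } },
  { id: 3, osm_tags: null },
]

const osmUpdates = rows
  .map((w) => ({ id: w.id, description: pickOsmDescription(w.osm_tags) }))
  .filter((u) => u.description !== null)

console.log(chunk(osmUpdates).length) // 1 batch holding 1 update
```

Keeping the picker free of database calls makes the fallback order easy to test on its own; each batch from chunk would then go to whatever write call the script uses.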
Phase 2: Wikipedia
The wikipedia OSM tag is in the form lang:Title:
wikipedia=en:Château Margaux
wikipedia=fr:Domaine de la Romanée-Conti
wikipedia=de:Weingut Robert Weil
The script parses the lang prefix, hits that language's REST
endpoint, and grabs the extract field:
async function fetchWikipediaSummary(wikiTag) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):(.+)$/)
  if (!m) return null
  const [, lang, title] = m
  const url = `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title.replace(/ /g, '_'))}?redirect=true`
  // ...
  if (sum.type === 'disambiguation') return null
  return sum.extract || null
}
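To make the parsing and URL-building steps testable without touching the network, they can be pulled out of the fetch function. The sketch below assumes that split; parseWikipediaTag, summaryUrl, and extractFromSummary are hypothetical names, and the elided request code is deliberately left out.

```javascript
// Hypothetical helpers; the names are illustrative, not taken from enrich-wineries.js.

// Split "lang:Title" into its parts, or return null when the prefix doesn't match.
function parseWikipediaTag(wikiTag) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):(.+)$/)
  if (!m) return null
  const [, lang, title] = m
  return { lang, title }
}

// Build the REST summary URL the same way the snippet above does.
function summaryUrl({ lang, title }) {
  const slug = encodeURIComponent(title.replace(/ /g, '_'))
  return `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${slug}?redirect=true`
}

// Reduce an already-parsed summary response to a description, or null.
function extractFromSummary(sum) {
  if (!sum || sum.type === 'disambiguation') return null
  return sum.extract || null
}

const parsed = parseWikipediaTag('en:Château Margaux')
console.log(summaryUrl(parsed))
// https://en.wikipedia.org/api/rest_v1/page/summary/Ch%C3%A2teau_Margaux?redirect=true
```

One thing the regex implies: only two- or three-letter prefixes match, so a tag like simple:Wine returns null and the row simply stays without a description.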
The fact that we honor whichever language the OSM tag specifies is
the interesting part. A German winery gets its description in German.
A French domaine gets it in French. The UI doesn't currently flag
the language, which is a small bug: eventually we'll need a
description_lang column so the page can mark the text with
lang="fr", which screen readers and browser translation tools rely on to handle the text correctly.
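One way the language could be carried through to the page is sketched below. It assumes the description_lang column exists (it doesn't yet) and that the UI builds an HTML string; langFromWikipediaTag, escapeHtml, and renderDescription are hypothetical names.

```javascript
// Hypothetical: derive the language code from the same "lang:Title" tag.
function langFromWikipediaTag(wikiTag) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):(.+)$/)
  return m ? m[1] : null
}

// Minimal HTML escaping for text placed inside an element.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Hypothetical renderer: omit the block when there is no description,
// and add lang only when a plausible code is stored.
function renderDescription(row) {
  if (!row.description) return ''
  const lang = row.description_lang
  const langAttr = typeof lang === 'string' && /^[a-z]{2,3}$/.test(lang) ? ` lang="${lang}"` : ''
  return `<p${langAttr}>${escapeHtml(row.description)}</p>`
}

console.log(renderDescription({
  description: 'Domaine viticole de Bourgogne.',
  description_lang: langFromWikipediaTag('fr:Domaine de la Romanée-Conti'),
}))
// <p lang="fr">Domaine viticole de Bourgogne.</p>
```

Descriptions that came from plain OSM tags have no reliable language signal, so under this sketch they would leave description_lang empty and render without the attribute.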
What manually_edited does here
Both phases skip rows where manually_edited = true. This is the
same protective flag from the import-seams post,
extended to enrichment: if a human has rewritten the description, the
script leaves it alone, even when Wikipedia would offer something
"better." Human-curated text wins.
Phase 1 also skips most manually edited rows implicitly, because it
only writes a candidate when description IS NULL, and any
hand-written description means the column is no longer null. Phase
2 needs the explicit check because in principle someone could have
flagged a row as manually edited without writing anything yet.
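The two conditions combine into a single predicate. This is a sketch under the assumption that rows are plain objects with description and manually_edited fields; needsDescription is a hypothetical name and the rows are made up.

```javascript
// Hypothetical predicate: a row is eligible only when it has no description
// AND no human has claimed it.
function needsDescription(row) {
  return row.description == null && row.manually_edited !== true
}

// Made-up rows covering the four combinations.
const rows = [
  { id: 1, description: null, manually_edited: false },           // eligible
  { id: 2, description: 'Hand-written', manually_edited: true },  // human text wins
  { id: 3, description: null, manually_edited: true },            // flagged, nothing written yet
  { id: 4, description: 'From OSM', manually_edited: false },     // already filled
]

console.log(rows.filter(needsDescription).map((r) => r.id)) // logs [ 1 ]
```

Row 3 is the edge case: a null check alone would let the script overwrite a row someone has already claimed.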
Yield, ordered cheap-to-expensive
| Source | Coverage gain | Time | Net cost |
|---|---:|---:|---|
| OSM description / description:en / note | ~30% | seconds | free |
| Wikipedia summary (when osm_tags.wikipedia is set) | ~5-8% | minutes | one polite REST hit per candidate |
| Hand-written | 0.5% | manual | the only way to get the famous ones right |
This is the same lesson as the image-fetching post: always try the local, free source before reaching for the network. A surprising amount of the data is already in your database, encoded in a tag you weren't reading.
The corollary: when you do reach for the network, do it
idempotently and politely. enrich-wineries.js can be re-run any
time. It only touches rows still missing a description. It identifies
itself with a User-Agent. It sleeps 120 ms between requests. On a
fresh run it takes maybe four minutes. On a re-run, when most rows
are already filled, it takes about twelve seconds.
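The polite, re-runnable loop can be sketched roughly as follows. It assumes the fetcher is passed in as a function (so a stub can stand in for the real Wikipedia call) and that the delay is injectable; enrichFromWikipedia and its options are hypothetical names, not the script's API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical loop: one request at a time, a pause after each, and only for
// rows that still lack a description and are not manually edited.
async function enrichFromWikipedia(rows, fetchSummary, { delayMs = 120, wait = sleep } = {}) {
  const updates = []
  for (const row of rows) {
    if (row.description != null || row.manually_edited === true) continue
    const tag = row.osm_tags && row.osm_tags.wikipedia
    if (!tag) continue
    const description = await fetchSummary(tag)
    if (description) updates.push({ id: row.id, description })
    await wait(delayMs) // pause between requests
  }
  return updates
}

// Usage with a stub fetcher, so the example makes no network request.
enrichFromWikipedia(
  [{ id: 7, description: null, manually_edited: false, osm_tags: { wikipedia: 'en:Château Margaux' } }],
  async (tag) => `stub summary for ${tag}`,
  { delayMs: 0 }
).then((updates) => console.log(updates))
```

Because filled rows are skipped before any request is made, a second run over the same data issues no requests for them, which is where the short re-run time comes from.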
#enrichment #osm #wikipedia