
May 11, 2026 · Wine World Map

Two sources for a winery description, in order

Most wineries in the database land with description = NULL. The OSM importer pulls coordinates, opening hours, contact info, and the accepts_visitors flag, but it doesn't try to be clever about prose. enrich-wineries.js is the follow-up pass that fills in the description column from whatever sources are available.

It runs in two phases, in this order:

  1. OSM tags — read description, description:en, or note directly from the osm_tags JSON we stored at import time.
  2. Wikipedia — for wineries whose OSM data includes a wikipedia=en:Title tag, fetch the article summary and use the first paragraph.

Phase 1 is synchronous, runs in seconds, and fills in roughly 30% of the rows. Phase 2 is rate-limited (a 120 ms pause between requests, with a polite User-Agent) and fills in another 5-8%. The rest stay NULL, which the UI handles gracefully: the description block is simply omitted.

Phase 1: OSM tags

A surprising number of OSM contributors write a sentence or two in the description tag when they map a winery:

const t = (w.osm_tags || {})
const candidate = t.description || t['description:en'] || t.note
if (typeof candidate === 'string' && candidate.trim().length > 10) {
  osmUpdates.push({ id: w.id, description: candidate.trim() })
}

Three things to notice:

  • Fallback order: localized description:en only wins if the base description was missing. In practice contributors tend to populate one or the other, not both.
  • note is the last resort. It's officially OSM's "private comment to other mappers" field, but in winery-land it's routinely used for visitor-facing prose because contributors don't always remember the tag conventions.
  • Length gate of 10 characters. Below that you typically get things like private or gate code 1234, which we don't want to display.
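To make the fallback order and the length gate easier to test in isolation, the same logic can be wrapped in a pure function. This is a minimal sketch: the name pickOsmDescription and the constant MIN_DESCRIPTION_LENGTH are hypothetical and do not come from enrich-wineries.js.

```javascript
// Hypothetical helper wrapping the Phase 1 candidate logic shown above.
const MIN_DESCRIPTION_LENGTH = 10 // the length gate from the post

function pickOsmDescription(osmTags) {
  const t = osmTags || {}
  // Fallback order: description, then description:en, then note.
  const candidate = t.description || t['description:en'] || t.note
  if (typeof candidate !== 'string') return null
  const trimmed = candidate.trim()
  // Keep only text strictly longer than the gate.
  return trimmed.length > MIN_DESCRIPTION_LENGTH ? trimmed : null
}
```

One consequence of chaining with ||: a short but non-empty description (say, "closed") still shadows a longer description:en, and the row then fails the length gate. That matches the snippet above. Whether it matters depends on how often both tags are populated.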

These updates fan out in batches of 100, with no rate limiting, straight into Supabase. Roughly 4,500 wineries gain a description this way on a fresh database.
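The batching step might look roughly like the sketch below. It is sequential for simplicity, whereas the real script may issue batches concurrently. applyBatch is a hypothetical callback standing in for the Supabase write, and neither helper name comes from the script.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// applyBatch is an assumed async callback that persists one batch
// (for example, a wrapper around a database update call).
async function writeInBatches(updates, applyBatch, batchSize = 100) {
  let written = 0
  for (const batch of chunk(updates, batchSize)) {
    await applyBatch(batch)
    written += batch.length
  }
  return written
}
```

Passing the write function in keeps the chunking logic testable without a database connection.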

Phase 2: Wikipedia

The wikipedia OSM tag is in the form lang:Title:

wikipedia=en:Château Margaux
wikipedia=fr:Domaine de la Romanée-Conti
wikipedia=de:Weingut Robert Weil

The script parses the lang prefix, hits that language's REST endpoint, and grabs the extract field:

async function fetchWikipediaSummary(wikiTag) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):(.+)$/)
  if (!m) return null
  const [, lang, title] = m
  const url = `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title.replace(/ /g, '_'))}?redirect=true`
  // ... (request and JSON parsing elided; `sum` is the parsed response body)
  if (sum.type === 'disambiguation') return null
  return sum.extract || null
}
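For readers who want something runnable, here is one way the elided middle could be filled in. It is a sketch under assumptions, not the script's actual code: the fetchImpl and userAgent options, the placeholder User-Agent string, and the choice to return null on any non-2xx response are all illustrative.

```javascript
// Build the REST summary URL from a wikipedia=lang:Title tag, or null if malformed.
function buildSummaryUrl(wikiTag) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):(.+)$/)
  if (!m) return null
  const [, lang, title] = m
  const slug = encodeURIComponent(title.replace(/ /g, '_'))
  return `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${slug}?redirect=true`
}

// fetchImpl and userAgent are assumed options; the default relies on the
// global fetch available in Node 18+ and browsers.
async function fetchSummary(wikiTag, options = {}) {
  const fetchImpl = options.fetchImpl || fetch
  const userAgent = options.userAgent || 'example-enricher/0.1 (contact@example.com)'
  const url = buildSummaryUrl(wikiTag)
  if (!url) return null
  const res = await fetchImpl(url, {
    headers: { 'User-Agent': userAgent, Accept: 'application/json' },
  })
  if (!res.ok) return null // assumption: treat 404s and other errors as "no summary"
  const sum = await res.json()
  if (sum.type === 'disambiguation') return null
  return sum.extract || null
}
```

Note that the regular expression only accepts two- or three-letter lowercase prefixes, so tags pointing at editions such as simple or zh-yue fall through to null.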

The interesting part is that we honor whichever language the OSM tag specifies. A German winery gets its description in German. A French domaine gets it in French. The UI doesn't currently flag the language, which is a small bug: eventually we'll need a description_lang column so the page can mark the text with lang="fr", which matters for accessibility and for any tool that relies on the declared language.
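If a description_lang column is added, the language code is already sitting in the tag prefix. A possible sketch, where the function name describeWithLang and the returned shape are hypothetical:

```javascript
// Pair a Wikipedia extract with the language code taken from the OSM tag prefix.
// description_lang mirrors the column proposed above; it does not exist yet.
function describeWithLang(wikiTag, extract) {
  const m = String(wikiTag).match(/^([a-z]{2,3}):/)
  if (!m || !extract) return null
  return { description: extract, description_lang: m[1] }
}
```

The page template could then emit the stored code as the lang attribute on the description element.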

What manually_edited does here

Both phases skip rows where manually_edited = true. This is the same protective flag from the import-seams post, extended to enrichment: if a human has rewritten the description, the script leaves it alone, even when Wikipedia would offer something "better." Human-curated text wins.

Phase 1 also implicitly skips manually edited rows because a row only becomes a candidate when description IS NULL, and any hand-written description means the column is no longer null. Phase 2 needs the explicit check because, in principle, someone could have flagged a row as manually edited without writing anything yet.
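Expressed as a predicate, the combined rule is small. This sketch assumes rows are plain objects with description and manually_edited fields. The name isEnrichable is made up for illustration.

```javascript
// A row is eligible for enrichment only if no human has claimed it
// and it still has no description.
function isEnrichable(row) {
  const untouched = row.manually_edited !== true
  const missing = row.description == null // matches null and undefined
  return untouched && missing
}
```

The case described above, a row flagged as manually edited but still NULL, is exactly where the explicit flag check changes the outcome.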

Yield, ordered cheap-to-expensive

| Source | Coverage gain | Time | Net cost |
|---|---:|---:|---|
| OSM description / description:en / note | ~30% | seconds | free |
| Wikipedia summary (when osm_tags.wikipedia is set) | ~5-8% | minutes | one polite REST hit per candidate |
| Hand-written | 0.5% | manual | the only way to get the famous ones right |

This is the same lesson as the image-fetching post: always try the local, free source before reaching for the network. A surprising amount of the data is already in your database, encoded in a tag you weren't reading.

The corollary: when you do reach for the network, do it idempotently and politely. enrich-wineries.js can be re-run any time. It only touches rows still missing a description. It identifies itself with a User-Agent. It sleeps 120 ms between requests. On a fresh run it takes maybe four minutes. On a re-run, when most rows are already filled, it takes about twelve seconds.
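The idempotent, rate-limited loop can be sketched as follows. The function name enrichPolitely, the injected fetchSummary callback, and the delayMs option are assumptions made for illustration. Only the 120 ms pause and the skip rules come from the description above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// fetchSummary is an assumed async callback: wikipedia tag in, text or null out.
async function enrichPolitely(rows, fetchSummary, options = {}) {
  const delayMs = options.delayMs === undefined ? 120 : options.delayMs
  const updates = []
  for (const row of rows) {
    // Idempotence: rows that already have text, or are human-owned, cost nothing.
    if (row.description != null || row.manually_edited === true) continue
    const tag = row.osm_tags && row.osm_tags.wikipedia
    if (!tag) continue
    const description = await fetchSummary(tag)
    if (description) updates.push({ id: row.id, description })
    await sleep(delayMs) // pause after every network request
  }
  return updates
}
```

Because skipped rows never reach the sleep call, a re-run over a mostly filled table spends almost no time waiting, which is consistent with the gap between the fresh-run and re-run timings.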

#enrichment #osm #wikipedia