
May 11, 2026 · Wine World Map

The two seams in the OSM import

Re-running the OSM importer should be safe. You should be able to do it weekly and have it pull in new wineries that mappers added, update opening hours when they change, and not clobber the description you spent ten minutes writing about Château Lafite.

There are two awkward places where the importer interacts with state it didn't create: rows that humans have edited, and wineries whose coordinates don't fall inside any region polygon. This post is about both.

Seam 1: respecting hand-edits

Every row in wineries has two columns the importer reads before deciding what to do:

  • manually_edited (boolean): when true, the importer skips the row entirely. Useful when someone has rewritten everything by hand and the OSM data is just worse.
  • override_fields (text[]): names of fields that should not be overwritten on re-import. Everything else still gets refreshed.

So curating a typical row looks like this:

UPDATE wineries
SET description = 'Founded in 1855, Lafite produces…',
    override_fields = ARRAY['description', 'image_url']
WHERE name = 'Château Lafite Rothschild';

Next import: OSM has a slightly more accurate lat/lng and an updated phone number. Those land. The carefully written description doesn't move. The hand-cropped hero image doesn't move.
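A minimal sketch of how the two columns could drive a re-import, assuming each row is a plain object keyed by column name and that the importer builds a matching object from the OSM data. The function name planUpdate and the list of bookkeeping columns are hypothetical, and the values are placeholders, not Lafite's real data.

```javascript
// Hypothetical sketch: compute the patch a re-import would apply to one row.
// Assumes `existing` is the current wineries row and `fromOsm` is an object
// built from OSM using the same column names.

// Bookkeeping columns that are never copied from OSM (assumed list).
const BOOKKEEPING = new Set(['id', 'manually_edited', 'override_fields'])

function planUpdate(existing, fromOsm) {
  // manually_edited = true: the importer skips the row entirely.
  if (existing.manually_edited) return {}

  // Fields listed in override_fields keep their hand-edited values.
  const keep = new Set(existing.override_fields ?? [])
  const patch = {}
  for (const [field, value] of Object.entries(fromOsm)) {
    if (BOOKKEEPING.has(field) || keep.has(field)) continue
    // Everything else is refreshed when OSM has a different (scalar) value.
    if (existing[field] !== value) patch[field] = value
  }
  return patch
}

// Placeholder values mirroring the Lafite example.
const row = {
  name: 'Château Lafite Rothschild', lat: 45.0, lng: -0.7, phone: 'old-number',
  description: 'Hand-written description', image_url: 'hand-cropped.jpg',
  manually_edited: false, override_fields: ['description', 'image_url'],
}
const osm = {
  name: 'Château Lafite Rothschild', lat: 45.001, lng: -0.7, phone: 'new-number',
  description: 'OSM description', image_url: 'osm.jpg',
}
console.log(planUpdate(row, osm)) // → { lat: 45.001, phone: 'new-number' }
```

The coordinate and phone changes make it into the patch; the protected description and image do not, and a row with manually_edited set produces an empty patch.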

The cost is that someone has to remember to set override_fields when they edit a row. Forgetting is the most common bug: you write a description, you re-import, and the description is gone. There's no UI for this yet; it's still raw SQL in the Supabase dashboard.
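Since forgetting is the usual failure, one option is to record the override at the moment of editing. This is a hypothetical helper, not something the current setup has; it assumes rows are plain objects and simply adds each edited column name to override_fields.

```javascript
// Hypothetical helper: apply a hand edit and mark every edited column as
// overridden in the same step, so the next import won't clobber it.
function applyHandEdit(row, changes) {
  const overrides = new Set(row.override_fields ?? [])
  for (const field of Object.keys(changes)) overrides.add(field)
  return { ...row, ...changes, override_fields: [...overrides] }
}

const edited = applyHandEdit(
  { name: 'Château Lafite Rothschild', description: 'OSM description', override_fields: ['image_url'] },
  { description: 'Hand-written description' },
)
console.log(edited.override_fields) // → [ 'image_url', 'description' ]
```

Going through a Set keeps override_fields free of duplicates when the same field is edited twice.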

There's also a third escape hatch, one we don't really need: a description_sv column that the importer never touches at all. It was meant for Swedish translations and has effectively become "the field you write in if you don't want to think about override_fields."

Seam 2: wineries with nowhere to live

OSM has plenty of wineries in places we haven't drawn a region around. A small Weingut in the Mittelrhein. A garage producer in some sub-region of the Loire that doesn't have its own row. A Croatian estate on Pelješac that's hours from the nearest mapped centroid.

When the importer can't decide which region a winery belongs to, it inserts the row with region_id = NULL. We call these orphans.

scripts/backfill-orphan-regions.js is the cleanup pass. It loads every region that has coordinates and every orphan winery, then for each orphan finds the closest region by haversine distance:

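// Great-circle (haversine) distance in km between two { lat, lng } points.
function hav(a, b) {
  const R = 6371, toR = d => d * Math.PI / 180  // Earth's mean radius in km; degrees to radians
  const dLat = toR(b.lat - a.lat), dLng = toR(b.lng - a.lng)
  // sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlng/2)
  const A = Math.sin(dLat/2)**2
          + Math.cos(toR(a.lat)) * Math.cos(toR(b.lat))
          * Math.sin(dLng/2)**2
  return 2 * R * Math.asin(Math.sqrt(A))  // central angle times radius
}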

Any orphan within 300 km of its nearest region gets adopted by that region. Beyond that we leave it orphaned, on the assumption that it's probably miscoded or sits in a genuinely untracked area.
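The matching step can be sketched as a brute-force scan over region centroids. This is an illustration, not the actual script: the name matchOrphans and the { id, lat, lng } input shape are assumptions, database reads and writes are left out, and "within 300 km" is read as strictly under 300 km to line up with the "<300km" label in the script's output.

```javascript
// Sketch of the backfill's matching step (hypothetical names and shapes).
// regions: [{ id, lat, lng }] centroids; orphans: [{ id, lat, lng }] wineries
// with region_id = NULL. Returns a plan; nothing is written here.
const MAX_KM = 300

// Haversine great-circle distance in km, same formula as hav in the post.
function hav(a, b) {
  const R = 6371, toR = d => d * Math.PI / 180
  const dLat = toR(b.lat - a.lat), dLng = toR(b.lng - a.lng)
  const A = Math.sin(dLat / 2) ** 2
          + Math.cos(toR(a.lat)) * Math.cos(toR(b.lat)) * Math.sin(dLng / 2) ** 2
  return 2 * R * Math.asin(Math.sqrt(A))
}

function matchOrphans(regions, orphans, maxKm = MAX_KM) {
  const adopted = [], stillOrphan = []
  for (const w of orphans) {
    // Find the nearest region centroid by checking every region.
    let best = null, bestKm = Infinity
    for (const r of regions) {
      const km = hav(w, r)
      if (km < bestKm) { best = r; bestKm = km }
    }
    // Adopt only when the nearest region is under the cutoff.
    if (best !== null && bestKm < maxKm) adopted.push({ wineryId: w.id, regionId: best.id, km: bestKm })
    else stillOrphan.push(w.id)
  }
  return { adopted, stillOrphan }
}

// Toy coordinates on the equator, where one degree of longitude is about 111 km.
const plan = matchOrphans(
  [{ id: 'r1', lat: 0, lng: 0 }, { id: 'r2', lat: 0, lng: 10 }],
  [{ id: 'a', lat: 0, lng: 1 }, { id: 'b', lat: 0, lng: 5 }],
)
console.log(plan.adopted.map(m => m.regionId), plan.stillOrphan) // → [ 'r1' ] [ 'b' ]
```

Returning a plan instead of writing directly also makes a dry run trivial: print the adopted list and stop.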

The script prints distance buckets at the end:

distance buckets: { "<10km": 412, "<50km": 187, "<150km": 63, "<300km": 21 }

Most matches (412 of 683, about 60%) are under 10 km. Those are wineries that sit just outside the region polygon we drew but well inside what a wine writer would call the region. The 10 to 50 km bucket holds the genuinely ambiguous ones: a winery near the border between Rheingau and Rheinhessen, say. The 150 to 300 km tail (21 wineries) is the worrying part and is worth inspecting by hand now and then.
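The summary line can be produced by counting each adopted match into the first bucket whose upper edge it falls under. A minimal sketch, assuming the buckets are disjoint (which the decreasing counts imply, since cumulative buckets could never shrink); the name distanceBuckets is hypothetical.

```javascript
// Sketch: count match distances (km) into disjoint buckets. The edges come
// from the printed output; the function name is hypothetical.
const EDGES = [10, 50, 150, 300]

function distanceBuckets(kms) {
  const buckets = Object.fromEntries(EDGES.map(e => [`<${e}km`, 0]))
  for (const km of kms) {
    // First edge the distance is strictly under; 300 km or more is not counted.
    const edge = EDGES.find(e => km < e)
    if (edge !== undefined) buckets[`<${edge}km`]++
  }
  return buckets
}

console.log('distance buckets:', JSON.stringify(distanceBuckets([3, 8, 42, 120, 280])))
// → distance buckets: {"<10km":2,"<50km":1,"<150km":1,"<300km":1}
```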

Why this is two passes, not one

The cleaner design is "the importer figures out the right region during INSERT and never produces an orphan." That requires:

  • A spatial index in Postgres (PostGIS),
  • Region polygons that are actually correct (see the vineyard-hulls post),
  • Tie-breaking logic when a winery falls inside two overlapping polygons.

We don't have any of those. We have lat/lng centroids and a haversine function. A nightly orphan-backfill turns out to be entirely good enough — it's bounded work, easy to dry-run, and the worst case is that one winery shows up under "Rheingau" when a purist would say "Rheinhessen." For a map whose job is to help someone plan a wine trip, that's a tolerable error.

The slogan, if there is one: don't force the import to be correct. Force it to be reversible, and let a follow-up pass clean up.

#osm #import #data