Data validation

When you edit OpenStreetMap (OSM), all your changes are applied automatically, without prior review. This makes for a fast mapping environment where people can easily work together. Of course, this approach carries risks. The OpenStreetMap community handles those risks through constant validation.

For this validation, we use OSMCha. The tool was developed by Mapbox, a company that provides basemaps and other services to large corporations. They provide this tool to any mapper free of charge.

In practice

Every edit you make in OSM is bundled in a little file, a changeset (example). This file is forever associated with your username (example) and the object you changed (example).

In OSMCha we filter changesets:

  • if they have the changeset comment “pomp” (the name of our instance at the time; this was later changed to “cyclofix”)
  • if they are in Belgium
  • if the changeset has NOT been reviewed yet within OSMCha
  • if the contributor has NOT been marked as trusted by the current OSMCha user
  • if the edit was made with the editor “MapComplete 0.0.0”
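
The criteria above combine into a single yes/no test per changeset. Here is a minimal sketch in Python of that logic. It assumes a hypothetical Changeset record whose field names are chosen for illustration only; they are not OSMCha's own data model, and OSMCha applies these filters itself through its web interface.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    """Hypothetical, simplified view of a changeset's metadata."""
    comment: str    # changeset comment written by the editor or the mapper
    country: str    # country the changeset falls in
    reviewed: bool  # already marked good or bad within OSMCha?
    user: str       # OSM username of the contributor
    editor: str     # editor string, e.g. "MapComplete 0.0.0"


def needs_review(cs, trusted_users, keyword="pomp",
                 country="Belgium", editor="MapComplete 0.0.0"):
    """Return True if the changeset passes every filter criterion."""
    return (
        keyword in cs.comment.lower()      # comment mentions our instance
        and cs.country == country          # located in Belgium
        and not cs.reviewed                # not reviewed yet
        and cs.user not in trusted_users   # contributor not trusted yet
        and cs.editor == editor            # made with the expected editor
    )


# Usage: the names and values below are made up for illustration.
trusted = {"alice"}
example = Changeset("Adding data with #pomp", "Belgium", False,
                    "bob", "MapComplete 0.0.0")
print(needs_review(example, trusted))
```

Since the instance was later renamed, the keyword is a parameter: passing keyword="cyclofix" selects the newer changesets.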

If you first go to OSMCha.org and connect your OpenStreetMap.org account, you can see our filter (this replaces our previous filter). We do not go through all the changes. When we see a new edit, we try to assess whether it is plausible. We look at the data in the context of other OSM data: maybe it’s a duplicate, maybe something that is usually beside a road is mapped in the middle of the road, or even inside a building!

Then we cross-reference with aerial imagery, the photos provided by the contributor, or Mapillary (a crowd-sourced source of street-level imagery). We make corrections and improvements if needed, and mark the changeset as “good” or “bad” within OSMCha.
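
One of the plausibility checks described above, spotting a possible duplicate, comes down to asking whether a new point lies very close to an existing point of the same kind. We do this check by eye; the Python sketch below only illustrates the idea. The function names and the 25-metre threshold are assumptions made for this example, not part of OSMCha or MapComplete.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_duplicates(new_point, existing_points, max_distance_m=25.0):
    """Return the existing (lat, lon) points within max_distance_m of new_point.

    The threshold is an illustrative assumption; a sensible value depends on
    the kind of object being mapped.
    """
    lat, lon = new_point
    return [
        p for p in existing_points
        if haversine_m(lat, lon, p[0], p[1]) <= max_distance_m
    ]


# Usage with made-up coordinates: the first existing point is roughly
# 11 m from the new one, the second roughly 1 km away.
new_pump = (51.0000, 4.0000)
existing = [(51.0001, 4.0000), (51.0100, 4.0000)]
print(possible_duplicates(new_pump, existing))
```

A match is only a hint that a human should look closer: two objects of the same kind can legitimately stand a few metres apart.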

If we have checked a few edits of a new contributor and found them to be in perfect shape, we mark the contributor as a “trusted user”. This list is shared somewhere on the OSM wiki.

During this process, we can leave a message on the changeset for the contributor and the community to see. This creates a public record of any doubts there might be about this particular data point.