Marcos Dione's Diary | Correcting wrong tag values | OpenStreetMap
Marcos Dione's Diary
Correcting wrong tag values
Posted by Marcos Dione on 4 April 2026
I just published a hacky but effective tool to find wrong tag values. It led to some 50 edits this morning alone.
More info:
Discussion
Comment from chris_debian on 5 April 2026 at 09:07
Nice script, Marcos! Would you be able to share an example of a ‘before’ and ‘after’ tag?
Thanks,
Chris
Comment from Marcos Dione on 5 April 2026 at 13:53
The edits are done by hand; there is no automation there, the script just finds “bad” values.
Comment from chris_debian on 5 April 2026 at 14:23
Thanks for the clarification, Marcos. It makes sense to keep the edits manual for accuracy.
I had a go at tidying up the script a little, in case it’s useful. The main changes:
- Swapped `os.system()` for `subprocess.run()`, which is a bit safer and more Pythonic
- Added an `input()` pause between objects so they open one at a time rather than all at once
- Used a context manager for the DB connection
- Added a `HAVING count(*) < N` filter to focus on the long tail and skip common valid values
- Added a `KNOWN_GOOD` set to skip values you already know are fine
- Added comments explaining the negative-ID-means-relation behaviour (I didn’t know that, good to learn!)
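One reason the list form of `subprocess.run()` is usually considered safer than an `os.system()` string: each list item reaches the child process as a single argument, with no shell parsing in between. A minimal sketch, using a child Python process as a stand-in for the browser; the helper name `echo_argument` and the sample URL are made up for illustration:

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass `value` to a child process as one argv item and return what it received."""
    result = subprocess.run(
        # The child simply prints its first argument back.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# A hypothetical URL containing shell metacharacters. Interpolated into an
# os.system() string, the quotes and semicolons would be parsed by the shell.
tricky = "https://www.openstreetmap.org/edit?way=1'; echo injected; '"
received = echo_argument(tricky)
print(received == tricky)
```

With the list form the child sees the string unchanged, so quoting the URL by hand is not needed.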
Revised script below. Sorry about the formatting: the diary interprets hash/pound signs as bold font, but they should be comments in the script. Happy to be ignored if you prefer your original; it clearly does the job! :)
Chris
Revised script
```python
#!/usr/bin/env python3
"""
fix_osm_tags.py - Find and manually correct wrong highway tag values in OSM.

Requires a local osm2pgsql rendering database (e.g. 'europe').

Workflow:
1. Queries planet_osm_line for highway values, rarest first (long tail first).
2. For each rare value, opens the OSM editor in your browser one object at a time.
3. You review, correct or leave a note, then press Enter to continue.
"""
import subprocess
import sys
from contextlib import closing

import psycopg2

# --- Configuration ---
DB_NAME = "europe"
BROWSER = "librewolf"
BROWSER_PROFILE = "default"

# Highway values that are known-good and should be skipped.
# Expand this set to avoid being prompted for valid rare values.
KNOWN_GOOD = {
    "residential", "track", "path", "footway", "cycleway",
    "service", "unclassified", "tertiary", "secondary", "primary",
    "trunk", "motorway", "living_street", "pedestrian", "steps",
    "motorway_link", "trunk_link", "primary_link", "secondary_link",
    "tertiary_link",
}

# Only show groups with fewer than this many occurrences.
# Keeps the focus on the long tail of rare/likely-wrong values.
MAX_COUNT = 50


def open_in_editor(osm_id: int) -> None:
    """Open the OSM web editor for a given osm2pgsql osm_id.

    In osm2pgsql rendering databases, negative IDs represent relations;
    positive IDs represent ways.
    """
    if osm_id < 0:
        # Negative ID => relation
        url = f"https://www.openstreetmap.org/edit?relation={-osm_id}"
    else:
        # Positive ID => way
        url = f"https://www.openstreetmap.org/edit?way={osm_id}"

    subprocess.run([BROWSER, "-P", BROWSER_PROFILE, url], check=False)


def main() -> None:
    # closing() closes the connection on exit; psycopg2's own connection
    # context manager only commits or rolls back the transaction.
    with closing(psycopg2.connect(dbname=DB_NAME)) as db:
        cursor = db.cursor()

        # Fetch highway values ordered by frequency ascending (rarest first).
        # HAVING filters out common values that are unlikely to be errors,
        # keeping the focus on the suspicious long tail.
        cursor.execute(
            """
            SELECT
                count(*) AS count,
                highway
            FROM planet_osm_line
            WHERE
                highway IS NOT NULL
            GROUP BY highway
            HAVING count(*) < %s
            ORDER BY count ASC
            """,
            (MAX_COUNT,),
        )
        groups = cursor.fetchall()

        for count, highway in groups:
            # Skip values we already know are valid.
            if highway in KNOWN_GOOD:
                continue

            print(f"\n{'=' * 50}")
            print(f"Value: '{highway}' ({count} occurrence{'s' if count != 1 else ''})")
            print("Press Enter to open each object, or type 's' to skip this group.")

            choice = input("> ").strip().lower()
            if choice == "s":
                continue

            # Fetch all OSM IDs with this highway value.
            cursor.execute(
                """
                SELECT osm_id
                FROM planet_osm_line
                WHERE highway = %s
                """,
                (highway,),
            )
            osm_ids = [row[0] for row in cursor.fetchall()]

            for i, osm_id in enumerate(osm_ids, start=1):
                print(f"  Opening {i}/{len(osm_ids)} (osm_id={osm_id}) ...")
                open_in_editor(osm_id)

                # Note: if the object no longer exists in OSM, the editor
                # will open a view of the whole planet; just close that tab.
                if i < len(osm_ids):
                    next_action = input("  Press Enter for next, or 's' to skip rest of group: ").strip().lower()
                    if next_action == "s":
                        break

    print("\nAll done!")


if __name__ == "__main__":
    sys.exit(main())
```
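To see what the `GROUP BY ... HAVING` query and the `KNOWN_GOOD` skip select without a database at hand, the same logic can be sketched in plain Python. The sample values below are invented for illustration, and `rare_values` is a hypothetical helper, not part of either script:

```python
from collections import Counter


def rare_values(values, max_count, known_good):
    """Return (count, value) pairs for rare values not in known_good, rarest first."""
    # Equivalent of WHERE ... IS NOT NULL plus GROUP BY.
    counts = Counter(v for v in values if v is not None)
    rare = [
        (n, value)
        for value, n in counts.items()
        # Equivalent of HAVING count(*) < max_count plus the KNOWN_GOOD skip.
        if n < max_count and value not in known_good
    ]
    # Equivalent of ORDER BY count ASC (ties are broken alphabetically here).
    return sorted(rare)


# Made-up highway values: two valid ones and two misspellings.
sample = ["residential"] * 5 + ["footway"] * 2 + ["residental", "foot_way", "foot_way", None]
result = rare_values(sample, max_count=3, known_good={"residential", "footway"})
print(result)  # [(1, 'residental'), (2, 'foot_way')]
```

Note that `footway` falls under the count threshold in this sample but is still skipped, which is the case the `KNOWN_GOOD` set exists for.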
Comment from Marcos Dione on 7 April 2026 at 08:28
I made more modifications on my side: I collapsed both queries into a single one, and now there’s no wait between objects :) It also works with any key on any table. I will try to merge these tonight.
```python
#!/usr/bin/env python3
import os
import sys

import psycopg2


def main():
    db = psycopg2.connect(dbname='europe')
    cursor = db.cursor()

    # sys.argv[1] is the table suffix (e.g. line), sys.argv[2] the tag column.
    # Both are interpolated into the SQL unescaped, so only pass trusted values.
    cursor.execute(f'''
        SELECT
            count(*) AS count,
            {sys.argv[2]},
            array_agg(osm_id)
        FROM planet_osm_{sys.argv[1]}
        WHERE
            {sys.argv[2]} IS NOT NULL
        GROUP BY {sys.argv[2]}
        ORDER BY count ASC
    ''')

    while (data := cursor.fetchone()) is not None:
        # print(data)
        count, tag, osm_ids = data
        print(f"next {count}: {tag}")

        for osm_id in osm_ids:
            ans = input(f'Ready for {osm_id}? ')
            if ans != 'n':
                if osm_id < 0:
                    # in rendering DBs, this is a relation
                    os.system(f"librewolf -P default 'https://www.openstreetmap.org/edit?relation={-osm_id}'")
                else:
                    os.system(f"librewolf -P default 'https://www.openstreetmap.org/edit?way={osm_id}'")


if __name__ == '__main__':
    main()
```
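Since the table and key now come from the command line, they could be checked before they reach the SQL string. A minimal sketch, assuming the database uses the usual osm2pgsql table suffixes (`point`, `line`, `polygon`, `roads`); `ALLOWED_TABLES`, `quote_identifier` and `build_query` are hypothetical names, not part of the script above. psycopg2's `sql` module (`sql.Identifier`) offers a library-level way to do the same quoting.

```python
# Assumed set of table suffixes; extend it if your database has other tables.
ALLOWED_TABLES = {"point", "line", "polygon", "roads"}


def quote_identifier(name: str) -> str:
    """Quote a column name as a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table: str, key: str) -> str:
    """Build the grouped query for one table suffix and one tag column."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table suffix: {table!r}")
    column = quote_identifier(key)
    return (
        f"SELECT count(*) AS count, {column}, array_agg(osm_id) "
        f"FROM planet_osm_{table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} "
        f"ORDER BY count ASC"
    )


# Keys containing a colon only work as column names when quoted.
query = build_query("line", "addr:street")
print(query)
```

The resulting string could be passed to `cursor.execute()` in place of the f-string. Quoted identifiers are case-sensitive in PostgreSQL, so the key has to match the column name exactly.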
Comment from chris_debian on 7 April 2026 at 16:24
Nice work.