This tool rebuilds shapes.txt for a GTFS feed by map-matching trip stops to OpenStreetMap (OSM) data. It supports both road (bus) and rail (mainline train) networks.
- Automatic Mode Detection: Infers whether a route is "road" or "rail" based on `route_type` and keywords in route names.
- Hybrid Graph Building: Builds separate routing graphs for road and rail networks from one or more OSM PBF files (rail is restricted to `rail` + ferry links; no tram/metro/light_rail/monorail).
- Rail Stop Normalizer: Optionally snaps rail GTFS stops to nearby OSM rail stops before shape generation to fix bad coordinates.
- Rail speed bias: Uses OSM `maxspeed` to prefer tracks consistent with the train class (AV, IC/EC, REG, etc.).
- Anti-detour & multi-candidate snapping: For each stop pair, tries multiple nearby rail nodes, avoids huge detours, and rejects unrealistically sharp turns.
- Ferry bridging: Includes `route=ferry` segments in the rail graph to span sea gaps (e.g., Stretto di Messina).
- Shape Simplification: Simplifies the resulting polylines to reduce file size while maintaining accuracy.
- Deduplication & reuse: Assigns shared `shape_id`s to trips with identical geometries and skips recomputation when the stop sequence repeats.
- Visualizer: Includes a web-based viewer to watch the graph building and shape generation process in real time.
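
The deduplication idea can be illustrated with a minimal sketch. It is not the tool's actual code: the function name `assign_shape_ids`, the `shape_N` naming scheme, and the use of the stop sequence as the cache key are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def assign_shape_ids(trip_stops: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each trip_id to a shape_id, reusing the id when a stop sequence repeats.

    Hypothetical helper: the real tool may key on geometry rather than stops.
    """
    shape_by_sequence: Dict[Tuple[str, ...], str] = {}
    trip_to_shape: Dict[str, str] = {}
    for trip_id, stops in trip_stops.items():
        key = tuple(stops)  # lists are not hashable, tuples are
        if key not in shape_by_sequence:
            # First time this sequence is seen: this is where the expensive
            # map matching would run. Here we only mint a new id.
            shape_by_sequence[key] = f"shape_{len(shape_by_sequence) + 1}"
        trip_to_shape[trip_id] = shape_by_sequence[key]
    return trip_to_shape


trips = {
    "t1": ["A", "B", "C"],
    "t2": ["A", "B", "C"],  # same sequence as t1, so the shape is reused
    "t3": ["C", "B", "A"],  # reverse direction gets its own shape
}
mapping = assign_shape_ids(trips)
print(mapping)
```

Keying on the ordered stop sequence means a reversed trip is treated as a different shape, which matters for one-way streets and directional tracks.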
- Python 3.9+
- A valid GTFS feed (unpacked directory).
- An OpenStreetMap PBF (OSM.PBF) file covering the region of the GTFS feed.
- Clone the repository.
- Create a virtual environment:
  - `python -m venv .venv`
  - `source .venv/bin/activate` (Linux/Mac)
  - `.venv\Scripts\activate` (Windows)
- Install dependencies:
  - `pip install -r requirements.txt`

Note: You may need to install `osmium` dependencies separately if the pip install fails (e.g., `libosmium` on Linux).
To rebuild shapes for a GTFS feed using an OSM PBF file:

`python main.py --gtfs /path/to/gtfs_dir --osm /path/to/region.osm.pbf`

You can also pass multiple PBFs (they are merged while building the graphs):

`python main.py --gtfs /path/to/gtfs_dir --osm /path/to/region.osm.pbf /path/to/extra.osm.pbf`

This will:
- Load the GTFS data.
- Compute the bounding box of all stops.
- Build road and/or rail graphs from the PBF file within that bounding box.
- Process every trip in `trips.txt`, generating a shape.
- Write a new `shapes.txt` to the GTFS directory.
- Update `trips.txt` with the new `shape_id`s (backing up the original as `trips.txt.bak`).
- Generate a `shape_id_map.csv` in the GTFS directory, mapping each trip to its assigned shape ID.
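
As a rough illustration of the bounding-box step, the sketch below reads `stop_lat` and `stop_lon` (standard GTFS columns in `stops.txt`) and returns the extent of all stops. The function name `stops_bbox` and the `margin_deg` padding parameter are assumptions, not part of the tool; the sample data is made up.

```python
import csv
import io


def stops_bbox(stops_file, margin_deg=0.01):
    """Return (min_lon, min_lat, max_lon, max_lat) over all stops, padded by margin_deg.

    Hypothetical helper; rows with missing or malformed coordinates are skipped.
    """
    lats, lons = [], []
    for row in csv.DictReader(stops_file):
        try:
            lat = float(row["stop_lat"])
            lon = float(row["stop_lon"])
        except (KeyError, TypeError, ValueError):
            continue  # ignore stops without usable coordinates
        lats.append(lat)
        lons.append(lon)
    if not lats:
        raise ValueError("no stops with valid coordinates")
    return (
        min(lons) - margin_deg,
        min(lats) - margin_deg,
        max(lons) + margin_deg,
        max(lats) + margin_deg,
    )


# Made-up sample standing in for stops.txt; the third row has no coordinates.
sample = """stop_id,stop_name,stop_lat,stop_lon
S1,Termini,41.9009,12.5020
S2,Ostiense,41.8727,12.4846
S3,Broken,,
"""
bbox = stops_bbox(io.StringIO(sample), margin_deg=0.0)
print(bbox)
```

With a real feed you would pass an open file handle for `stops.txt` instead of the `io.StringIO` sample.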
| Argument | Description | Default |
|---|---|---|
| `--gtfs` | Path to the unpacked GTFS directory. | `trgtfs` |
| `--osm` | Path(s) to one or more OSM PBF files. | `lazio.osm.pbf` |
| `--modes` | Which graphs to build: `road`, `rail`, or `both`. | `both` |
| `--dry-run` | Run without writing changes to disk. | `False` |
| `--max-trips` | Limit the number of trips to process (for testing). | `None` |
| `--tolerance-road` | Simplification tolerance (meters) for road shapes. | `5.0` |
| `--tolerance-rail` | Simplification tolerance (meters) for rail shapes. | `3.0` |
| `--with-viewer` | Launch the web visualizer. | `False` |
| `--load-graphs` | Load a previously saved road/rail graph cache instead of rebuilding. | `None` |
| `--save-graphs` | Save the built road/rail graphs for future runs. | `None` |
| `--normalize-rail-stops` | Snap rail GTFS stops to nearby OSM rail stops before building shapes. | `False` |
| `--normalize-rail-threshold` | Max distance in meters to move a rail stop when normalizing. | `600.0` |
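
To make the effect of `--normalize-rail-threshold` concrete, here is a minimal sketch of threshold-limited snapping using the haversine distance. It only illustrates the idea: the names `haversine_m` and `normalize_stop` are hypothetical, the coordinates are made up, and the real normalizer may select candidates differently.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def normalize_stop(stop, candidates, threshold_m=600.0):
    """Return the closest candidate within threshold_m, or the stop unchanged.

    stop and candidates are (lat, lon) tuples. Hypothetical helper.
    """
    best, best_dist = None, threshold_m
    for cand in candidates:
        dist = haversine_m(stop[0], stop[1], cand[0], cand[1])
        if dist <= best_dist:
            best, best_dist = cand, dist
    return best if best is not None else stop


gtfs_stop = (41.9000, 12.5000)
osm_stops = [(41.9010, 12.5000), (41.9500, 12.5000)]  # roughly 110 m and 5.5 km away
print(normalize_stop(gtfs_stop, osm_stops))          # snaps to the nearby OSM stop
print(normalize_stop(gtfs_stop, osm_stops[1:]))      # too far, so left unchanged
```

A stop that has no OSM candidate inside the threshold keeps its original coordinates, so a larger threshold trades more corrections for a higher risk of snapping to the wrong station.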
If you only want to process rail trips (e.g., for a train-only feed or to save time):

`python main.py --gtfs ./my_gtfs --osm ./italy.osm.pbf --modes rail`

Any trips identified as "road" (bus) will be skipped.

To watch the process in real time:

`python main.py --with-viewer`

This will open a web browser at http://127.0.0.1:1890. Click "Build Graphs (Live View)" to start.

Note: The visualizer is a basic tool for debugging and watching progress. It is not highly optimized and may struggle with very large datasets, but it is adequate for monitoring.

If you already built the road/rail graphs once, you can reuse them to skip parsing the PBF:

`python main.py --gtfs ./my_gtfs --osm ./italy.osm.pbf --modes rail --save-graphs ./rail_graphs.gpickle`

Later:

`python main.py --gtfs ./my_gtfs --osm ./italy.osm.pbf --modes rail --load-graphs ./rail_graphs.gpickle`

The cache stores road and rail graphs together; loading automatically pulls only the modes you request.
- OSM Coverage: You MUST provide an OSM PBF that covers the entire area of your GTFS feed. If the PBF is too small, stops outside the area will not be matched, and trips may fail or result in straight lines.
  - Tip: Download a larger region (e.g., the whole country or municipality) from Geofabrik. The script automatically filters the graph to the bounding box of your stops, so the resulting graph stays limited to the area you need even with a large PBF.
- Graph Connectivity: The script assumes the OSM network is connected. If stops are far from any road/rail (e.g., bad stop coordinates or missing OSM data), the map matching may fail or produce straight lines between those stops.
- Rail scope: The rail graph keeps only `rail` (mainline) and `route=ferry` links; tram/metro/light_rail/monorail are excluded to avoid wrong routings.
- Rebuilding cache: If you use a saved graph cache, rebuild it after updates that add attributes (e.g., `maxspeed`, ferry inclusion) to benefit from the new routing bias.
- Performance:
  - Building graphs from large PBFs can take considerable time and memory (RAM), and can also load the CPU heavily.
  - Processing thousands of trips can take a while, especially road trips. Use `--max-trips` to test on a subset first.
- Route Mode Inference: The script uses a "smart" heuristic to determine if a route is Road (Bus) or Rail (Train/Tram/Subway).
  - Priority 1: Keywords: It checks `route_id`, `route_short_name`, and `route_long_name` for keywords.
    - Road keywords: "bus", "autobus", "pullman"
    - Rail keywords: "rail", "train", "metro", "subway", "tram", "ferrovia", "metropolitana"
  - Priority 2: GTFS `route_type`: If no keywords are found, it falls back to the standard GTFS `route_type` field.
    - `3` = Road (Bus)
    - `0` (Tram), `1` (Subway), `2` (Rail) = Rail
  - Default: If neither matches, it defaults to Road.

  You can edit keywords to your liking in the `route_mode` function. If your GTFS uses non-standard `route_type` values (e.g. extended types like 700 for bus) or lacks clear names, you may need to edit the `route_mode` function in `main.py`.
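
The priority order described above can be sketched roughly as follows. This is an illustrative reimplementation, not the actual `route_mode` from `main.py`: whole-word matching and checking road keywords before rail keywords are assumptions, since the README does not state how conflicts are resolved.

```python
import re

ROAD_KEYWORDS = {"bus", "autobus", "pullman"}
RAIL_KEYWORDS = {"rail", "train", "metro", "subway", "tram", "ferrovia", "metropolitana"}
RAIL_ROUTE_TYPES = {0, 1, 2}  # Tram, Subway, Rail in standard GTFS


def route_mode(route: dict) -> str:
    """Return "road" or "rail" for a routes.txt row (sketch under stated assumptions)."""
    # Priority 1: keywords in the id and name fields (whole-word match assumed).
    text = " ".join(
        str(route.get(field) or "")
        for field in ("route_id", "route_short_name", "route_long_name")
    ).lower()
    words = set(re.findall(r"[a-z]+", text))
    if words & ROAD_KEYWORDS:  # assumption: road keywords win on conflict
        return "road"
    if words & RAIL_KEYWORDS:
        return "rail"

    # Priority 2: standard GTFS route_type.
    try:
        route_type = int(route.get("route_type"))
    except (TypeError, ValueError):
        route_type = None
    if route_type in RAIL_ROUTE_TYPES:
        return "rail"

    # Default: route_type 3 (Bus) and anything unrecognized count as road.
    return "road"


print(route_mode({"route_id": "FL1", "route_long_name": "Ferrovia regionale", "route_type": "3"}))
print(route_mode({"route_id": "X", "route_type": "2"}))
print(route_mode({"route_id": "X", "route_type": "700"}))
```

In this sketch an extended `route_type` such as 700 falls through to the default, which is why feeds using extended types for rail services would need the mapping edited.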
- "ModuleNotFoundError: No module named 'pandas'": Ensure you have activated your virtual environment and installed requirements.
- Straight lines in output: This usually means the map matching failed for those segments. Check if:
- The OSM PBF covers that area.
- The stops are close enough to roads/rails.
- The correct `--modes` were enabled.
- Script crashes with MemoryError: Try using a smaller PBF (cropped to your region) or a machine with more RAM. For very large areas, such as entire regions or countries, graph building may still exceed available memory, and there is currently no workaround.
CC-BY-NC-SA @Ciospettw