TransMilenio Router
Project scope: optimal pathfinding over the full TransMilenio BRT network with a natural language interface. Context: Bogotá commuters losing hours to bad routing apps. Core problem: 141 stations, transfer logic, and a query interface that does not require a form. Status: v2 live.
The Challenge
The data is public: the stations, the routes, and the service information are all accessible. The real blocker is the GTFS feed. Without it the graph has to be built by hand, which is fragile and time-consuming. GTFS is more than a data format; it is the foundation that makes serious analysis possible: real travel times, service frequencies, transfer logic, accessibility data. Everything interesting about this project scales with the quality of that input.
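To make "real travel times" concrete, here is a minimal sketch of how edge weights could be derived from a GTFS stop_times.txt file. The column names (trip_id, arrival_time, departure_time, stop_id, stop_sequence) come from the GTFS specification; the rows, stop ids, and the function name edges_from_stop_times are made up for illustration and are not taken from a real TransMilenio feed or from this project's code.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the stop_times.txt layout (not real TransMilenio data).
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,06:00:00,06:00:30,S_A,1
T1,06:04:30,06:05:00,S_B,2
T1,06:09:00,06:09:30,S_C,3
T2,25:00:00,25:00:30,S_A,1
T2,25:05:30,25:06:00,S_B,2
"""


def to_seconds(hms):
    """Convert HH:MM:SS to seconds. GTFS allows hours past 24 for late trips."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def edges_from_stop_times(file_obj):
    """Return {(from_stop, to_stop): mean travel time in seconds}."""
    trips = defaultdict(list)
    for row in csv.DictReader(file_obj):
        trips[row["trip_id"]].append(row)

    samples = defaultdict(list)
    for rows in trips.values():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        # Consecutive stops on the same trip form one directed edge.
        for a, b in zip(rows, rows[1:]):
            travel = to_seconds(b["arrival_time"]) - to_seconds(a["departure_time"])
            samples[(a["stop_id"], b["stop_id"])].append(travel)

    return {edge: sum(values) / len(values) for edge, values in samples.items()}


edges = edges_from_stop_times(io.StringIO(SAMPLE))
for edge, seconds in sorted(edges.items()):
    print(edge, seconds)
```

Averaging over trips is only one possible choice; a per-hour breakdown would be a natural refinement once a real feed is available.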
The Approach
Python was the obvious choice for data work, graph operations, NLP prototyping, and statistical analysis. The more important decision was how to model the system itself. When I encountered Dijkstra's algorithm I recognized the network immediately: a large set of interconnected nodes and edges where optimal pathfinding is a solved problem, provided the graph is modeled correctly and edge weights are non-negative. Other routing apps appear to handle this differently, which may explain why they fail on edge cases, particularly when a user is already inside the system and needs to navigate between services. That gap is where this project starts.
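One way to make "modeled correctly" concrete is to treat each (station, service) pair as a node, so that changing services inside a station becomes an ordinary weighted edge. The sketch below shows Dijkstra's algorithm over such a graph. It is an illustration under assumptions, not the project's actual model: the services S1 and S2, their stop lists, the per-hop minutes, and the flat TRANSFER_MIN penalty are all invented toy values.

```python
import heapq
from collections import defaultdict

# Toy data (hypothetical): service -> (ordered stops, minutes between stops).
SERVICES = {
    "S1": (["Portal Norte", "Calle 100", "Calle 72"], 4),
    "S2": (["Calle 100", "Calle 72", "Museo del Oro"], 3),
}
TRANSFER_MIN = 5  # assumed flat cost of changing services inside a station


def build_graph(services, transfer_min):
    """Nodes are (station, service) pairs; edges carry minutes."""
    graph = defaultdict(list)
    at_station = defaultdict(list)
    for name, (stops, minutes) in services.items():
        for stop in stops:
            at_station[stop].append(name)
        for a, b in zip(stops, stops[1:]):
            graph[(a, name)].append(((b, name), minutes))
            graph[(b, name)].append(((a, name), minutes))
    # Transfer edges connect every pair of services that share a station.
    for stop, names in at_station.items():
        for src in names:
            for dst in names:
                if src != dst:
                    graph[(stop, src)].append(((stop, dst), transfer_min))
    return graph, at_station


def shortest_path(graph, at_station, origin, destination):
    """Dijkstra from any service at origin to any service at destination."""
    dist, prev, heap = {}, {}, []
    for service in at_station[origin]:
        dist[(origin, service)] = 0
        heapq.heappush(heap, (0, (origin, service)))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node[0] == destination:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        for neighbor, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return None  # destination unreachable


graph, at_station = build_graph(SERVICES, TRANSFER_MIN)
cost, path = shortest_path(graph, at_station, "Portal Norte", "Museo del Oro")
print(cost, path)
```

On this toy network the cheapest route costs 15 minutes and transfers at Calle 100 rather than Calle 72. A rider who is already aboard a given service can be handled by seeding the search with that single (station, service) node instead of every service at the origin.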
What Was Built
v1 is a focused, solid program: it takes an origin and a destination, runs Dijkstra over the graph, and returns structured, useful route information. No extras, no noise. v2 adds the NLP layer: natural language input and fuzzy station name matching that accounts for the many ways someone might write a station name in real Bogotá Spanish, so the router returns the same optimal route regardless of spelling variation.
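The fuzzy matching step can be sketched with the standard library alone. This is an assumed approach, not necessarily what v2 does: it strips accents, expands a few common address abbreviations, and then uses difflib.get_close_matches to pick the nearest station name. The station list, the ABBREVIATIONS table, and the 0.6 cutoff are illustrative choices.

```python
import difflib
import unicodedata

# Small sample list for illustration; the real network has far more stations.
STATIONS = ["Calle 100", "Calle 72", "Portal Norte", "Museo del Oro", "Avenida Jiménez"]

# Hypothetical abbreviation table for common ways of writing street types.
ABBREVIATIONS = {"cll": "calle", "cl": "calle", "av": "avenida"}


def normalize(text):
    """Lowercase, strip accents and dots, expand known abbreviations."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = no_accents.replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def match_station(query, stations=STATIONS, cutoff=0.6):
    """Return the closest station name, or None if nothing is similar enough."""
    index = {normalize(name): name for name in stations}
    hits = difflib.get_close_matches(normalize(query), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


for raw in ["cll 100", "av. jimenez", "museo del horo", "xyz"]:
    print(repr(raw), "->", match_station(raw))
```

Returning None below the cutoff lets the caller ask the user to rephrase instead of silently routing to the wrong station.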
The Outcome
The result is a CLI that in practice outperforms the current routing apps on the cases that matter most, such as a rider who is already inside the system and needs to move between services. The architecture also opens significant possibilities: multilingual support, accessibility layers for users with disabilities, and migration to friendlier interfaces that present route information simply and clearly without sacrificing accuracy.
What I'd Do Differently
Start with the GTFS feed. With it the analysis becomes complete: real schedules, linear regression models for travel time estimation, passenger volume data by station and hour to flag congestion points that affect service efficiency. The network also needs to stay current because TransMilenio updates and changes routes regularly, making a manually built graph prone to going stale. The data layer is the foundation. Everything else builds on top of it.
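As a rough illustration of the linear regression idea, the sketch below fits travel time against segment distance with ordinary least squares. Everything here is assumed for the example: distance as the single predictor, the function names, and the data points, which are synthetic and chosen to lie exactly on a line so the fit is easy to check by hand. A real model would be fitted on observed times from the feed and would likely need more predictors, such as hour of day.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate travel minutes for a segment of x kilometres."""
    return intercept + slope * x


# Synthetic points (not measurements): segment length in km, travel time in minutes.
distance_km = [0.5, 1.0, 1.5, 2.0]
travel_min = [3.5, 5.0, 6.5, 8.0]

intercept, slope = fit_line(distance_km, travel_min)
estimate = predict(intercept, slope, 1.2)
print(intercept, slope, estimate)
```

In this toy fit the intercept plays the role of a fixed per-segment overhead, such as dwell time, and the slope is minutes per kilometre.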