Monday 4 January 2016

Modelling Public Transportation Networks using Time Dependent Graphs using neo4j

Finding best routes in public transportation networks is a classic algorithm problem. In particular, there is increased interest in using graph compute engines for dynamic management of transportation networks.  A typical use case involves finding the shortest path from a starting station to a destination based on a railway schedule.  Conceptually this seems like an easy problem to models with stations as nodes, edges to denote routes and edge weights to denote trip times (similar to a road network). However, the nuances of train travel switching trains, transfer times and trains operating at certain times makes the quickest choice dependent on the time of the journey. Initial attempts to model train travel with models such as linking stations directly, storing schedule information in station nodes or edges, resulted in unwieldy graphs and poor performance especially when handling several thousand nodes and connections. Eventually, a time dependent graph model [1] seemed to be best suited for modelling this problem. Here the time-dependent graph is defined as G(S,R,E,W) where:
  • S is the set of station nodes
  • R is the set of route nodes corresponding to the stops each train from the set [Z1,Z2,...Znmakes in a route sequence [S1,S2,...Sn]
  • is set of edges connecting subsequent route nodes. The weight of each edge w(τ)=τA(i+1)−τA(i) which is the equivalent of waiting time at station plus travel time.
  • Transfer times: To model the switch of trains at certain stations, additional edges are created between the route nodes and the station nodes. Zero weights are assigned to edge for getting off the train and the transfer time is assigned to the edge representing getting on the train.
The time dependent model can best be illustrated with a simple example implemented using the neo4j graph database. Here, we model train journeys departing from Maidenhead to Paddington with some illustrative intermediate stops.


The blue nodes representing the stations ( created using the following script):


We can build the time dependent graph:


In order to find the shortest path, we can run a Cypher query for trains departing from Maidenhead to Paddington (here we ignore time of departure and possibly different fares on different trains):

In the above query, we search for all connections from the departure station “MAI” to destination “PAD”, aggregate the weights of each route including the transfer times and finally sort the aggregations in ascending order to give us the shortest time. 


The above example is obviously a over-simplification of a real-world scenario, but it gives a great foundation to model more complex scenarios such a time of departure, peak and off-peak fares, fast trains leaving later from the departure station but reaching earlier. I leave it to the reader to extend the model for complete real-world scenarios and additionally recommend reading [2]. In general, time dependent graphs are a great solution to modelling rail and flight networks. I found the neo4j UI to be a great tool for  visualising and testing preliminary graph models before embarking on a full-fledged journey to ingest and perform graph computations on real world data.


Reference:
[1] Pyrga, E., Schulz, F., Wagner, D., Zaroliagis, C.: Towards realistic modeling of time-table information through the time-dependent approach. In: Proceedings of the 3rd Workshop on Algorithmic Methods and Models for Optimization of Railways (ATMOS’03). Volume 92 of Electronic Notes in Theoretical Computer Science., Elsevier (2004) 85–103
[2] Delling, D., Katz, B., and Pajor, T. 2012. Parallel computation of best connections in public transportation networks. ACM J. Exp. Algor. 17, 4, Article 4.4 (October 2012),