Aquius at the University of York

Aquius – An Alternative Approach to Public Transport Network Discovery

As the complexity of public transport networks grew during the 20th century, so did the inventiveness of the attempts to communicate those networks to users. Angular schematic maps, in the form of the London Underground map attributed to Harry Beck, have since become common for core urban and suburban public transport networks. Since at least the 1990s these maps have infected service design, with high frequency bus networks increasingly structured to be readily communicable as stylised network maps – inevitably limiting the range of direct destinations offered. Inter-regional networks necessarily remain complicated, and, as illustrated by various European attempts at national network flow maps, are challenging to communicate in schematic form. At least on paper.

New technology tends first to duplicate prior rationales blindly (allowing a fixed paper network map to be downloaded), and only later properly addresses the underlying demand (network understanding). Yet that final stage has not obviously occurred for public transport maps. Computer science has tended to focus on eliminating the need for network maps by providing bespoke journey plans: A trend set to continue as “Mobility as a Service” seeks the algorithm to rule them all – naively oblivious to Leibniz, Gödel and the underlying fluidity of humanity that necessarily rejects perfection. Meanwhile conventional fixed network maps continue to give transport operators and public agencies control over how their service, and their wider role in society, is perceived: National railway line maps are common examples, apparently demonstrating an equality of state-sponsored connectivity between places that may not actually be reflected in service provision. The implication is that neither commercial funding nor societal politics provide a motivation to evolve the traditional public transport network map. Consequently the task falls to the realm of creative curiosity. As a former employee of (the forerunner of) London Underground, Beck would find that much familiar.

This review introduces the conceptual approach of Aquius, describing dataset creation and limitations. The use of Aquius to assess human connectivity is compared to interchange/schedule-based models, with some examples of frequency-based connectivity in practice. Finally consideration is given to how to evolve the current code into something more complete and sustainable.

Here + Us

The idea of reinventing the humble public transport network map was born in the early 2000s, when probing the first batch of British ATCO-CIF public transport electronic schedule data – but such data was not then Open (public and reusable), and internet (especially Javascript-based) mapping was still in its infancy. Constraints that have since evaporated. “Aquius” – a Spanish-English hybrid of “here” and “us” – was originally developed to help understand the Spanish railway network. Aquius has since evolved to cover a wide range of scheduled public transport:

  • From dense urban networks such as Paris and New York, whose scale and complexity cannot otherwise be conveyed through a single fixed map.
  • To the cabotaged international network of Flixbus, where pickup and setdown conditions vary from place-to-place such that a fixed physical map of passenger journey opportunities is infeasible.

Conceptually Aquius is half-way between the two prevailing approaches to public transport information:

  1. Conventional journey planners are aimed at users that already have some understanding of what routes exist. Aquius summarises overall service patterns.
  2. Most network maps are pre-defined and necessarily simplified, typically showing an entire city or territory. Aquius relates to a bespoke location.

Aquius is premised on mapping the network from here, where that here is defined by the user. By default the user simply points at the map, although the underlying code can be hacked to allow more specific methods of geographic selection. Any service that departs from here is then automatically rendered on the map as a sequence of links between the nodes (stops or stations) it serves after here.

The width of each line is related to its service level (frequency), such that the most common journey opportunities are the most visible. Further basic information about the services within each link, such as headcode or operator, can be viewed by clicking on the link. More specific information, such as a timetable, is provided via a URL link to an external website. Optionally, demographics can be assigned to those nodes to indicate the “us” – the people that the links connect the user to, where population is a simple proxy for all manner of socio-economic activity and facilities.
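The here query described above might be sketched as follows. This is not Aquius’ own Javascript: it assumes a circular here of fixed radius and a toy network in which trips are simple ordered lists of node ids (the names NODES, TRIPS and links_from_here are hypothetical), and it ignores loops, splitting trains and pickup/setdown restrictions. Each trip contributes links only from its first call within here onwards, and the resulting counts are what would set line widths.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

# Hypothetical toy network: node id -> (latitude, longitude), plus trips as
# ordered lists of node ids. Real Aquius datasets use their own JSON schema.
NODES = {
    "A": (53.960, -1.080),
    "B": (53.970, -1.080),
    "C": (53.980, -1.090),
    "D": (53.950, -1.100),
}
TRIPS = [["A", "B", "C"], ["D", "A", "B"], ["C", "B", "A"], ["A", "B", "C"]]


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def links_from_here(here, radius_m, nodes, trips):
    """Count, per directed link, the trips that serve it after calling at here."""
    local = {n for n, pos in nodes.items() if haversine_m(here, pos) <= radius_m}
    links = Counter()
    for stops in trips:
        # Index of the first call within here, or None if the trip never calls.
        start = next((i for i, n in enumerate(stops) if n in local), None)
        if start is None:
            continue
        # Only the part of the trip from here onwards is drawn.
        for a, b in zip(stops[start:], stops[start + 1:]):
            links[(a, b)] += 1
    return links


links = links_from_here(NODES["A"], 300, NODES, TRIPS)
print(links)  # the counts would set the width of each drawn link
```

The trip C to B to A ends at here, so it departs from nowhere within here and contributes no links.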

While Aquius’ approach appears simple, in practice Aquius needs to manage a wide range of potentially complicated “edge cases”, such as looping circular and lasso bus routes, or trains that split into portions mid-route or convey local journey seats in the course of a longer-distance vehicle trip – all of which must be rendered and counted to avoid the appearance of duplication.

With no server-based queries to delay user interaction, network discovery is plausible by trial and error. That changes the entire user dynamic from being told, to playful learning – encouraging the child-like gaming behaviour that closely aligns to how we physically experience the world: The ability to reach in and grab whatever is of interest.

Dataset Creation

The Aquius client (the user’s map interface) loads a Javascript library into a given location on a webpage. That library is under 40 kilobytes in size, plus another 40 for the mapping library, Leaflet – in total less data than the average webpage picture, although requiring the browser to perform more computation and thus likely slower to render. As currently structured, the client also loads a single dataset which describes the entire queryable network. For small cities or limited networks (such as York or Renfe) this dataset is typically 100-200 kilobytes in size, while larger networks (such as New York City or British Rail) can reach a megabyte uncompressed. Because datasets are JSON files consisting of (in byte terms) inefficient text and formatting, a well-configured webserver will compress them by a factor of 3-5, so actual load times should remain acceptably quick over most modern internet connections. Once loaded, the client can process a dataset almost instantly – local here queries generally execute within 10-20 milliseconds (well within human reaction times), while especially large areas may take around a hundred milliseconds (risking only slight interface lag).
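To illustrate why text-based JSON compresses so well, the following sketch gzips a synthetic dataset of repetitive coordinates and links. The structure of dataset is a hypothetical stand-in, not the Aquius schema, and the ratio printed will vary with the data, so it should not be read as confirming the 3-5 times figure. On the web this compression would normally be applied transparently by the server (via Content-Encoding: gzip) rather than in application code.

```python
import gzip
import json

# Hypothetical stand-in for a dataset: 500 nodes with rounded coordinates and
# 499 links, each a [service count, node list] pair. Not the real Aquius schema.
dataset = {
    "node": [[round(-1.08 + i / 1000, 5), round(53.96 + i / 2000, 5)] for i in range(500)],
    "link": [[12, [i, i + 1]] for i in range(499)],
}

# Serialise compactly, then compress as a webserver would.
raw = json.dumps(dataset, separators=(",", ":")).encode("utf-8")
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped, about {ratio:.1f}x smaller")
```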

Datasets must be pre-built. Most of the networks described here have been built from freely available General Transit Feed Specification (GTFS) files, or via conversion from similar electronic data (such as Transxchange) into GTFS and thence into an Aquius dataset. In most cases conversion from GTFS to Aquius takes a matter of seconds, although current processing is particularly dependent on available operating system memory, and excessively large GTFS archives (such as Île-de-France, which consists of almost a gigabyte of raw data) may need to be processed in chunks, with the resulting Aquius datasets then merged.
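A first step in any GTFS conversion is turning per-trip stop sequences into counted links. The sketch below is a simplification that reads only stop_times.txt (calendar.txt is ignored, so it counts trips rather than services per day); count_links is a hypothetical name, not part of the Aquius converter. It also suggests why chunked processing can work: if chunks split the archive by whole trips, the link counts from each chunk can simply be added.

```python
import csv
import io
from collections import Counter, defaultdict

# A tiny excerpt in the shape of GTFS stop_times.txt (only the columns used here).
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,A,1
t1,08:10:00,08:10:00,B,2
t1,08:20:00,08:20:00,C,3
t2,09:00:00,09:00:00,A,1
t2,09:12:00,09:12:00,B,2
"""


def count_links(stop_times_file):
    """Count trips per directed link between consecutive stops.

    Simplification: calendar.txt is ignored, so this counts trips in the
    file rather than services per day or week.
    """
    calls = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        calls[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))
    links = Counter()
    for trip_calls in calls.values():
        stops = [stop for _, stop in sorted(trip_calls)]  # order by stop_sequence
        for a, b in zip(stops, stops[1:]):
            links[(a, b)] += 1
    return links


links = count_links(io.StringIO(STOP_TIMES))
print(links)

# Chunking by whole trips: per-chunk counts can simply be added together.
header, *rows = STOP_TIMES.splitlines()
chunk_1 = "\n".join([header] + rows[:3])  # trip t1
chunk_2 = "\n".join([header] + rows[3:])  # trip t2
merged = count_links(io.StringIO(chunk_1)) + count_links(io.StringIO(chunk_2))
print(merged == links)
```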

In practice the most time-consuming aspect of building an Aquius dataset lies in its configuration to ensure a high quality of output. For example, the York dataset is built via Transxchange files, an operational format with little customer-friendly service information. Consequently the structure of headcodes (service numbers) needed to be sanitized to match what the public would see, and route colours and URLs needed to be added manually. Britain has standardised node locations, but many networks do not. For example, different agencies within (administratively less centralised) Barcelona manage slightly different sets of geographic locations for the same bus stops, which (without proper sanitization) would tend to result in 2 or 3 slightly different sets of links along the same stretch of road – precisely the links that Aquius tries to combine, but cannot unless the nodes share the same coordinates. The conversion tool contains various settings to solve common problems, but these require the dataset creator to understand the raw data they are processing. Repeat processing of data is thus generally very fast and theoretically easy to automate, but the initial configuration will often require work by someone with an element of (transport-related) expertise.
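One common way of reconciling near-duplicate stop coordinates is to snap stops lying within a small tolerance onto a single shared node. The greedy sketch below assumes a 15 metre tolerance and hypothetical stop ids; it is not necessarily how the Aquius conversion tool handles the problem.

```python
from math import cos, hypot, radians

METRES_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def snap_stops(stops, tolerance_m=15.0):
    """Map each stop id to the id of an earlier stop lying within tolerance_m.

    Greedy and O(n^2): fine for a sketch, too slow for a large network.
    """
    kept = []  # (stop_id, lat, lon) of stops that define a shared node
    mapping = {}
    for stop_id, lat, lon in stops:
        for kept_id, klat, klon in kept:
            # Equirectangular approximation, adequate over a few metres.
            dy = (lat - klat) * METRES_PER_DEGREE
            dx = (lon - klon) * METRES_PER_DEGREE * cos(radians(klat))
            if hypot(dx, dy) <= tolerance_m:
                mapping[stop_id] = kept_id
                break
        else:  # no nearby stop found: this stop becomes a node in its own right
            kept.append((stop_id, lat, lon))
            mapping[stop_id] = stop_id
    return mapping


# Hypothetical ids: the same physical stop as published by two agencies.
STOPS = [
    ("tmb_1", 41.38500, 2.17340),
    ("amb_1", 41.38505, 2.17345),
    ("tmb_2", 41.39000, 2.18000),
]
print(snap_stops(STOPS))
```

The tolerance is a judgment call: stops on opposite sides of a road may need to stay distinct, so a real configuration would likely also compare stop names.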

While operators and agencies generally make electronic data available as schedules, Aquius itself does not necessarily need schedule information, since in most configurations Aquius simply counts services across a day or week. Consequently Aquius datasets can also be built from map traces or geospatial plots – simple GeoJSON points, lines and boundaries with appropriate descriptive metadata. While this might be a laborious method of creating a large or complicated network, the option is useful where no electronic data exists. The option also allows unimplemented service proposals or modifications to be outlined relatively quickly (in comparison to scheduling), as illustrated for a selection of proposed network changes in Barcelona (which combines GTFS-extracted data for current services with hand-crafted alterations into a strategic “vortex” grid map).

Datasets may be built with pre-defined filters – currently network/product (including operator or mode) and service periods (day and/or time bands). The dataset creator is responsible for presenting only appropriate filters, preferably filters which match the characteristics of the network. For example, Spanish urban networks typically contain no substantial variation in service pattern or headway across a given day, and thus establishing a peak/commuter hours filter would be quite pointless. Note that for clock times, the service period filter is applied only to the time at the middle (by duration) of the vehicle’s trip, so is always an approximation and cannot specifically reflect any given here query – which may be near the start or end of the vehicle’s trip. For the same reason, detailed time bands would be inappropriate on long distance networks where vehicle trips travel all day.
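The midpoint rule can be sketched as follows, assuming GTFS-style clock times and four hypothetical time bands (band names and boundaries are illustrative only). The second example shows why detailed bands suit long distance networks poorly: a trip running all day is assigned to a single band regardless of when it calls at here.

```python
def to_seconds(hhmmss):
    """Convert a GTFS-style clock time (which may exceed 24:00:00) to seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


# Hypothetical time bands: (name, start inclusive, end exclusive).
BANDS = [
    ("early", "00:00:00", "07:00:00"),
    ("peak", "07:00:00", "09:30:00"),
    ("interpeak", "09:30:00", "16:00:00"),
    ("evening", "16:00:00", "48:00:00"),
]


def band_for_trip(first_departure, last_arrival, bands=BANDS):
    """Return the band containing the midpoint (by duration) of a vehicle trip."""
    midpoint = (to_seconds(first_departure) + to_seconds(last_arrival)) / 2
    for name, start, end in bands:
        if to_seconds(start) <= midpoint < to_seconds(end):
            return name
    return None


# A local trip from 06:30 to 08:10 has its midpoint at 07:20, so is counted
# in "peak" even for a here query near its 06:30 start.
print(band_for_trip("06:30:00", "08:10:00"))
# A trip running 06:00 to 22:00 has its midpoint at 14:00, so only "interpeak".
print(band_for_trip("06:00:00", "22:00:00"))
```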

Limitations

Aquius maps straight-line links, not the precise geographic route taken along roads or railways. This allows services that do not stop at all intermediate stops to be clearly differentiated. It makes it technically possible for an internet client to work with a large transport network, since it need only reference point coordinates, not potentially complex intermediate shapes. It also makes it possible for Aquius to work with schedule data that has been distributed without full routing details (which is typical of rural and inter-regional networks). Aquius is however limited to conventional scheduled public transport with fixed stopping points.

Aquius summarises the patterns of fixed public transport networks. It presumes a degree of network stability over time, and cannot sensibly be used to describe a transport network that is in constant flux. The Aquius data structure allows filtering by time period, but such periods must be pre-defined and cannot offer the same precision as schedule-based systems.

Aquius only shows the direct service from here, not journeys achieved by interchange. Displaying all possible interchanges would ultimately result in a map of every service, which fails to convey what is genuinely local to here. Aquius’ intrinsic simplicity emphasises network understanding, which supposes a different rationale from one that requires multi-stage trips to be automatically chained together, as further discussed in the context of connectivity below.

As an internet/mobile application, Aquius is already limited by the filesize of its datasets, which are currently always loaded as a single file. Aquius was designed to support the biggest single networks in the world. Multiple large local networks can potentially be aggregated into a very large dataset, but such a dataset may imply unacceptably long load times for online users. As a concept, Aquius is not necessarily so limited: With further code rewrites, fragmented datasets could theoretically be loaded to match the user’s chosen viewpoint. Current Aquius (user) client memory usage is not yet a constraint for a modern computer or mobile device, although ultimately some combination of processing, memory and map canvas rendering will start to slow performance as the volume of data grows. An Aquius-for-the-world would surely require some clever engineering to manage these limited operating system resources.

Aquius is conceptually inaccessible to those with severe visual impairment (“blind people”), with no non-visual alternative available. The basic problem is that of communicating spatial information to those with a rather different spatial understanding, and genuine solutions (such as soundscape representations or playing the map like a text-based MUD) would imply substantial new coding. Fortunately alternative network representations can still be made available using prior technology, such as screen-reader friendly timetables or tactile maps.

Connecting People

Population is a proxy for all manner of socio-economic activity and facilities, measured both in utility and in perception. While imperfect – modern commercial activity does not always correlate to residential population – population is typically the dominant determinant of passenger transport markets, and so the most helpful single variable when understanding the role of passenger transport networks.

Each node (stop or station) can be assigned to one place within the Aquius dataset. Places are intended to quickly summarise local demographics. The scale of place geography should broadly reflect the natural catchment or hinterland of services in the network. Typically administrative, census, or postal boundaries are used. The population counted by Aquius is simply the sum of the population of all places containing one or more nodes (stops, stations) linked to here (by the services shown), including the population of here itself. This method is too crude to analyse any one node accurately – for example, a node close to the boundary between places may reasonably take passengers from a portion of both places. However across a whole network such skews tend to average out. For comparison, Barcelona (Àrea Metropolitana de Barcelona, AMB) has been analysed using two quite different place geographies:
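The place-based count described above reduces to a set union followed by a sum, so that a place is counted once however many of its nodes are linked. A minimal sketch, with hypothetical node and place names (connected_population is not an Aquius function):

```python
def connected_population(here_nodes, linked_nodes, node_place, place_population):
    """Sum the population of every place holding a node at, or linked to, here.

    Each place is counted once, however many of its nodes are involved.
    Nodes without an assigned place are ignored.
    """
    nodes = set(here_nodes) | set(linked_nodes)
    places = {node_place[n] for n in nodes if n in node_place}
    return sum(place_population[p] for p in places)


# Hypothetical node-to-place assignment and place populations.
NODE_PLACE = {"A": "Centre", "B": "Centre", "C": "North", "D": "East"}
PLACE_POPULATION = {"Centre": 20_000, "North": 8_000, "East": 5_000}

# Here contains node A; services from here reach B and C.
print(connected_population({"A"}, {"B", "C"}, NODE_PLACE, PLACE_POPULATION))
```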

  • 188 Administrative units (view network): Barris within the Ajuntament de Barcelona (inner city) and Districts within peripheral municipalities. These units broadly reflect societal notions of locality, and thus are not of equal area or population.
  • 463 Grid squares (view network): Fixed squares of 0.01 (global) degrees in width, assigned population using a grid-based Quadtree analysis. These squares are (almost) perfectly uniform in scale, but have no relation to any societal notion of locality.

The population connected to each defined place has been calculated using the Aquius codebase. The process is identical to the standard interface except here is defined as one place, not a circular area. The number of (AMB resident) people connected at each place has then been plotted against the cumulative population (of AMB), to allow the results of the two geographies to be fairly compared. While there are subtle differences between the results, both patterns are remarkably similar in spite of the use of very different base geographies. The implication is that so long as the scale of the geography approximates to the scale at which people access the network, the structure of that geography is relatively unimportant overall.

AMB Public Transport Connectivity by Geography: Comparison of results for administrative units and for grid squares. Analysis includes data from AMB, FGC, GenCat, Idescat, Renfe, TMB and TRAM.
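The graphs in this section plot each place’s connected population against cumulative population. A sketch of preparing those points follows, with hypothetical figures; ordering places from best to worst connected is an assumption about how the graphs are drawn. Because the x axis is a percentage of total population, geographies with different numbers of places can share one chart.

```python
def cumulative_curve(places):
    """Turn (connected_population, resident_population) pairs into plot points.

    Places are ordered from best to worst connected (an assumption about how
    the graphs are drawn); x is the cumulative share of residents in percent.
    """
    ordered = sorted(places, key=lambda p: p[0], reverse=True)
    total = sum(resident for _, resident in ordered)
    points, running = [], 0
    for connected, resident in ordered:
        running += resident
        points.append((100 * running / total, connected))
    return points


# Hypothetical places: (people connected, people resident).
PLACES = [(900_000, 10_000), (500_000, 30_000), (700_000, 10_000)]
for x, y in cumulative_curve(PLACES):
    print(f"{x:5.1f}% of residents: {y:,} connected")
```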

Aquius’ numeric connectivity analysis is particularly useful for benchmarking similar styles of network in completely different locations – comparison between different cities, or even different countries. The example below compares the national passenger railway networks of Great Britain and peninsular Spain inter-regionally (note the graph below relates to regional demography, not the more local geography visible in the linked datasets). As above, connectivity is plotted against cumulative population, but with values expressed as percentages because the absolute best connectivities and national populations differ.

Inter-regional passenger railway connectivity within Britain and Spain
Inter-regional passenger railway connectivity within Britain (Network Rail) and Spain (Renfe), using NUTS 2 regions (Inner London has been combined). High speed networks (AVE, HS2) favour the best connected regions. Analysis includes data from Renfe, RSP and Eurostat.

While no two networks are perfectly comparable, the inter-regional intention of both national railway networks is somewhat similar (for example, both capitals – London and Madrid – offer direct connections to every region) so comparison is reasonable. The annotations indicate how Spain’s partial high speed network (AVE) appears to contribute to a distortion of national connectivity – a festering political problem for a (currently) state operated network that is popularly expected to be equitable in its delivery. The Café Para Todos sequence of essays discussed Spanish railway connectivity in great detail, but this comparison to Britain is new. Alternative forms of analysis are of course possible, such as mapping Flixbus’ European network by region (which suggests Flixbus retains much of its founding bias towards Germany).

Interchange and Time

Much conventional connectivity analysis is schedule-based, potentially allowing multi-stage journeys (using more than one service to complete a journey) and in-journey interchange. Such approaches typically derive from two presumptions:

  1. The importance of large urban networks, where interchange is often required (as exemplified in the next section). Interchange improves the engineering efficiency of the network – allowing high-volume mid-distance modes, such as rail, to aggregate passengers from multiple origins and destinations. However multi-stage trips with interchange can be inefficient for passengers, often introducing further variability (on most urban networks each stage is subject to a variable waiting time) and negatively impacting people with reduced mobility or intolerance of urban waiting environments. Outside large cities, interchange is at best a secondary consideration: Rural and inter-regional public transport markets are typically defined by the availability of direct services. (The contemporary importance of individualised “micro” transport – bicycles, hoverboards – in cities logically counters such engineered mass public transport networks, although as discussed in the introduction to Is Alta Velocidad Fast?, public policy tends to counter-balance such individualism by placing even more emphasis on engineered networks. It is notable that in its most recent sequence of network changes, Barcelona has further increased the emphasis on interchange to become more like New York.)
  2. The dominance of the valuation of time in econometric analysis, where passenger decisions are normalised to the value of the time spent travelling, on the assumption that rational economic agents seek to minimise such time. Therein often lies a utilitarian presumption of activity at a specific place. Even where journey time is important in mode choice decisions, that may not necessitate modelling it in detail: In a given geographic environment, simple assumptions may be reasonably made about the differing speed of modes such as car, bus and bicycle. Likewise while precise scheduling may appear important in certain cases, such as arrival at work or hospital by an appointed time, when timing is so crucial passengers will tend to over-compensate for (worst case) variability – and their assessment of overall service delivery may be strongly influenced by service frequency. Finally many transport decisions – especially long run decisions such as home location or car ownership – are based not on the ability to make a specific journey in a specific time, but on the general availability of modes and services.

In Britain, MVA/Systra’s Accession (which became the standard in the early 2000s) incorporated so much scheduling complexity (multi-stage journeys across multiple modes) that the model could take hours to produce results. Peter Davidson developed a frequency-based model, but it did not flourish. The subsequent strategic national accessibility model worked more efficiently over large datasets, but its results could still be extremely hard to explain locally, because none of the contributing (mostly) bus service patterns were visible. So while such accessibility modelling may have appealed to a centralised British state with an organisational need to quantify everything, such methods have had minimal take-up among non-experts. In this context, Aquius’ frequency-based connectivity analysis provides several advantages:

  • Efficient means of analysing perception-driven decisions, such as the general availability of services, which tend to dominate in the long run. And in politics.
  • Easier to explain and confirm the results visually, since the associated links can be rendered on a map. Not a “black box” model that requires trust in the expert.
  • Fundamentally faster calculation, which in turn (theoretically) allows planners to experiment with more options or analyse more details.

Overall, Aquius’ intrinsic simplicity provides greater emphasis on understanding networks. While Aquius’ method may lack the expected sense of mathematical perfection, Aquius still manages to convey the important patterns, and does so in a manner that can be shown to the non-expert. Analysis of Spanish railways (with a slightly more complex variant of the method, which also assigned municipalities without stations) correctly identified all the provinces that routinely complain about their poor railway connectivity – validating the network in its political context.

Connectivity by Frequency

Connectivity analysis can further weight each linked population by the level of service linking it. The method used attempts to capture broad differences in network perception – for example, 14 trains per day from London to Paris is considered a “good” service, while operating 14 daily within either city would be almost imperceptible. The Connectivity slider can be moved to reflect one of three broad service level expectations (in addition to the unfactored “any”), with the defaults summarised in the table below. The formula used is max(0, 1 - 1 / (service × factor)): that is, 1 - 1 / (service × factor) where the result is greater than 0, and otherwise 0. The default factor values are 2 (long distance), 0.2 (local/interurban) and 0.02 (city). There is no particular rationale for any of these factors, other than the results they produce “feel about right”. Remember, the aim is not to define an absolute truth, rather to provide some options to assist analysis. In practice where the network is consistent in design the relative results produced by each factor differ little, as this section will illustrate.

Example Connectivity Factors applied to Populations (service totals yielding each factor)
Frequency expectation               0% (minimum)   50%    95%
Long Distance (low frequency)       0.5            1      10
Local/Interurban (mid frequency)    5              10     100
City (high frequency)               50             100    1000
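The factor formula can be checked against the table above with a short sketch. The function names are hypothetical, as is the choice in weighted_population to weight each linked population by its own service total; only the formula and the default factors come from the text.

```python
# Default factors as given in the text; the dictionary keys are illustrative labels.
FACTORS = {"long distance": 2.0, "local/interurban": 0.2, "city": 0.02}


def frequency_weight(services, factor):
    """Return max(0, 1 - 1 / (services * factor)), treating no service as 0."""
    if services <= 0:
        return 0.0
    return max(0.0, 1 - 1 / (services * factor))


def weighted_population(linked, factor):
    """Sum (population, services) pairs, weighting each population by its services."""
    return sum(population * frequency_weight(services, factor) for population, services in linked)


# Service totals from the table, which should yield weights of 0, 0.5 and 0.95.
TABLE = {
    "long distance": (0.5, 1, 10),
    "local/interurban": (5, 10, 100),
    "city": (50, 100, 1000),
}
for name, totals in TABLE.items():
    print(name, [round(frequency_weight(t, FACTORS[name]), 2) for t in totals])

# 14 daily trains: a good long distance service, imperceptible within a city.
print(frequency_weight(14, FACTORS["long distance"]), frequency_weight(14, FACTORS["city"]))
```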

In a small city, such as York, these factors can make a tangible difference to how connectivity is assessed, because different approaches have been adopted to network design within the city: Infrequent services provide a wide range of peripheral links, while frequent services tend to focus on the city centre. This pattern is reflected in the analysis graphed below, which again compares the connectivity to cumulative population. The “average” connectivity (shown in the legend) is calculated by multiplying each local area’s connectivity (expressed as a proportion of that of the best-connected area) by its population, summing the results, and expressing that sum as a percentage of the total population. 100% would indicate that all local areas have equal connectivity.

City of York Bus Connectivity by Frequency Expectation: Connectivity is expressed as a percentage of the connectivity of the Ward (local administrative area) with the best connectivity, against the cumulative population of the city. The graph thus describes how well the rest compare to the best. The percentages in the legend are the “average” connectivity, described above. Analysis includes data from Traveline and ONS.
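The population-weighted average used in the York legend can be sketched as follows, with hypothetical ward figures (average_connectivity and WARDS are illustrative names). Each area’s connectivity is first expressed relative to the best-connected area, so a network in which every area matches the best scores 100%.

```python
def average_connectivity(areas):
    """Population-weighted mean connectivity, as a percentage of the best area.

    areas: (connectivity, population) pairs, one per local area.
    """
    best = max(connectivity for connectivity, _ in areas)
    total_population = sum(population for _, population in areas)
    # Weight each area's share of the best connectivity by its population.
    weighted = sum((connectivity / best) * population for connectivity, population in areas)
    return 100 * weighted / total_population


# Hypothetical wards: (connectivity, resident population).
WARDS = [(1000, 10_000), (500, 20_000), (250, 10_000)]
print(f"{average_connectivity(WARDS):.2f}%")
```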

Large cities are generally better connected than smaller settlements in absolute terms because they house more people, but when considering how well a city is connected to itself, large cities often yield poor direct connectivity (of the type modelled by Aquius). Such large city networks tend to emphasise high frequency networks that require interchange. As shown in the table below, for the three large cities of Barcelona, New York and Paris, average connectivity is very similar regardless of the frequency assumption: In practice, direct services either exist or not, and if they exist they are likely to exist at high frequency. Although not shown, the distribution of population within these average connectivities is broadly similar in all three cities, and broadly reflects the pattern of York’s high frequency connectivity – the top 5-10% of the population are significantly better connected than the remainder. While Paris is a city famed for its centralism, the pattern is less obviously expected of Barcelona and New York.

Connectivity by Frequency for Sample Large Cities (average connectivity by frequency expectation)
City                       Population (million)   Any    Low    Mid    High
Barcelona (AMB)            3.2                    42%    42%    40%    35%
New York City              8.2                    37%    37%    38%    35%
Paris (Petite Couronne)    6.7                    33%    33%    32%    29%

Dive In

Aquius’ code is hosted and technically documented on Github. It is available under an MIT License that allows a wide range of “open” reuse, including commercial projects.

To get started, follow the quick setup. Sample datasets (whose license conditions differ) can be found in a separate AquiusData repository. Live demonstrations of most of those sample datasets are available, alongside hosted versions of dataset creation tools: GTFS, GeoJSON, and Merge. The only unpublished tool is one to perform bulk queries, which programmers can script by using Aquius as a stand-alone library.

Be warned that while Aquius is maturing as software, its status is still a pre-release “alpha” – proof of concept, not polished product: There are known (and likely unknown) bugs and logical weaknesses in the output, most relating to unusual service patterns and unconventional data arrangements. If the network involves complex variations on loops or over-use of setdown/pickup restrictions, proceed with caution. Likewise there are inadequate checks on data integrity – erroneously structured GTFS archives may yield unexpected results.

Where Next?

As noted above, in its current form Aquius serves to prove the concept works technically. But while, in most cases, the code functions accurately, internally the project bears all the scars of problems solved as encountered – not a project coded to a neat pre-defined specification. The main client library still has obvious deficiencies, not least an inability to load datasets as required, and thus to allow one client to shift seamlessly from place to place. Some of the here-query code has become extremely convoluted, especially in the processing of setdown/pickup restrictions, much of which was added retrospectively. Internally, the GTFS converter is a work of chaos, with numerous fixes to handle unexpected conditions, resulting in a program flow that can be difficult to follow. None of the code is organised as expected by modern Javascript environments, notably Node, the logical platform for automated and bulk processing. Even the data structure masks annoyances. For example, the default direction is both (“mirror links”), but bidirectional links mostly describe metros, which represent a tiny proportion of most datasets, so filesize would be minimised if the default were uni-directional. Rarely used reference data, such as stop names, often bloats dataset files, yet in most cases this information is not required until a popup box is displayed, and could instead be filled by querying a remote server.

The work implied by a more precise technical implementation – especially if involving someone with actual programming expertise, or if committing to maintain a wide range of datasets – would require putting some element of the project on a slightly more sustainable (presumably commercial) footing, but the best approach is unclear:

  • Silicon Valley is awash with transport “tech” startups which often transpire to be remarkably uninventive technologically, directing much investment into physical mechanisms (like bicycles) that are far from revolutionary, while trying to exploit loopholes in civic or employment regulation. Aquius is too basic to simply exploit such a market of speculative hype.
  • A commercial strategy based on technology licensing (or equivalent implementation services) would be difficult, because not only are Aquius-like methods easily exposed and copied, but most potential licensees are large (especially state) organisations whose contracting regimes tend to distrust small providers. Technology startups (for example those selling housing, where potential buyers may reasonably wish to review unfamiliar local transport provision – a variation on Walkscore) might provide a niche, but perhaps not a big or dominant enough niche to commit to maintaining datasets.
  • Local public transport is too cheap to support a commission or advertising financial model, and so as with schedule information, provision of public-facing network maps logically falls to either operators (and related public agencies with an interest in promoting public transport) or data aggregators (who might merge such networks into their wider data services). Both naturally address the long run requirement to maintain and update datasets. However, as noted in the introduction, had either been motivated to use something like Aquius, they would already be doing so.
  • Maintain free datasets and related creation tools, but sustain those via a related niche, such as the provision of network planning or analysis tools. Although not yet implemented as such, Aquius’ underlying structure makes it extremely easy to tweak or modify existing network patterns and analyse the results strategically. While such requirements exist within the fields of planning, consultancy and research, none is obviously likely to sustain a business that isn’t already being provided for – and many of those currently providing for this niche are just as big and potentially unwelcoming as the operators and state agencies they serve.