Big Data
A couple of weeks ago, when the Ashley Madison data hack was revealed, I felt cold shivers of fear run down my spine, although not for the reasons you may be thinking.
Perhaps I should remind some of you that Ashley Madison is the world’s largest online site for adultery. Or at least it was, until hackers breached the company’s digital security, obtained the names and addresses of all the people who held accounts there and published them on the shadowy ‘Dark Web’.
The result was widespread humiliation and the collapse of what had until then been a multimillion-dollar business.
So why should all this matter to someone like me, from the rather less racy world of transport modelling? Because it is yet another reminder of the risks run by any organisation that collects or uses large amounts of personal information in the digital age. And as we move ever deeper into the era of big data, that means us.
Much has been written and said both for and against the value of big data – especially mobile phone and GPS information – for the future of transport modelling, but to my mind the question of privacy is the one that most urgently demands an answer. Not because failures of digital security are possible but because, as the NSA can testify just as well as Ashley Madison, they are inevitable. And when they happen, they are catastrophic for reputations, for public confidence and for business.
There is no doubt that public concern about the collection of personal data, its uses and abuses, is growing. And industries such as ours face a possible double whammy of public anxiety, not only over data security but over the collection of personal information in the first place. Individual mobile phone users, for example, are often entirely unaware that their movements may be tracked and details passed on to third parties (us). It would be very easy to cast that sort of thing in a sinister light. After all, at least the users of Ashley Madison had agreed to have their personal details stored by the company when they signed up.
This all matters so much to me because I am not among those who think that big data is overhyped. On the contrary, I believe that we face not a mere change in how we do things but a revolution, and the only thing stopping some from seeing it is that they are not thinking radically enough. It is not just a question of using a new resource to improve models; big data can replace models. The data will become the model.
Many first-order transport investment decisions, like ‘do we need that new bypass or a rail/metro route?’, do not require the complex behavioural models of the past. Apply the right software algorithms to a large, live dataset, move the ‘sliders’ that control the input assumptions about the supply networks, and the database of observed travel patterns can be manipulated to show the likely effects of major network alterations without the investment in time and money that a traditional model would require. Mobile device and vehicle tracking data tells you more about travel patterns than anything else yet known. There are biases and gaps, but if we combine it with a good household survey, automatic traffic counts and public transport boarding data, we can account for those and change the world. Our little bit of it, anyway.
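One standard way of doing that kind of accounting is biproportional matrix balancing, often called the Furness method: an origin–destination (OD) matrix built from phone or GPS traces is treated as a seed and repeatedly scaled until its row and column totals match independent estimates, for example trip origins and destinations expanded from a household survey. What follows is a minimal sketch under that assumption, not a description of any particular SYSTRA workflow; the function name `furness_balance` and the two-zone numbers are hypothetical and chosen only for illustration.

```python
def furness_balance(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a seed OD matrix so its row sums match row_targets and its
    column sums match col_targets (biproportional / Furness balancing).

    Assumes all entries are non-negative and that sum(row_targets)
    equals sum(col_targets); otherwise the process cannot converge.
    """
    m = [list(map(float, row)) for row in seed]  # work on a copy of the seed
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(max_iter):
        # Row step: scale each origin row to its target total.
        for i in range(n_rows):
            s = sum(m[i])
            if s > 0:
                f = row_targets[i] / s
                m[i] = [v * f for v in m[i]]
        # Column step: scale each destination column to its target total.
        for j in range(n_cols):
            s = sum(m[i][j] for i in range(n_rows))
            if s > 0:
                f = col_targets[j] / s
                for i in range(n_rows):
                    m[i][j] *= f
        # Columns now match exactly; stop once the rows match too.
        row_err = max(abs(sum(m[i]) - row_targets[i]) for i in range(n_rows))
        if row_err < tol:
            break
    return m


# Hypothetical example: phone-derived trips between two zones,
# rebalanced to survey-based origin and destination totals.
phone_od = [[10, 20],
            [30, 40]]
survey_origins = [40, 60]
survey_destinations = [50, 50]

balanced = furness_balance(phone_od, survey_origins, survey_destinations)
for row in balanced:
    print([round(v, 2) for v in row])
```

Balancing of this kind only corrects totals. It cannot recover a movement the seed misses entirely, because a zero cell stays zero however often it is scaled, which is one reason the survey, counts and boarding data still matter.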
Many in the industry, including some in my own organisation, SYSTRA, are more sceptical. They point to the sometimes excessive price of the data, the tendency to weight in favour of larger, more expensive vehicles (and therefore higher-income individuals), the black spots missed by mobile phone coverage, the danger of ever-increasing complexity and the blindness to travel purpose. These are all important points, but to my mind they are teething problems. Big data is going to become richer – there is almost no limit to the information that may potentially be available on individual transport users – software will improve, prices will fall and users will become more skilled. And it will all add up to better, more accurate, faster models, which in turn will mean better transport planning decisions, and that means better towns and cities for future generations to live in.
That’s why it is essential that the industry comes to terms with the privacy problem now, before a scandal erupts that forces us to. But who is prepared to stand up and make a clear public case for the scale of data collection that is already happening and is only set to grow? Who is really confident that they have a strong story to tell when a scandal eventually blows through the transport modelling universe and trigger-happy legislators respond to public dismay with reflexive attempts at blanket bans? Pointing to backside-covering box-checking exercises and small-print consent, or muttering about anonymisation, just won’t do.
I think the future of big data is bright, but if we allow it to be endangered by failing to learn the lessons of the digital privacy disasters playing out all around us, we will have only ourselves to blame. We need to act now, to come together as an industry to agree rigorous standards for data collection and use, and to communicate them not only among ourselves but to the public and policymakers. The work of transport modellers affects millions but is little understood in the wider world. We need to make sure that when the public finds out who we are and what we do, it is not from the front pages of the tabloids.