Thanks to that device in your pocket, your location never has to be a mystery these days, even in unfamiliar surroundings, whether to you or to myriad other interested parties. There might be some dystopian outcomes to fret over (more on that later), but for the people charged with making efficient use of infrastructure dollars, tracking and improving safety outcomes, and optimizing road and transit operations, that fact is world-changing.
The “location-based services” on your smartphone are a significant component of the big data that is quickly becoming the basis for transportation planning, prioritizing infrastructure projects, and adapting to new urban mobility options, from scooters to electric vehicle charging. You contribute to that big data whenever you opt to share your location to make use of, say, apps for weather, travel, or myriad other services. Location-based services integrate data from Global Positioning System (GPS) satellites, cell tower signals and Wi-Fi pings to track the user’s geographical location. With enough computing power, some savvy algorithms and machine learning, and honed with in-field verification, the gazillion location data points can be used to infer where people are traveling to and from, when and for what purpose and by what mode.
In just the last few years, for example, this big data approach has begun to be used to:
- Help shape walking and biking networks so as to optimize their utility and eliminate deaths and injuries;
- Power a clean-slate redesign of Boston’s century-old bus network;
- Provide real-time information on bus arrivals and crowding, and even offer personalized options for riders of New York City’s bus system;
- Support efforts to coordinate transportation and development to minimize climate damage and create less car-dependent places; and
- Help transit agencies adapt quickly to shifting ridership demands during the pandemic.
“The old saying in the transportation world is, ‘What doesn’t get counted doesn’t count,’” said Jennifer Dill, director of the Transportation Research and Education Center at Portland State University in Oregon, whose institute is at the forefront of evaluating use of big data. For state transportation departments, federal funding depends on reporting data about vehicle counts, travel times and traffic fatalities, among other data. For local jurisdictions, being able to track progress on, say, pedestrian safety or “mode shift” from cars to transit or biking requires data on people walking, biking, and taking transit.
Traditionally, collecting travel-related data has been time- and resource-intensive. Counting vehicle traffic has required pavement sensors or roadside radar-style devices at a few fixed points. Counting people walking and biking, when it’s done at all, also requires periodic, point-in-time counts by in-person observers and traveler surveys. Noting that “new technologies and new data seeming unrelated to vehicle travel have been explored successfully to characterize vehicle travel,” the Federal Highway Administration (FHWA) has begun to accept the use of “passive” big data in many cases. The FHWA also has launched a multi-year, multi-state study to further advance and calibrate big-data practices “to decipher traffic volumes and other movement data such as origin destination data and modal share data.”
Recognizing that most transportation agencies and planning boards don’t have the computing power and expertise to process so much data, even though they have the need for it, consulting firms and for-profit companies have sprung up to fill the void. Cambridge Systematics, a national transportation consulting firm, has developed a big data product it calls LOCUS that can help users track everything from freight movement to local traffic patterns. One of the earliest and best-known private firms in the field, Streetlight claims it tracks travel “based on over 40 billion monthly location records across the country, collected from smartphones and connected cars and trucks,” using algorithms to “draw on 365 days of data on more than four million miles of roadway.”
Early on in the discussion about whether big data could match, augment or surpass direct counts of people driving, walking or biking, much of the concern centered on the demographics of the population that carried smartphones. Today, only 15 percent of Americans don’t own a smartphone, according to the Pew Research Center, though that share rises to about 25 percent for those making less than $30,000 a year.
But Martin Morzynski, senior vice president of marketing at Streetlight, argues that smartphone saturation at all income levels yields a sampling massive enough to infer the travel of nearly everyone. “We’ve gotten past the issue of whether the data is representative,” Morzynski said.
Transportation officials have the tools to learn whether low-income residents are being shortchanged in terms of service and access or disproportionately harmed on our streets and highways, he noted. “The issue is how you get agencies to use the data in ways that are equitable.”
Sparing Pedestrians and Making Bikes Count
For a variety of reasons, jurisdictions across the country have found increasing need to have a better handle on how many people are walking and biking in particular places; who they are in terms of gender, race, age and income; what their needs are in terms of access and whether and to what degree better street design can save their lives and prevent injuries. Local officials in dense urban areas are trying to respond to residents demanding an alternative to cars and buses that get stuck in traffic, even as they seek to curb growth in vehicle travel as a climate mitigation measure. In other places, residents are demanding complete streets that are safe for walking and biking as a key aspect of quality of life. At the same time, the number of deaths and injuries to people walking and biking has been trending in the wrong direction in recent years.
State transportation agencies have had little incentive to count anything but vehicles, because that’s what their federal funding has depended on, said Josh Roll, a research scientist at the Oregon Department of Transportation (ODOT) with a focus on active transportation and safety analysis. That is beginning to change, however. “ODOT, to their credit, has adopted new key performance measures that include increasing bike and pedestrian miles traveled. To get there, though, we have to be able to track it.
“With big data and machine learning we hope to be able to develop algorithms to demonstrate, say, where safety risks are higher for people walking and biking,” Roll said, noting that most states and local jurisdictions are just beginning to closely track biking and walking. “We want to leapfrog past the traditional methods to full system monitoring capability using these mobile data sources.”
Researchers at Dill’s center recently collaborated with others in Texas and North Carolina on a study for the National Institute for Transportation and Communities that explored the possibilities and limitations of using crowdsourced data to augment or replace physical counts.
ODOT’s Roll sat on the project’s technical advisory committee. The team investigated three big data sources (the Strava bicycling app, Streetlight, and GPS figures from local bike-rental systems) in six cities: Boulder, Colo.; Charlotte, N.C.; Dallas; and Portland, Bend and Eugene, Ore. They sought to compare the accuracy of each against physical counts, while also looking at the potential to use the data sources in combination.
While providing significant help in rounding out the picture, each digital source had its limitations. Strava is used primarily by recreational cyclists who are less interested in transportation than in taking long rides both within and outside the city. The data from bike rental systems, such as Biketown in Portland, is skewed toward the hubs where bikes are made available. And Streetlight’s algorithm is still being perfected to be able to tell bicycle riders from those in slower-moving vehicles in congested areas.
“No one source is perfect, but in combination the results are pretty good,” Dill said. Big data will probably be most useful for tracking trends between bike counts, she said, but cities will probably have to continue to make observational tallies in particular locations for the foreseeable future.
Dill hopes big data options will continue to improve, especially for walking and biking. “It’s really important in how we analyze safety. We have good data on who gets killed; the data on injury is harder, but there is some,” she said. Some researchers are exploring whether anonymized Medicaid data could provide more insight into where and to what extent low-income residents are hurt on our streets, she noted. “But to really analyze for safety, you want to know the exposure — the injuries per mile of walking or biking. As more people walk or bike, the total number of injuries or death will probably go up with increasing numbers of travelers, but you want to know the rate. It’s also really important for providing arguments for better, safer infrastructure. So, you might also want to know relative volumes with protected vs. not-protected lanes, trails vs. on-street.” In other words, better and more ubiquitous counting methods continue to be in demand.
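Dill’s point about exposure can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the per-million-miles scaling and the before-and-after figures are all assumptions made up for the example, not numbers from any study.

```python
def injuries_per_million_miles(injuries: int, miles_traveled: float) -> float:
    """Exposure-adjusted injury rate: injuries per one million miles traveled."""
    if miles_traveled <= 0:
        raise ValueError("miles_traveled must be positive")
    return injuries * 1_000_000 / miles_traveled


# Hypothetical corridor before and after a protected bike lane is added.
# Raw injuries rise from 12 to 15, but riding doubles, so the rate falls.
rate_before = injuries_per_million_miles(injuries=12, miles_traveled=2_000_000)
rate_after = injuries_per_million_miles(injuries=15, miles_traveled=4_000_000)

print(f"before: {rate_before:.2f} injuries per million miles")
print(f"after:  {rate_after:.2f} injuries per million miles")
```

In this made-up case the injury count goes up while the rate per mile goes down, which is why a count of injuries alone can mislead without a matching count of travel.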
Re-Imagining Boston’s Bus Network
Thanks to the availability of big data, the Massachusetts Bay Transportation Authority (MBTA) recently undertook something that had not happened since buses began plying former streetcar routes nearly a century ago, and probably would not have happened otherwise: a clean-slate redesign of Boston’s entire bus network. The redesign was intended to better serve those who rely most on the bus, by adapting to the shifting concentrations of lower-wage workers and their employment destinations and making it easier for seniors and others to get to newer destinations for health care and other services.
MBTA hired Cambridge Systematics (CS) to help service planners scour location-based data, and using some algorithmic alchemy, identify the origins and destinations for millions of trips and sync them up with demographic data. David Baumgartner, a data and transit planning expert at CS, described the process: “We developed a semi-automated procedure to divide the region into over 800 small zones, each representing a walkshed for a bus stop. Then for each that remained, we attached the cellphone location-based services data to it to see how people traveled, where they were traveling, regardless of mode,” he said. The team used a computer program to line up strings of the zones to create every possible bus route: 14 million of them.
“We filtered out the routes that were too short or long, circuitous, or duplicative.” The team then took all the potential routes, assembled them into networks of 10 to 20 of the most high-frequency routes, and scored each network by how many people it served. “This generated about 90,000 possible networks, and we ranked them by how many people they served and how many of them were low-income and minority.”
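The filter-then-rank process Baumgartner describes can be sketched in miniature. What follows is a toy illustration, not the CS procedure: the five zones, their coordinates and rider counts, the length and circuity thresholds, and every function name are assumptions invented for the example. It also treats two routes as duplicates only when they cover exactly the same zones, which is far cruder than a real screen would be.

```python
from itertools import combinations
from math import dist

# Hypothetical zones: id -> (x, y, riders, low_income_riders)
ZONES = {
    "A": (0.0, 0.0, 500, 200),
    "B": (1.0, 0.0, 300, 50),
    "C": (2.0, 0.0, 400, 300),
    "D": (1.0, 1.0, 200, 150),
    "E": (2.0, 1.0, 100, 20),
}

# Hypothetical candidate routes, each an ordered string of zones.
CANDIDATE_ROUTES = [
    ("A", "B", "C"),
    ("C", "B", "A"),       # same zones as the first route
    ("A", "B"),            # too short
    ("A", "D", "E"),
    ("B", "D"),            # too short
    ("A", "D", "B", "C"),  # too circuitous
    ("C", "E", "D"),
]


def route_length(route):
    """Total length of a route, summing straight hops between zone centers."""
    return sum(dist(ZONES[a][:2], ZONES[b][:2]) for a, b in zip(route, route[1:]))


def is_reasonable(route, min_len=1.5, max_len=4.0, max_circuity=1.5):
    """Reject routes that are too short, too long, or too roundabout."""
    length = route_length(route)
    straight = dist(ZONES[route[0]][:2], ZONES[route[-1]][:2])
    if straight == 0:
        return False
    return min_len <= length <= max_len and length / straight <= max_circuity


def dedupe(routes):
    """Keep the first route seen for each distinct set of zones."""
    seen, kept = set(), []
    for route in routes:
        key = frozenset(route)
        if key not in seen:
            seen.add(key)
            kept.append(route)
    return kept


def score(network):
    """Return (riders served, low-income riders served) for a set of routes."""
    served = set().union(*network)
    riders = sum(ZONES[z][2] for z in served)
    low_income = sum(ZONES[z][3] for z in served)
    return riders, low_income


def rank_networks(routes, size):
    """Filter routes, build every network of `size` routes, rank best first."""
    candidates = [r for r in dedupe(routes) if is_reasonable(r)]
    return sorted(combinations(candidates, size), key=score, reverse=True)


ranked = rank_networks(CANDIDATE_ROUTES, size=2)
for network in ranked:
    print(score(network), network)
```

Ranking on the tuple means total riders decide first and low-income riders break ties; that ordering is one arbitrary choice among many, and a real project could weight the two quite differently.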
Possible routes were screened through yet another big data exercise, a map of every street in the city considered “busable” — an apparent first among transit agencies. Boston not only features a famously tangled street grid but also has many roadways that are narrow, steep, in poor condition or otherwise unusable by buses that typically measure 40 feet long.
After the rounds of screening, planners identified “a handful of networks” of 10 to 20 high-frequency, core routes, those offering service at 15-minute intervals or better throughout the day, Baumgartner said. Further refinements and ultimate selection were then given over to more manual, human evaluation, including a robust public outreach effort that had collected more than 20,000 comments when that effort closed in July 2022. The MBTA estimates that the new network, which will begin implementation in summer 2023, will result in 25 percent more bus service, and 70 percent more weekend service. In addition, 275,000 more residents will be near high-frequency service and 115,000 residents of color and 40,000 low-income households will gain access to high-frequency service.
Big Data in the Big Apple
What any regular transit user will tell you — and independent research confirms — is that the most important factor in rider satisfaction is transparency: the operating agency letting them know what to expect in terms of wait times and sharing information about delays or other operational hiccups. “The reasons for a delay could be anything from availability of drivers to traffic or a technological breakdown,” said Sheldon Brown, transit software expert at CS. “But it makes a difference between having another cup of coffee or getting soaked at a bus stop because you thought the bus was coming, and it was 15 minutes late.”
Before 2011, the largest bus system in the country (New York City) provided no real-time information about bus arrivals, “believe it or not,” Brown said. “Previous attempts in the 2000s had failed miserably.”
But by the beginning of the 2010s, an open-source platform for real-time bus information called OneBusAway had arrived. That software integrates GPS location information, bus speeds calculated by “dead reckoning” and other data to predict when a bus will arrive at any given stop. The Metropolitan Transportation Authority (MTA) partnered with a team of experts, including Brown, to adapt that platform to become a phone app dubbed Bus Time. In addition, the MTA provides the real-time data to third-party mapping and trip-planning applications, and also uses the data to measure system performance. Fulfilling the 20 million user requests for updates during peak hours requires several thunderheads of “cloud” data storage. Using “dynamic scaling,” the system consumes the capacity of up to 30 computer servers during the busiest times, declining to as few as two in the wee hours.
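The basic idea behind a dead-reckoning arrival estimate is simple enough to sketch. This is a bare-bones illustration and not how Bus Time or its underlying platform actually works: the function names, the constant-speed assumption and the sample numbers are all hypothetical, and a production system would blend in schedules, traffic and fresh GPS fixes.

```python
from typing import Optional


def dead_reckon_position(last_fix_m: float, speed_mps: float,
                         seconds_since_fix: float) -> float:
    """Estimate distance along the route, assuming constant speed since the last fix."""
    return last_fix_m + speed_mps * seconds_since_fix


def eta_seconds(stop_m: float, last_fix_m: float, speed_mps: float,
                seconds_since_fix: float) -> Optional[float]:
    """Seconds until the bus reaches the stop, or None if the bus is not moving."""
    if speed_mps <= 0:
        return None
    position = dead_reckon_position(last_fix_m, speed_mps, seconds_since_fix)
    remaining = max(stop_m - position, 0.0)  # a bus already past the stop has 0 left
    return remaining / speed_mps


# Hypothetical bus: last GPS fix 1,200 m along the route, 30 seconds ago,
# moving at 5 m/s, with the rider's stop at 2,100 m.
eta = eta_seconds(stop_m=2100.0, last_fix_m=1200.0, speed_mps=5.0,
                  seconds_since_fix=30.0)
print(f"estimated arrival in {eta:.0f} seconds")
```

Returning None for a stopped bus is a deliberate simplification here: it signals that no estimate is possible from speed alone rather than guessing.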
In 2019, just as Covid was arriving, the MTA began adding information about bus crowding to its real-time data stream, using onboard counters plus riders’ cell data. “As part of One Bus privacy policy we don’t use cell data unless you opt into a survey,” Brown said. “The app lets a rider see whether the bus is full and whether it’s worth it to wait for the next bus.” This turned out to be a godsend during the pandemic, when riders were trying to avoid close contact with others, and it helped the agency manage demand in real time, eliminating or adding trips where necessary.
The next frontier, for those who opt in to sharing their smartphone location data, is a feature now available in a pilot program that pushes personalized recommendations about alternate routes for faster travel based on conditions. “The system will learn that you take a particular route every day, but if there are delays and another route or other options will get you there quicker it will suggest that to you,” said Brown.
Big Data Caveats and Dystopian Concerns
As any smartphone user who follows the news will be aware, companies from ad agencies to widget sellers want your data, and they’re willing to pay for it. The fact that a market for location-based data exists is exactly how it is available for transportation data providers to harvest and process it. What does it mean, then, for a public agency to buy data from companies that profit off individual data? “The challenge for agencies is these companies are coming and trying to sell the data and they don’t always know what they are buying,” said Dill. In another wrinkle, private companies often are incorporating public data into their own algorithms and data products. “As we get into this world,” said ODOT’s Roll, “we want to move forward in a way that protects people’s privacy and doesn’t provide an incentive structure that might violate people’s privacy and security.”
Perhaps a bigger worry to some is the potential for cell phone data to be used to restrict how and where people travel. “I remain pretty optimistic about the possibility of more informed infrastructure investments and public input into improving operations,” said Kevin Webb, director of Shared Streets, a nonprofit initiative in partnership with the National Association of City Transportation Officials. “But I am concerned that it seems to be leaning more toward intervention in the way people are traveling.” A move by the city of Los Angeles to use cell phone location services to control the speed and operation of scooters in the city prompted a lawsuit, in which Webb filed an amicus brief.
The problem, he said, is that cities are starting down a slippery slope by using big data and location services to restrict where people travel. While “geofencing” scooters or ebikes out of neighborhood streets or parks may seem benign at first, he said, he worries it will lead toward efforts by those with money or influence to gate off public streets to some classes of people, among other less-than-equitable outcomes. In Raleigh, N.C., where he lives, he recently found that the power to his rental ebike was cut remotely as he approached a park. “It makes me wonder about the city’s priorities. We haven’t done anything to make the streets safer, improve the access or make better transit service — we’ve just created a troubling precedent of controlling people’s movement.”
With proper safeguards, though, big data could be invaluable for calibrating forecasting models in something close to real time and tracking progress as we work toward a more sustainable future. “The way a region plans for 20 years or more is by using these computer models that make lots of assumptions about how people get around,” Dill said. “Traditionally, they really only incorporated cars and transit, and they were updated very infrequently. In the last decade they’re doing a better job with biking and walking modeling, in part thanks to new data sources. All this data can be used to develop better models, so if you change land uses to put destinations within walking distance or include protected infrastructure, it could show you how travel, safety, emissions, et cetera, might change.” With that information, she said, we have the ability to prioritize investment in infrastructure and strategies that just might save the planet.