Introduction
So over the years I have worked with GTFS feeds from agencies all around to run analyses and build frequency- and trip-based GIS visualizations. A separate feed from GTFS, called GTFS-RT, carries the realtime vehicle locations. Both feeds are global standards, so if you want to work with this kind of data in your own city, check with your local transit authority to see if they are one of the agencies that openly publish their information.
This blog post started with a Twitter poll where I asked what people would be interested in me writing about, and it was overwhelmingly in favor of realtime bus locations. Granted, the other options were pretty much geared toward transit policy wonks, and people like pretty things, but I digress. Visualizing GTFS-RT has been on my to-do list for some time, and I am glad that I got around to tackling the problem.
Data Two Ways
When I started this project, it was geared toward the agency wanting a service that would process the information server side and host it as a hosted service for use on an ArcGIS Enterprise system. I ended up failing to get that working, but I have been able to get it running on my local machine, so I will show you how I did that. I do have some plans to get this running on a Lambda function or an EC2 instance in the future, but I haven't played around with that too much yet.
The other way that I was able to get it up and running is through the Mapbox PB API. I started this article thinking that I was going to cover it, but to be quite honest you should go read his post (see the sources at the end); it is so much better than anything that I could write. So I am just going to cover what I ended up fully putting together on my end using Python.
Building a Feature Service using Python
So I am going to be honest: I am not sure if I am using this term correctly. In the ArcGIS Online/Enterprise world, a GIS layer hosted on their platform is called a Hosted Feature Layer, but at the end of this you will just be left with a Python script that outputs a JSON file that you can map out.
Getting your data
So like I said previously, you will need a GTFS and a GTFS-RT feed for this project, which are openly available for most transit agencies. The GTFS is the static feed. It has all of the elements that repeat over time, like stop locations, route attributes, and trip alignments. I have separated these out into two different calls because the GTFS will change at most once a week, whereas the GTFS-RT updates around every 30 seconds.
Rule of Thumb: Transit agencies generally change their GTFS on Monday
GTFS fetch
The first thing that you are going to want to do is fetch the GTFS zip file and extract the data. Make sure this runs at the top of the script, not inside the loop: when the script starts up, the first thing it will do is pull down a fresh GTFS to go with the realtime information that gets pulled in afterwards. The reason for this is that the GTFS-RT on its own does not carry all of the information someone would want, but the GTFS is designed to link up with the GTFS-RT, and between the two you have everything you need.
import os
import zipfile
from urllib.request import urlopen

# directory the GTFS text files get extracted into ('gtfs' is an assumption;
# point this wherever you want the feed stored)
gtfs_dir = 'gtfs'

def getGTFS(gtfs_url):
    gtfs = [  # list of gtfs files and their locations
        os.path.join(gtfs_dir, 'agency.txt'),
        os.path.join(gtfs_dir, 'calendar.txt'),
        os.path.join(gtfs_dir, 'calendar_dates.txt'),
        os.path.join(gtfs_dir, 'routes.txt'),
        os.path.join(gtfs_dir, 'shapes.txt'),
        os.path.join(gtfs_dir, 'stop_times.txt'),
        os.path.join(gtfs_dir, 'stops.txt'),
        os.path.join(gtfs_dir, 'transfers.txt'),
        os.path.join(gtfs_dir, 'trips.txt'),
    ]
    for file in gtfs:  # delete stale gtfs files before fetching
        if os.path.exists(file):
            os.remove(file)
    print('FETCHING GTFS...')
    zipresp = urlopen(gtfs_url)  # download the zipped feed
    with open('google_transit.zip', 'wb') as tempzip:  # write it to disk
        tempzip.write(zipresp.read())
    zf = zipfile.ZipFile('google_transit.zip')  # re-open the downloaded zip
    zf.extractall(gtfs_dir)  # extractall creates the path automatically
    zf.close()
    os.remove('google_transit.zip')  # clean up the zip once extracted
    print('FETCH COMPLETE!')
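Then, at the top of the script, a single call pulls everything down. The URL here is just a placeholder; substitute your agency's static feed:

getGTFS('https://example.com/google_transit.zip')  # placeholder feed URL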
GTFS-RT fetch
There are 2 different files when it comes to the GTFS-RT that you will be interested in:
- Trips
- Vehicles
The trips feed gives out realtime information associated with each trip that is currently in service. You can link it up to the static GTFS through the trips.txt file using the trip_id column (and from there to the routes, shapes, stop_times, and so on and so forth); I will sketch this join below.
The vehicles feed gives out the realtime locations of all of the buses.
For more information you can reference the GTFS-Realtime Specification.
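To make the trip_id linkage concrete, here is a minimal sketch of that join (tripLookup is just a helper name I am making up, and the path assumes the extraction directory from the fetch function above). It reads trips.txt with the csv module and builds a trip_id lookup:

import csv
import os

def tripLookup(gtfs_dir='gtfs'):
    # map each trip_id to its row in trips.txt so realtime records
    # can be joined back to route_id, shape_id, headsign, etc.
    lookup = {}
    with open(os.path.join(gtfs_dir, 'trips.txt'), newline='', encoding='utf-8-sig') as f:
        for row in csv.DictReader(f):
            lookup[row['trip_id']] = row
    return lookup

trips = tripLookup()
# trips['254361']['route_id'] -> the static route for that realtime trip_id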
When you fetch each of the datasets, you are going to need the gtfs-realtime-bindings module from pip, which provides google.transit.gtfs_realtime_pb2. What I have done is put together a little function that takes the Protocol Buffer information that the feed gives off and parses it into a FeedMessage and then into a dict.
import requests
from google.transit import gtfs_realtime_pb2
from google.protobuf.json_format import MessageToDict

def parseDict(pbu):
    # TAKES THE DATA FROM PBU (THE PB URL) AND TURNS IT INTO A DICTIONARY
    feed = gtfs_realtime_pb2.FeedMessage()
    response = requests.get(pbu)
    feed.ParseFromString(response.content)
    feed = MessageToDict(feed)
    return feed
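Calling it against a vehicle positions URL (a placeholder here) gives you a plain dict you can poke around in:

feed = parseDict('https://example.com/gtfs-rt/vehiclepositions.pb')  # placeholder URL
print(len(feed['entity']))  # number of vehicles currently reporting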
After you have a dictionary you can then inject data from the GTFS (or wherever else) into the vehicles and trips information. I am not going to go over all of this, but you can take a look at my GTFSRT-parsing repo for a more in-depth walkthrough. From there, what I did was convert the dict to a GeoJSON format by creating a blank object labeled as a FeatureCollection, and within that feature collection there is a features array that holds the individual features. With each feature you need to create and inject data into four different sections (data is a custom section on top of the standard GeoJSON members):
- type
- properties
- geometry
- data
def getVehicles(pburl):  # hypothetical name; the original snippet omitted the enclosing def
    allVehicles = {}
    allVehicles['type'] = 'FeatureCollection'
    allVehicles['features'] = []
    feed = parseDict(pburl)
    id = 0
    for value in feed['entity']:
        obj = {}
        # LIST OF SECTIONS
        sections = ["type", "properties", "geometry", "data"]
        for i in sections:  # CREATE SECTIONS
            obj[i] = {}
        obj["type"] = "Feature"
        # START OF DATA SECTION
        tripId = value["vehicle"]["trip"]["tripId"]
        uni = obj["data"]
        uni["vehicleId"] = value["vehicle"]["vehicle"]["id"]
        uni["tripId"] = tripId
        uni["routeId"] = value["vehicle"]["trip"]["routeId"]
        uni["coordinates"] = [value["vehicle"]["position"]["longitude"],
                              value["vehicle"]["position"]["latitude"]]
        # START OF GEOMETRY SECTION
        obj["geometry"]["type"] = "Point"
        obj["geometry"]["coordinates"] = uni["coordinates"]
        # START OF PROPERTIES SECTION
        obj["properties"]['id'] = id
        # ADD INDIVIDUAL VEHICLES TO LIST
        id += 1
        allVehicles['features'].append(obj)
    # these helpers come from the full script in the repo and join in GTFS attributes
    allVehicles = addVehicleInfo(allVehicles)
    allVehicles = addVehiclePopups(allVehicles)
    return allVehicles
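For reference, a single feature in the output ends up shaped like this (the values here are purely illustrative):

{
    "type": "Feature",
    "properties": {"id": 0},
    "geometry": {"type": "Point", "coordinates": [-86.781, 36.162]},
    "data": {
        "vehicleId": "2102",
        "tripId": "254361",
        "routeId": "52",
        "coordinates": [-86.781, 36.162]
    }
}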
Once you have this up and running, you are going to save the output locally, have it refresh at whatever rate you want using time.sleep(x), and then host the file.
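Putting the pieces together, a minimal version of that loop might look like this sketch (getVehicles is the wrapper name from the snippet above, gtfs_url and pburl are your agency's feed URLs, and the 30 second sleep is just a reasonable default matching the typical GTFS-RT update rate):

import json
import time

getGTFS(gtfs_url)  # refresh the static GTFS once at startup, outside the loop
while True:
    allVehicles = getVehicles(pburl)  # latest vehicle positions as geojson
    with open('vehicles.geojson', 'w') as f:
        json.dump(allVehicles, f)  # overwrite the file your map reads from
    time.sleep(30)  # poll the GTFS-RT feed roughly every 30 seconds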
SOURCES
Special thanks to Gavin Rehkemper at Esri (Twitter: @GavinRehkemper) and his blog.