After dinner in downtown San Francisco, I said goodbye to friends and pulled out my phone to figure out how to get home. It was near 11:30 pm, and Uber estimates were unusually long. I opened Google Maps and checked walking directions instead. The routes were similar in distance, but I hesitated, not because of how long the walk would take, but because I wasn't sure how different parts of the route would feel at that time of night. Google Maps could tell me the fastest way home, but it couldn't help answer the question I was actually asking:
Defining the Problem Statement
Given a starting location, an ending location, a day of the week, and a time, how can we predict the expected risk along a given walking route?

For instance, if I want to walk from Chinatown to Market & Van Ness, Google Maps presents a couple of route options, all taking roughly 40 minutes. While it's useful to check distance and duration, it doesn't help answer a more contextual question: which parts of those routes tend to look different depending on the time I'm making the walk? How does the same route compare at 9 am on a Tuesday versus 11 pm on a Saturday?
As walks get longer, or pass through areas with very different historical activity patterns, these questions become harder to answer intuitively. While San Francisco is not uniquely unsafe compared to other major cities, public safety is still a meaningful consideration, especially when walking through unfamiliar areas or at unfamiliar times. My goal was to build a tool for locals and visitors alike that adds context to those decisions, using historical data and machine learning to surface how risk varies across space and time, without reducing the city to simplistic labels.
Getting the Data + Pre-Processing
Fetching the Raw Dataset
The City and County of San Francisco publishes police incident reports daily through the San Francisco Open Data portal. The dataset spans from January 1, 2018 to the present and includes structured information such as incident category, subcategory, description, time, and location (latitude and longitude).
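The portal exposes this dataset through a Socrata API, so pulling it down is a short script. Below is a minimal sketch; the dataset ID (wg3w-h783) and column names match the "Police Department Incident Reports: 2018 to Present" dataset at the time of writing, so double-check them on data.sfgov.org.

```python
# Minimal sketch of pulling incident reports from the Socrata API behind the
# SF Open Data portal. Dataset ID and column names are assumptions to verify.
import pandas as pd
import requests

BASE_URL = "https://data.sfgov.org/resource/wg3w-h783.json"
COLUMNS = ("incident_datetime,incident_category,incident_subcategory,"
           "incident_description,latitude,longitude")

def fetch_page(limit=50000, offset=0):
    """Fetch one page of incident reports as a DataFrame."""
    params = {"$select": COLUMNS, "$limit": limit, "$offset": offset}
    resp = requests.get(BASE_URL, params=params, timeout=60)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

# Page through the dataset until no rows come back.
pages, offset = [], 0
while True:
    page = fetch_page(offset=offset)
    if page.empty:
        break
    pages.append(page)
    offset += len(page)

incidents = pd.concat(pages, ignore_index=True)
incidents["incident_datetime"] = pd.to_datetime(incidents["incident_datetime"])
incidents = incidents.dropna(subset=["latitude", "longitude"])
```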

Categorizing Reported Incidents
One immediate challenge with this data is that not all incidents represent the same level or kind of risk. Treating all reports equally would blur meaningful differences; for instance, a minor vandalism report should not be weighted the same as a violent incident. To address this, I first extracted all unique combinations of incident category, subcategory, and description, which resulted in a little over 800 distinct incident triplets.
Rather than scoring individual incidents directly, I used an LLM to assign severity scores to each unique incident type. This allowed me to normalize semantic differences in the data while keeping the scoring consistent and interpretable. Each incident type was scored on three separate dimensions, each on a 0–10 scale:
- Harm score: the potential risk to human safety and passersby
- Property score: the potential for damage to or loss of property
- Public disruption score: the extent to which an incident disrupts normal public activity
These three scores were later combined to form an overall severity signal for each incident, which could then be aggregated spatially and temporally. This approach made it possible to model risk in a way that reflects both the frequency and the severity of reported incidents, rather than relying on raw counts alone.
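As a rough illustration of that combination step, here is a sketch; the weights are illustrative assumptions, not the exact values used in the project.

```python
# Hypothetical combination of the three LLM-assigned dimensions into a single
# severity score per incident type. The weights are illustrative only.
WEIGHTS = {"harm": 0.5, "property": 0.3, "disruption": 0.2}

def overall_severity(harm: float, prop: float, disruption: float) -> float:
    """Collapse the three 0-10 dimension scores into one 0-10 severity value."""
    return (WEIGHTS["harm"] * harm
            + WEIGHTS["property"] * prop
            + WEIGHTS["disruption"] * disruption)

# Example: a violent incident type vs. a minor vandalism report.
print(overall_severity(9, 4, 6))  # 6.9
print(overall_severity(1, 3, 2))  # 1.8
```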
Geospatial Representation
Providing raw latitude and longitude values would not add much value to the ML model, because I wanted to aggregate incident context at the block and neighborhood level. I needed a way to map a block or neighborhood to a fixed index to simplify feature engineering and build a consistent spatial mapping. Cut to the seminal engineering blog published by Uber: H3.
Uber's H3 blog describes how projecting an icosahedron (a 20-faced polyhedron) onto the surface of the Earth and hierarchically subdividing its faces into hexagons (plus 12 strategically placed pentagons) can tessellate the entire map. Hexagons are special because they are one of the few regular polygons that form regular tessellations, and each hexagon's center is equidistant from its neighbors' centers, which simplifies smoothing over gradients.

The website https://clupasq.github.io/h3-viewer/ is a fun way to see what your location's H3 index is!
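In code, mapping an incident to its H3 cell is essentially a one-liner with the h3-py library. A minimal sketch, assuming resolution 9 (roughly block-sized cells, matching the "89..." indices shown later); note that h3-py v4 renamed geo_to_h3 to latlng_to_cell, so use whichever your installed version provides.

```python
# Map a lat/lng to an H3 cell and attach it to the incident DataFrame.
import h3

RESOLUTION = 9  # roughly block-sized hexagons

def to_h3_cell(lat: float, lng: float, resolution: int = RESOLUTION) -> str:
    return h3.latlng_to_cell(lat, lng, resolution)

# Example: a point in downtown San Francisco.
cell = to_h3_cell(37.7749, -122.4194)
print(cell)                   # a 15-character cell ID starting with '89' at resolution 9
print(h3.grid_disk(cell, 1))  # the cell plus its six immediate neighbors

# Attach a cell ID to every incident from the earlier fetch sketch.
incidents["h3_cell"] = [
    to_h3_cell(float(lat), float(lng))
    for lat, lng in zip(incidents["latitude"], incidents["longitude"])
]
```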

Temporal Representation
Time is just as important as location when modeling walking risk. However, naïvely encoding hour and day as integers introduces discontinuities: 23:59 and 00:00 are numerically far apart, even though they are only a minute apart in reality.
To address this, I encoded time of day and day of week using sine and cosine transformations, which represent cyclical values on a unit circle. This allows the model to learn that late-night and early-morning hours are temporally adjacent, and that days of the week wrap naturally from Saturday back to Sunday.
In addition, I aggregated incidents into 3-hour time windows. Shorter windows were too sparse to provide reliable signals, while larger windows obscured meaningful differences (for instance, early evening versus late night). Three-hour buckets struck a balance between granularity and stability, leading to intuitive periods such as early morning, afternoon, and late evening.
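Putting the two ideas together, here is a sketch of the temporal features; the incident_datetime column name carries over from the fetch sketch above.

```python
# Cyclical encodings for hour-of-day and day-of-week, plus the 3-hour bucket
# used for aggregation.
import numpy as np
import pandas as pd

def add_time_features(df: pd.DataFrame, ts_col: str = "incident_datetime") -> pd.DataFrame:
    hour = df[ts_col].dt.hour
    dow = df[ts_col].dt.dayofweek  # Monday=0 ... Sunday=6
    df["dow"] = dow

    # Unit-circle encodings: 23:00 and 00:00 land next to each other.
    df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)

    # 3-hour windows: 0 = midnight-3am, 1 = 3am-6am, ..., 7 = 9pm-midnight.
    df["time_bucket"] = hour // 3
    return df

incidents = add_time_features(incidents)
```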

Final Feature Representation
After preprocessing, each data point consisted of:
- An H3 index representing location
- Cyclically encoded hour and day features
- An aggregated severity signal derived from historical incidents
The model was then trained to predict the expected risk for a given H3 cell at a given time of day and day of week. In practice, this means that when a user opens the app and provides a location and time, the system has enough context to estimate how walking risk varies across nearby blocks.
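Concretely, the aggregation that produces those training rows looks roughly like this; the column names (h3_cell, severity, dow, time_bucket) follow the earlier sketches and are assumptions about the schema rather than the project's exact code.

```python
# One training row per (H3 cell, day-of-week, 3-hour window), with summed
# historical severity as the target.
import numpy as np

rows = (
    incidents
    .groupby(["h3_cell", "dow", "time_bucket"], as_index=False)
    .agg(total_severity=("severity", "sum"), n_incidents=("severity", "size"))
)

# Encode each window by its midpoint hour so every row has one cyclical position.
mid_hour = rows["time_bucket"] * 3 + 1.5
rows["hour_sin"] = np.sin(2 * np.pi * mid_hour / 24)
rows["hour_cos"] = np.cos(2 * np.pi * mid_hour / 24)
rows["dow_sin"] = np.sin(2 * np.pi * rows["dow"] / 7)
rows["dow_cos"] = np.cos(2 * np.pi * rows["dow"] / 7)

# Cell-windows with zero incidents don't appear in this groupby; in practice
# they are added back with a target of 0 so the zero-heavy shape is preserved.
X = rows[["h3_cell", "hour_sin", "hour_cos", "dow_sin", "dow_cos"]]
y = rows["total_severity"]
```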
Training the Model Using XGBoost
Why XGBoost?
With the geospatial and temporal features ready, I knew I needed a model that could capture non-linear patterns in the dataset while providing low enough latency to run inference on multiple segments of a route. XGBoost was a natural fit for a couple of reasons:
- Tree-based models are naturally robust at modeling heterogeneous data: categorical spatial indices, cyclical time features, and sparse inputs can coexist without heavy feature scaling or normalization.
- Feature effects are more interpretable than in deep neural networks, which tend to introduce unnecessary opacity for tabular data.
- Flexibility in objectives and regularization made it possible to model risk in a way that aligns with the structure of the problem.
I did consider alternatives such as linear models, random forests, and neural networks, but they were unsatisfactory due to their inability to capture nuance in the data, high latency at inference time, or over-complication for tabular data. XGBoost struck the best balance between performance and practicality.
Modeling Expected Risk
It's important to clarify before we move on that modeling expected risk is not a Gaussian problem. When modeling incident rates across the city, I noticed that, per [H3, time] cell:
- many cells have an incident count of 0 and/or a total risk of 0
- many cells have just 1–2 incidents
- a handful of cells have many incidents (> 1,000)
- extreme events occur, but rarely
These are signs that the target distribution is neither symmetric nor clustered around a fixed mean. These properties immediately rule out common assumptions like normally distributed errors.
This is where Tweedie regression becomes useful.
What’s Tweedie Regression?

Put simply, Tweedie regression says: "Your value is the sum of random events, where the number of events is random and each event has a positive random size." This fits the crime incident model perfectly.
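More formally, for a given cell and time window the target follows a compound Poisson–Gamma structure: with N the number of incidents and Xᵢ the severity of incident i,

Y = X₁ + X₂ + … + X_N, where N ~ Poisson(λ) and each Xᵢ ~ Gamma(shape, scale)

so Y is exactly zero whenever N = 0, and positive and right-skewed otherwise, which is exactly the shape we observed in the per-cell incident data.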
Tweedie regression combines Poisson and Gamma processes to model the number of incidents and the size (risk score) of each incident. For example:
- Poisson process: in the 6 pm–9 pm window on December 10th, 2025, how many incidents occurred in H3 index 89283082873ffff?
- Gamma distribution: how severe was each incident that occurred in H3 index 89283082873ffff between 6 pm and 9 pm on December 10th, 2025?
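XGBoost supports this directly through its Tweedie objective. A minimal training sketch, with illustrative hyperparameters and a variance power of 1.5 (between Poisson's 1 and Gamma's 2) as a typical starting point rather than the tuned value:

```python
# Minimal XGBoost training sketch with the Tweedie objective. Feature frame X
# and target y come from the aggregation sketch; hyperparameters are illustrative.
import xgboost as xgb

# H3 cell IDs are strings, so encode them as integer codes for the tree model.
X_enc = X.copy()
X_enc["h3_cell"] = X_enc["h3_cell"].astype("category").cat.codes

model = xgb.XGBRegressor(
    objective="reg:tweedie",
    tweedie_variance_power=1.5,  # between 1 (Poisson) and 2 (Gamma)
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
)
model.fit(X_enc, y)

# Expected risk for every (cell, day-of-week, time-window) combination.
expected_risk = model.predict(X_enc)
```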
Why This Matters
A concrete example from the data illustrates why this framing is essential.
In the Presidio, there was a single, rare high-severity incident that scored near 9/10. In contrast, a block near 300 Hyde Street in the Tenderloin has hundreds of incidents over time, but with a lower average severity. Tweedie breaks it down as:
Expected risk = E[#incidents] × E[severity]
# Presidio
E[#incidents] ≈ 0
E[severity] = high
→ Expected risk ≈ 0
# Tenderloin
E[#incidents] = high
E[severity] = medium
→ Expected risk = large
Therefore, if high-risk events start occurring more often in the Presidio, the model will adjust the expected risk accordingly and raise the output scores. Tweedie handles the target's zero-heavy, right-skewed distribution, and the input features we discussed earlier simply explain variation in that target.
Framing the Outcome
The result is a model that predicts expected risk, not conditional severity and not binary safety labels. This distinction matters: it avoids overreacting to rare but extreme events, while still reflecting sustained patterns that emerge over time.
Final Steps + Deployments
To bring the model to life, I used the Google Maps API to build a website that integrates the maps, routes, and directions UI, on which I can overlay colors based on the risk scores. I color-coded the segments using percentiles of the score distribution in my data, i.e. score ≤ P50 = green (safe), score ≤ P75 = yellow (moderately safe), score ≤ P90 = orange (moderately dangerous), else red (dangerous). I also added logic to re-route the user through a safer route if the detour is not more than 15% of the original duration. This can be tweaked, but I left it as is for now, since with the San Francisco hills a 15% detour can really wear you out.
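The coloring and re-routing rules boil down to a few lines. Here is a sketch of both; the route objects and their duration and risk_score attributes are assumptions about the app's internals, not the actual implementation.

```python
# Percentile-based color buckets and the 15% detour rule.
import numpy as np

def make_color_fn(all_scores):
    """Build a score -> color function from the score distribution."""
    p50, p75, p90 = np.percentile(all_scores, [50, 75, 90])
    def color(score):
        if score <= p50:
            return "green"   # safe
        if score <= p75:
            return "yellow"  # moderately safe
        if score <= p90:
            return "orange"  # moderately dangerous
        return "red"         # dangerous
    return color

def pick_route(routes, max_detour=0.15):
    """Return the lowest-risk route within 15% of the fastest route's duration."""
    fastest = min(routes, key=lambda r: r.duration)
    budget = fastest.duration * (1 + max_detour)
    candidates = [r for r in routes if r.duration <= budget]
    return min(candidates, key=lambda r: r.risk_score)
```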

I deployed the backend on Render and the frontend on Vercel.
Putting StreetSense To Use!
And now, back to the first example we looked at, the journey from Chinatown to Market & Van Ness, but this time with the new model and application we have built!
Here's how the walk looks at 9 am on a Tuesday versus 11 pm on a Saturday:


In the first image, the segments in Chinatown that are green have lower incident counts and severity compared to the segments that are red, and the data backs this up. The cool part about the second image is that it automatically re-routes the user through a route that is safer at 11 pm on a Saturday night. This is the kind of contextual decision-making I originally wished for, and the motivation behind building StreetSense.
Final Thoughts and Potential Improvements
While the current system captures spatial and temporal patterns in historical incidents, there are clear areas for improvement:
- incorporating real-time signals
- using further ground truth data to validate and train
(a) if an incident was marked as a 4/10 risk score for theft and we can identify through the San Francisco database that an arrest was made, we can bump it up to a 5/10
- making the H3 index sensitive to neighboring cells (Outer Richmond ~ Central Richmond), so the model can infer proximity and contextual information is partially shared (see the sketch after this list)
- expanding spatial features beyond the H3 ID (neighbor aggregation, distance to hotspots, land-use features)
- deeper exploration of different methods of handling incident data + evaluations
(a) experiment with different XGBoost objective functions such as Pseudo-Huber loss
(b) leverage hyperparameter optimization frameworks and evaluate different combinations of values
(c) experiment with neural networks
- expanding beyond a single city would also make the model more robust
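For the neighboring-cells idea, a minimal smoothing sketch: blend each cell's predicted risk with the average of its ring-1 neighbors. The 0.7/0.3 weighting is illustrative, and h3-py v4 calls the neighborhood function grid_disk (k_ring in v3).

```python
# Blend each cell's predicted risk with the average of its ring-1 neighbors.
import h3

def smoothed_risk(cell: str, risk_by_cell: dict, alpha: float = 0.7) -> float:
    neighbors = [c for c in h3.grid_disk(cell, 1) if c != cell]
    neighbor_risks = [risk_by_cell.get(c, 0.0) for c in neighbors]
    neighbor_avg = sum(neighbor_risks) / len(neighbor_risks) if neighbor_risks else 0.0
    return alpha * risk_by_cell.get(cell, 0.0) + (1 - alpha) * neighbor_avg
```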
Like any model built on historical data, StreetSense reflects past patterns rather than predicting individual outcomes, and it should be used as a tool for context rather than certainty. Ultimately, the goal is not to label places as safe or unsafe, but to help people make more informed, situationally aware choices as they move through a city.
Try StreetSense: https://san-francisco-safety-index.vercel.app/
Data Sources & Licensing
This project uses publicly available data from the San Francisco Open Data Portal:
- San Francisco Police Department Incident Reports
– Source: San Francisco Open Data Portal (https://data.sfgov.org)
– License: Open Data Commons Public Domain Dedication and License (PDDL)
All datasets used are publicly available and permitted for reuse, modification, and commercial use under their respective open data licenses.
Acknowledgement & References
I'd like to thank the San Francisco Open Data team for maintaining high-quality public datasets that make projects like this possible.
Additional references that informed my understanding of the methods and ideas used in this work include:
