The Language of Locations: Evaluating Generative AI’s Geocoding Proficiency

Case Study: Unstructured Location Descriptions of Automobile Accidents

Data Collection and Preparation
To test and quantify the geocoding capabilities of LLMs, a set of 100 unstructured location descriptions of automobile accidents in Minnesota was randomly selected from a dataset that was scraped from the web. The ground truth coordinates for all 100 accidents were manually created using various mapping applications, including Google Maps and the Minnesota Department of Transportation’s Traffic Mapping Application (TMA).
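The sampling step itself is simple. Below is a minimal sketch of it, assuming the scraped data sits in a CSV file with one description per row; the file and column names are hypothetical.

import pandas as pd

# Load the scraped accident records (hypothetical file and column names)
accidents = pd.read_csv("mn_accidents_scraped.csv")

# Randomly select 100 descriptions, with a fixed seed for reproducibility
sample = accidents.sample(n=100, random_state=42).reset_index(drop=True)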

Some sample location descriptions are featured below.

US Hwy 71 at MN Hwy 60 , WINDOM, Cottonwood County

EB Highway 10 near Joplin St NW, ELK RIVER, Sherburne County

EB I 90 / HWY 22, FOSTER TWP, Faribault County

Highway 75 milepost 403, SAINT VINCENT TWP, Kittson County

65 Highway / King Road, BRUNSWICK TWP, Kanabec County

As seen in the examples above, there is a wide range of possibilities for how the description can be structured, as well as for what defines the location. One example is the fourth description, which includes a mile marker number. A mile marker is unlikely to be matched in a typical geocoding process, since that information usually isn’t included in reference data. Finding the ground truth coordinates for descriptions like this one relied heavily on the Minnesota Department of Transportation’s Linear Referencing System (LRS), which provides a standardized way of measuring roads throughout the state, and in which mile markers play a significant role. This data can be accessed through the TMA application mentioned previously.
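To make the mile marker case concrete, the sketch below shows the core idea behind resolving a milepost against linearly referenced data: interpolating between reference points with known measures and coordinates. The function and the reference values are hypothetical illustrations, not MnDOT’s actual LRS data.

def interpolate_milepost(milepost, ref_points):
    # ref_points: list of (milepost, x, y) tuples sorted by milepost,
    # with hypothetical values standing in for real LRS calibration points
    for (m0, x0, y0), (m1, x1, y1) in zip(ref_points, ref_points[1:]):
        if m0 <= milepost <= m1:
            # Linear interpolation between the two bracketing reference points
            t = (milepost - m0) / (m1 - m0)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None

# Hypothetical calibration points along Highway 75 (milepost, lon, lat)
refs = [(400, -97.20, 48.90), (405, -97.22, 48.97)]
print(interpolate_milepost(403, refs))  # approximate coordinates for milepost 403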

Methodology & Geocoding Strategies
After preparing the data, five separate notebooks were set up to test different geocoding processes. Their configurations are as follows.

1. Google Geocoding API, used on the raw location description
2. Esri Geocoding API, used on the raw location description
3. Google Geocoding API, used on an OpenAI GPT-3.5-standardized location description
4. Esri Geocoding API, used on an OpenAI GPT-3.5-standardized location description
5. OpenAI GPT-3.5, used as a geocoder itself

To summarize, the Google and Esri geocoding APIs were used on both the raw descriptions and on descriptions that were standardized by passing a short prompt to the OpenAI GPT-3.5 model. The Python code for this standardization process can be seen below.

import openai

def standardize_location(df, description_series):
    # Ask GPT-3.5 to rewrite each raw description into a geocodable string
    df["ai_location_description"] = df[description_series].apply(_gpt_chat)

    return df

def _gpt_chat(input_text):
    prompt = """Standardize the following location description into text
    that could be fed into a Geocoding API. When responding, only
    return the output text."""

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": input_text},
        ],
        temperature=0.7,
        n=1,
        max_tokens=150,
        stop=None,
    )

    # Keep only the last line of the reply, in case the model adds any preamble
    return response.choices[0].message.content.strip().split("\n")[-1]
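Assuming a DataFrame like the sample above, the standardization step can then be invoked as follows; the OPENAI_API_KEY environment variable and the "description" column name are assumptions carried over from the earlier sketch.

import os

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

sample = standardize_location(sample, "description")
print(sample["ai_location_description"].head())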

The four test cases using the geocoding APIs used the code below to make API requests to their respective geocoders and return the resulting coordinates for all 100 descriptions.

import os

import pandas as pd
import requests

# Esri Geocoder
def geocode_esri(df, description_series):
    df["xy"] = df[description_series].apply(_single_esri_geocode)

    # Split the "x, y" string into separate columns
    df["x"] = df["xy"].apply(lambda row: row.split(",")[0].strip())
    df["y"] = df["xy"].apply(lambda row: row.split(",")[1].strip())

    df["x"] = pd.to_numeric(df["x"], errors="coerce")
    df["y"] = pd.to_numeric(df["y"], errors="coerce")

    # Drop any records that failed to geocode
    df = df[df["x"].notna()]
    df = df[df["y"].notna()]

    return df

def _single_esri_geocode(input_text):
    base_url = "https://geocode-api.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates"
    params = {
        "f": "json",
        "singleLine": input_text,
        "maxLocations": "1",
        "token": os.environ["GEOCODE_TOKEN"],
    }

    response = requests.get(base_url, params=params)
    data = response.json()

    try:
        x = data["candidates"][0]["location"]["x"]
        y = data["candidates"][0]["location"]["y"]
    except (KeyError, IndexError):
        # No candidates were returned for this description
        x = None
        y = None

    return f"{x}, {y}"

# Google Geocoder
def geocode_google(df, description_series):
    df["xy"] = df[description_series].apply(_single_google_geocode)

    # Split the "x, y" string into separate columns
    df["x"] = df["xy"].apply(lambda row: row.split(",")[0].strip())
    df["y"] = df["xy"].apply(lambda row: row.split(",")[1].strip())

    df["x"] = pd.to_numeric(df["x"], errors="coerce")
    df["y"] = pd.to_numeric(df["y"], errors="coerce")

    # Drop any records that failed to geocode
    df = df[df["x"].notna()]
    df = df[df["y"].notna()]

    return df

def _single_google_geocode(input_text):
    base_url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {
        "address": input_text,
        "key": os.environ["GOOGLE_MAPS_KEY"],
        # Bias results toward Minnesota ("sw_lat,sw_lng|ne_lat,ne_lng")
        "bounds": "43.00,-97.50|49.5,-89.00",
    }

    response = requests.get(base_url, params=params)
    data = response.json()

    try:
        x = data["results"][0]["geometry"]["location"]["lng"]
        y = data["results"][0]["geometry"]["location"]["lat"]
    except (KeyError, IndexError):
        # No results were returned for this description
        x = None
        y = None

    return f"{x}, {y}"
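With both wrappers defined, each of the four API-based test cases reduces to a single call; the DataFrame and column names here carry over from the earlier sketches.

# Raw descriptions through each geocoder
esri_raw = geocode_esri(sample.copy(), "description")
google_raw = geocode_google(sample.copy(), "description")

# GPT-3.5-standardized descriptions through each geocoder
esri_std = geocode_esri(sample.copy(), "ai_location_description")
google_std = geocode_google(sample.copy(), "ai_location_description")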

Finally, the last process tested was to use GPT-3.5 as the geocoder itself, without the help of any geocoding API. The code for this process looked nearly identical to the standardization code above, but featured a different prompt, shown below.

Geocode the following address. Return a latitude (Y) and longitude (X) as accurately as possible. When responding, only return the output text in the following format: X, Y
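A minimal sketch of that process is shown below, mirroring the standardization code but parsing the model’s “X, Y” reply into numeric columns; the helper names are mine, not necessarily those used in the original notebooks.

def geocode_gpt(df, description_series):
    df["xy"] = df[description_series].apply(_gpt_geocode)

    # Parse the "X, Y" reply into numeric columns, dropping failures
    df["x"] = pd.to_numeric(df["xy"].str.split(",").str[0], errors="coerce")
    df["y"] = pd.to_numeric(df["xy"].str.split(",").str[1], errors="coerce")

    return df[df["x"].notna() & df["y"].notna()]

def _gpt_geocode(input_text):
    prompt = """Geocode the following address. Return a latitude (Y) and
    longitude (X) as accurately as possible. When responding, only return
    the output text in the following format: X, Y"""

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": input_text},
        ],
        temperature=0.7,
        n=1,
        max_tokens=150,
    )

    return response.choices[0].message.content.strip()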

Performance Metrics and Insights
After the various processes were developed, each one was run, and several performance metrics were calculated, covering both execution time and geocoding accuracy. These metrics are listed below.

| Geocoding Process   | Mean   | StdDev | MAE    | RMSE   |
| ------------------- | ------ | ------ | ------ | ------ |
| Google with GPT-3.5 | 0.1012 | 1.8537 | 0.3698 | 1.8565 |
| Google with Raw     | 0.1047 | 1.1383 | 0.2643 | 1.1431 |
| Esri with GPT-3.5   | 0.0116 | 0.5748 | 0.0736 | 0.5749 |
| Esri with Raw       | 0.0001 | 0.0396 | 0.0174 | 0.0396 |
| GPT-3.5 Geocoding   | 2.1261 | 80.022 | 45.416 | 80.050 |

| Geocoding Process   | 75% ET | 90% ET | 95% ET | Run Time |
| ------------------- | ------ | ------ | ------ | -------- |
| Google with GPT-3.5 | 0.0683 | 0.3593 | 3.3496 | 1m 59.9s |
| Google with Raw     | 0.0849 | 0.4171 | 3.3496 | 0m 23.2s |
| Esri with GPT-3.5   | 0.0364 | 0.0641 | 0.1171 | 2m 22.7s |
| Esri with Raw       | 0.0362 | 0.0586 | 0.1171 | 0m 51.0s |
| GPT-3.5 Geocoding   | 195.54 | 197.86 | 199.13 | 1m 11.9s |

The metrics are explained in more detail here:

- Mean: the mean error (in terms of Manhattan distance, or the total of the X and Y differences from the ground truth, in decimal degrees).
- StdDev: the standard deviation of error (in terms of Manhattan distance, in decimal degrees).
- MAE: the mean absolute error (in terms of Manhattan distance, in decimal degrees).
- RMSE: the root mean square error (in terms of Manhattan distance, in decimal degrees).
- 75%, 90%, 95% ET: the error threshold for the given percentage (in terms of Euclidean distance, in decimal degrees), meaning that that percentage of records falls within the resulting value’s distance from the ground truth.
- Run Time: the total time taken to run the geocoding process on all 100 records.
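As a concrete illustration, these accuracy metrics can be computed roughly as follows, given arrays of predicted and ground truth coordinates in decimal degrees. This is a sketch rather than the original evaluation code, and the signed error it uses for Mean, StdDev, and RMSE is one plausible reading of the definitions above (consistent with the reported Mean being far smaller than the MAE).

import numpy as np

def accuracy_metrics(pred_x, pred_y, true_x, true_y):
    # Signed error: the total of the X and Y differences from the ground truth
    e = (pred_x - true_x) + (pred_y - true_y)
    # Euclidean distance error, used for the percentile error thresholds
    d = np.sqrt((pred_x - true_x) ** 2 + (pred_y - true_y) ** 2)
    return {
        "Mean": e.mean(),
        "StdDev": e.std(),
        "MAE": np.abs(e).mean(),
        "RMSE": np.sqrt((e ** 2).mean()),
        "75% ET": np.percentile(d, 75),
        "90% ET": np.percentile(d, 90),
        "95% ET": np.percentile(d, 95),
    }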

Clearly, GPT-3.5 performs far worse on its own. That said, if a couple of outliers are removed from the picture (records the model placed on entirely different continents), the results of that process don’t look too far off for the most part, visually at least.
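For reference, filtering such outliers can be as simple as dropping records beyond a distance cutoff; the column names and the 5-degree threshold below are arbitrary choices for illustration.

# Keep only records whose Euclidean error is under ~5 decimal degrees
err = np.sqrt((gpt_df["x"] - gpt_df["true_x"]) ** 2 + (gpt_df["y"] - gpt_df["true_y"]) ** 2)
gpt_df = gpt_df[err < 5]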

It is also interesting to see that the LLM standardization process actually decreased accuracy, which I personally found a bit surprising, since my whole intention in introducing that component was to slightly improve the overall accuracy of the geocoding process. It’s worth noting that the prompts themselves may have been part of the problem here, and the role of “prompt engineering” in geospatial contexts is worth exploring further.
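As one untested example of what such prompt engineering might look like, a more constrained standardization prompt could spell out the domain conventions explicitly (a hypothetical variant, not one of the prompts evaluated here):

Standardize the following location description into a single-line address or road intersection suitable for a geocoding API. Expand directional and road abbreviations (EB, WB, TWP, Hwy), include the city, county, and state (Minnesota, USA), and drop any mile marker references. Return only the standardized text.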

The last primary takeaway from this evaluation is the difference in execution times: any process that includes GPT-3.5 runs significantly slower. Esri’s geocoding API was also slower than Google’s in this setting. Rigorous benchmarking was not performed, however, so these results should be interpreted with that in mind.
