Finding Patterns in Convenience Store Locations with Geospatial Association Rule Mining

Photo by Matt Liu on Unsplash

When walking around Tokyo you’ll often pass a lot of convenience stores, known locally as “konbinis”, which makes sense since there are over 56,000 convenience stores in Japan. Different chains of convenience store are often situated very close to one another; it is not uncommon to see stores around the corner from each other or on opposite sides of the road. Given Tokyo’s population density, it’s understandable for competing businesses to be forced closer together. But could there be any relationships between which chains of convenience stores are found near one another?

The goal is to gather location data for a number of convenience store chains in a Tokyo neighbourhood, to understand whether there are any relationships between which chains are co-located with one another. Doing this will require:

  • The ability to query the location of different convenience stores in Tokyo, in order to retrieve each store’s name and location
  • Finding which convenience stores are co-located with one another within a pre-defined radius
  • Using the data on co-located stores to derive association rules
  • Plotting and visualising results for inspection

Let’s begin!

For our use case we want to find convenience stores in Tokyo, so first we’ll need to do a little homework on what the common store chains are. A quick Google search tells me that the main chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.

Now that we know what we’re searching for, let’s turn to OSMnx, an excellent Python package for searching data in OpenStreetMap (OSM). According to OSM’s schema, we should be able to find the store name in either the ‘brand:en’ or ‘brand’ field.

We will start by importing some useful libraries for getting our data, and defining a function to return a table of locations for a given convenience store chain inside a specified area:

import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (e.g., 'Tokyo, Japan')
        tags (dict): key of the entity attribute in OSM (e.g., 'name') and its value (e.g., amenity name)
    Returns:
        results (DataFrame): table of latitude and longitude with entity value
    '''

    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf (columns are minx, miny, maxx, maxy)
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0,3], bounding.iloc[0,1], bounding.iloc[0,2], bounding.iloc[0,0]
    location = gdf.geometry.unary_union
    # Finding the points inside the area polygon
    # (geometries_from_bbox is the OSMnx 1.x call; later releases rename it features_from_bbox)
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    # set_crs returns a new GeoDataFrame, so the result has to be assigned
    point = point.set_crs(crs=4326)
    # Copy the filtered slice so the geometry column can be modified safely
    point = point[point.geometry.within(location)].copy()
    # Ensuring we're dealing with points: polygons are reduced to their centroid
    point['geometry'] = point['geometry'].apply(lambda x : x.centroid if isinstance(x, Polygon) else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']

    results = pd.DataFrame({'name' : list(point['name']),
                            'longitude' : list(point['geometry'].x),
                            'latitude' : list(point['geometry'].y)}
                           )

    # Label every row with the tag value that was searched for
    results['name'] = list(tags.values())[0]
    return results

# The chain name goes in place of the blank tag value
convenience_stores = point_finder(place = 'Shinjuku, Tokyo',
                                  tags={"brand:en" : " "})

We can pass each convenience store name and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each convenience store looks like:
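As a rough sketch of that combining step, the helper below calls a finder function once per chain and stacks the results. The name `collect_chains`, the `CHAINS` list and the `finder` argument are assumptions made for illustration; they are not part of the original code. Passing the finder in as an argument keeps the sketch independent of any OSM request.

```python
import pandas as pd

# Chain names taken from the list given earlier in the article.
CHAINS = ["FamilyMart", "Lawson", "7-Eleven", "Ministop", "Daily Yamazaki", "NewDays"]

def collect_chains(place, chains, finder, tag_key="brand:en"):
    """Call `finder` once per chain and stack the results into one table.

    `finder` is assumed to accept `place` and `tags` keyword arguments and to
    return a DataFrame with 'name', 'longitude' and 'latitude' columns.
    """
    frames = []
    for chain in chains:
        found = finder(place=place, tags={tag_key: chain})
        if not found.empty:
            frames.append(found)
    if not frames:
        # No chain returned any rows: hand back an empty table with the same columns
        return pd.DataFrame(columns=["name", "longitude", "latitude"])
    return pd.concat(frames, ignore_index=True)
```

With the function defined above, this could be used as `convenience_stores = collect_chains('Shinjuku, Tokyo', CHAINS, point_finder)`, and `convenience_stores['name'].value_counts()` would then give the frequency count per chain.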

Frequency count of convenience stores. Image by author.

Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward when using Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or visualised directly in Jupyter notebooks:

Location map of Shinjuku convenience stores, color coded by store name. Image by author.
Location map of Shinjuku convenience stores, color coded by density in a two minute walking radius (168m). Image by author.

Now that we have our data, the next step is to find the nearest neighbours for each convenience store. To do this, we will use scikit-learn’s ‘BallTree’ class to find the names of the nearest convenience stores within a two minute walking radius. We are not interested in how many stores are considered nearest neighbours, so we will just look at which convenience store chains are within the defined radius.

from collections import Counter
from sklearn.neighbors import BallTree

# Convert location to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a BallTree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours in a 2 minute walking radius (168 m, divided by the Earth's radius in metres)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)

# Replace the neighbour indices with store names
df = pd.DataFrame(is_within)
df.columns = ['indices']
# Drop each store's own index from its neighbour list
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# create temporary index column
convenience_stores = convenience_stores.reset_index()
# set temporary index column as index
convenience_stores = convenience_stores.set_index('index')
# create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()

# replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to original df
convenience_stores['neighbours'] = df['indices']

# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each store's frequency within the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts, cast to booleans for association rule mining
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)].astype(bool)

If we wanted to improve the accuracy of our work, we could replace the haversine distance measure with something more accurate (e.g., walking times calculated using networkx), but we’ll keep things simple.
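To make the distance measure concrete, here is a minimal sketch of the haversine formula and of how a radius in metres turns into the radians that the haversine metric works in. The function name `haversine_m` is made up for this example, and the 1.4 m/s walking speed is an assumption implied by pairing two minutes with 168 m.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres, the same constant used for the radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Two minutes at an assumed walking speed of 1.4 m/s covers about 168 m
radius_m = 1.4 * 120
# Distances on the unit sphere are in radians, so divide by the Earth's radius
radius_rad = radius_m / EARTH_RADIUS_M
```

Because the haversine distance is a straight line over the Earth's surface, it ignores streets, crossings and buildings, which is why a network-based walking time would be the more accurate choice.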

This gives us a DataFrame where each row corresponds to a location, with a binary indicator of which convenience store chains are nearby:

Sample DataFrame of convenience store nearest neighbours for each location. Image by author.
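For readers without the data to hand, the shape of that table can be sketched with a few toy rows. The neighbour lists below are made up purely for illustration and are not taken from the Shinjuku data.

```python
from collections import Counter

import pandas as pd

# Toy neighbour lists (made up for illustration), one list per store location
neighbours = pd.Series([
    ["7-Eleven", "FamilyMart"],
    ["Lawson"],
    ["no-neighbours"],
])

# Each list holds unique chain names, so every count is 1 and missing chains become 0
counts = [dict(Counter(row)) for row in neighbours]
one_hot = pd.DataFrame(counts).fillna(0).astype(bool)

# Order the columns alphabetically for a stable layout
one_hot = one_hot[sorted(one_hot.columns)]
```

Each row is a location and each column a chain, with True marking a chain that appears within the radius.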

We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we examine only the rules related to frequent occurrences in our dataset (i.e., co-located convenience store chains). We use the metric ‘lift’ when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and the consequent to the support expected under the assumption of independence.

from mlxtend.frequent_patterns import association_rules, apriori

# Calculate apriori
frequent_set = apriori(output_df, min_support = 0.05, use_colnames = True)
# Create rules
rules = association_rules(frequent_set, metric = 'lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)

This gives us the following results table:

Association rules for convenience store data. Image by author.

We will now interpret these association rules to draw some high level takeaways. To interpret this table, it’s best to read more about association rules first.

Okay, back to the table.

Support tells us how often different convenience store chains are actually found together. Therefore we can say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. On the other hand, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.
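These three metrics are simple to compute by hand, which can help when reading the table. The sketch below uses a made-up five-row table rather than the real data, and `rule_metrics` is a hypothetical helper, not an mlxtend function.

```python
import pandas as pd

# Toy one-hot table (made-up values): rows are locations, columns are nearby chains
toy = pd.DataFrame({
    "7-Eleven":   [True, True, True, False, False],
    "FamilyMart": [True, True, False, True, False],
})

def rule_metrics(table, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    support_a = table[antecedent].mean()
    support_c = table[consequent].mean()
    # Share of locations where both chains are present
    support_both = (table[antecedent] & table[consequent]).mean()
    # How often the consequent appears when the antecedent is present
    confidence = support_both / support_a
    # Observed joint support relative to what independence would predict
    lift = support_both / (support_a * support_c)
    return {"support": support_both, "confidence": confidence, "lift": lift}

metrics = rule_metrics(toy, "7-Eleven", "FamilyMart")
```

In this toy table both chains appear in 3 of 5 rows and together in 2 of 5, so support is 0.4, confidence is about 0.67 and lift is about 1.11. Lift is symmetric in the antecedent and consequent, whereas confidence is not.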

Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.

Other rules refer to combinations of convenience stores. For instance, when a 7-Eleven and FamilyMart are already co-located, there is a high lift value of 1.42 that suggests a strong association with Lawson.
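One way to pull out just these combination rules is to filter on the size of the antecedent. This is a minimal sketch: `multi_item_rules` is a hypothetical helper, and the small table below uses placeholder lift values that only mimic the frozenset columns found in an mlxtend rules table.

```python
import pandas as pd

def multi_item_rules(rules, min_antecedents=2):
    """Keep rules whose antecedent contains at least `min_antecedents` chains."""
    mask = rules["antecedents"].apply(len) >= min_antecedents
    return rules[mask].sort_values("lift", ascending=False)

# Placeholder rules table (made-up lift values, for illustration only)
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"7-Eleven"}), frozenset({"7-Eleven", "FamilyMart"})],
    "consequents": [frozenset({"FamilyMart"}), frozenset({"Lawson"})],
    "lift": [1.1, 1.4],
})

combined = multi_item_rules(toy_rules)
```

Applied to the real rules table, the same filter would leave only the rules whose antecedent names two or more chains.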

If we had just stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.

An example of why geospatial association rules can be insightful for businesses is in determining new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur.

The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, which we demonstrated with association rules, it would make sense for both of these chains to pay more attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.

In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done by extracting data from OpenStreetMap, finding nearest neighbour convenience store chains, visualising data on maps, and creating association rules using the Apriori algorithm.

Thanks for reading!
