ChatGPT Code Interpreter: How It Saved Me Hours of Work

Data cleansing and preprocessing

The next step after reading the data is to clean and preprocess it. I'm sure you have heard many times that this step often takes the most time.

There are a couple of issues that must be handled before creating the world map visualizations. ChatGPT detected them and provided a solution in less than a minute. Quite impressive!

(image by author)

The code in the snippet above:

# code generated by Code Interpreter
# Check if the country names are consistent across both dataframes
population_countries = set(population_data['Country Name'])
coordinates_countries = set(country_coordinates['name'])

# Countries in the population data but not in the coordinates data
missing_in_coordinates = population_countries - coordinates_countries

# Countries in the coordinates data but not in the population data
missing_in_population = coordinates_countries - population_countries

Some countries are missing from the coordinates dataset, and some are missing from the population dataset. Many of the mismatches come from the two datasets naming the same country differently (for example, 'Turkiye' versus 'Turkey').

What ChatGPT did in the code snippet above:

  • Create Python sets of the country names in both datasets
  • Compare them in each direction to find the differences as sets
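The set comparison is easier to see on a handful of values. The following is a minimal sketch with made-up country lists (not the article's actual datasets); only the variable names are borrowed from the snippet above.

```python
# Toy data: hypothetical country names, not the article's datasets
population_countries = {"Turkiye", "Russian Federation", "France", "World"}
coordinates_countries = {"Turkey", "Russia", "France", "Antarctica"}

# Set difference: names that appear only in the population data
missing_in_coordinates = population_countries - coordinates_countries

# Set difference in the other direction: names only in the coordinates data
missing_in_population = coordinates_countries - population_countries

print(sorted(missing_in_coordinates))
print(sorted(missing_in_population))
```

Names such as 'Turkiye' and 'Turkey' show up on both sides because they are spelled differently, which is the kind of mismatch a name mapping can fix. Entries such as 'World' have no counterpart at all and can only be dropped.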

ChatGPT solved it in less than a minute:

The code in the snippet above:

# code generated by Code Interpreter
# Mapping of country names from the population dataset to the coordinates dataset
country_name_mapping = {
    'Bahamas, The': 'Bahamas',
    'Brunei Darussalam': 'Brunei',
    'Cabo Verde': 'Cape Verde',
    'Congo, Dem. Rep.': 'Congo [DRC]',
    'Congo, Rep.': 'Congo [Republic]',
    "Cote d'Ivoire": "Côte d'Ivoire",
    'Czechia': 'Czech Republic',
    'Egypt, Arab Rep.': 'Egypt',
    'Gambia, The': 'Gambia',
    'Hong Kong SAR, China': 'Hong Kong',
    'Iran, Islamic Rep.': 'Iran',
    "Korea, Dem. People's Rep.": 'North Korea',
    'Korea, Rep.': 'South Korea',
    'Kyrgyz Republic': 'Kyrgyzstan',
    'Lao PDR': 'Laos',
    'Macao SAR, China': 'Macau',
    'Micronesia, Fed. Sts.': 'Micronesia',
    'Myanmar': 'Myanmar [Burma]',
    'North Macedonia': 'Macedonia [FYROM]',
    'Russian Federation': 'Russia',
    'Sao Tome and Principe': 'São Tomé and Príncipe',
    'Slovak Republic': 'Slovakia',
    'Syrian Arab Republic': 'Syria',
    'Turkiye': 'Turkey',
    'Venezuela, RB': 'Venezuela',
    'Yemen, Rep.': 'Yemen',
    'Eswatini': 'Swaziland'
}

# Clean the population dataset
population_data_clean = population_data.replace({"Country Name": country_name_mapping})

# Drop the rows whose country name does not exist in the coordinates dataset
population_data_clean = population_data_clean[population_data_clean['Country Name'].isin(coordinates_countries)]

# Merge the population data with the country coordinates data
merged_data = pd.merge(population_data_clean, country_coordinates, left_on='Country Name', right_on='name')

# Keep only the necessary columns
merged_data = merged_data[['Country Name', '2022', 'latitude', 'longitude']]

# Rename the columns for clarity
merged_data.columns = ['Country', 'Population', 'Latitude', 'Longitude']

merged_data.head()

Cleaned and preprocessed data (image by author)

We now have the data in a nice, clean format. This step would normally take a lot of time and manual effort. ChatGPT did it in about a minute.
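Because rows without a matching name are dropped silently, it can be worth checking what did not match. The sketch below is one possible way to do that, not something Code Interpreter produced: it uses tiny made-up dataframes (the population figures and the single-entry mapping are placeholders) and an outer merge with pandas' `indicator=True` option to list the unmatched rows.

```python
import pandas as pd

# Toy stand-ins for the two datasets; all values are made up for illustration
population_data = pd.DataFrame({
    "Country Name": ["Turkiye", "France", "World"],
    "2022": [100, 200, 300],
})
country_coordinates = pd.DataFrame({
    "name": ["Turkey", "France"],
    "latitude": [38.96, 46.23],
    "longitude": [35.24, 2.21],
})

# Apply a (hypothetical, one-entry) name mapping to the country column only
mapping = {"Turkiye": "Turkey"}
cleaned = population_data.replace({"Country Name": mapping})

# An outer merge keeps unmatched rows; the _merge column says where each row came from
check = pd.merge(
    cleaned,
    country_coordinates,
    left_on="Country Name",
    right_on="name",
    how="outer",
    indicator=True,
)

# Rows that exist in only one of the two datasets
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Country Name", "name", "_merge"]])
```

In this toy case only the aggregate row 'World' is left unmatched, so dropping it is safe. On real data, the same check shows whether an actual country is being lost because its name is missing from the mapping.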
