Beyond Numpy and Pandas: Unlocking the Potential of Lesser-Known Python Libraries
1. Dask
2. SymPy
3. Xarray
Conclusions

Introducing Xarray

Xarray is a Python library that extends the features and functionality of NumPy, giving us the ability to work with labeled arrays and datasets.

As the project says on its website:

Xarray makes working with labeled multi-dimensional arrays in Python easy, efficient, and fun!

And also:

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.

In other words, it extends the functionality of NumPy arrays by adding labels or coordinates to the array dimensions. These labels provide metadata and enable more advanced analysis and manipulation of multi-dimensional data.

For instance, in NumPy, arrays are accessed using integer-based indexing.

In Xarray, instead, each dimension can have a label associated with it, making it easier to understand and manipulate the data based on meaningful names.

For instance, instead of accessing data with arr[0, 1, 2], we can use arr.sel(x=0, y=1, z=2) in Xarray, where x, y, and z are dimension names and the values are coordinate labels along each dimension.

This makes the code much more readable!
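To make the difference between positional and label-based access concrete, here is a minimal sketch. The array contents, the dimension names x, y, z, and the coordinate values are all illustrative assumptions, not taken from the Xarray documentation. It shows that sel() looks up coordinate labels, while isel() is the named equivalent of integer indexing:

```python
import numpy as np
import xarray as xr

# Hypothetical 3-D array; dimension names and coordinate values are made up
arr = xr.DataArray(
    np.arange(24).reshape(2, 3, 4),
    dims=["x", "y", "z"],
    coords={"x": [10, 20], "y": [0.0, 0.5, 1.0], "z": ["a", "b", "c", "d"]},
)

# Plain positional access, as in NumPy
by_position = arr[0, 1, 2]

# Same integer positions, but tied to dimension names
by_isel = arr.isel(x=0, y=1, z=2)

# Same element, selected by coordinate labels
by_label = arr.sel(x=10, y=0.5, z="c")

print(int(by_position), int(by_isel), int(by_label))  # 6 6 6
```

All three expressions return the same element; only the last two keep working unchanged if the dimensions are later reordered.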

So, let’s see some features of Xarray.

Some features of Xarray in action

As usual, to install it:

$ pip install xarray

Suppose we want to create some temperature data and label it with coordinates like latitude and longitude. We can do it like so:

import xarray as xr
import numpy as np

# Create temperature data
temperature = np.random.rand(100, 100) * 20 + 10

# Create coordinate arrays for latitude and longitude
latitudes = np.linspace(-90, 90, 100)
longitudes = np.linspace(-180, 180, 100)

# Create an Xarray data array with labeled coordinates
da = xr.DataArray(
    temperature,
    dims=['latitude', 'longitude'],
    coords={'latitude': latitudes, 'longitude': longitudes}
)

# Access data using labeled coordinates
subset = da.sel(latitude=slice(-45, 45), longitude=slice(-90, 0))

And if we print the subset, we get:

# Print data
print(subset)

>>>

array([[13.45064786, 29.15218061, 14.77363206, ..., 12.00262833,
16.42712411, 15.61353963],
[23.47498117, 20.25554247, 14.44056286, ..., 19.04096482,
15.60398491, 24.69535367],
[25.48971105, 20.64944534, 21.2263141 , ..., 25.80933737,
16.72629302, 29.48307134],
...,
[10.19615833, 17.106716 , 10.79594252, ..., 29.6897709 ,
20.68549602, 29.4015482 ],
[26.54253304, 14.21939699, 11.085207 , ..., 15.56702191,
19.64285595, 18.03809074],
[26.50676351, 15.21217526, 23.63645069, ..., 17.22512125,
13.96942377, 13.93766583]])
Coordinates:
* latitude (latitude) float64 -44.55 -42.73 -40.91 ... 40.91 42.73 44.55
* longitude (longitude) float64 -89.09 -85.45 -81.82 ... -9.091 -5.455 -1.818

So, let’s go through the process step by step:

  1. We’ve created the temperature values as a NumPy array.
  2. We’ve defined the latitude and longitude values as NumPy arrays.
  3. We’ve stored all the data in an Xarray array by calling DataArray().
  4. We’ve selected a subset of the latitudes and longitudes with the sel() method, which picks the range of values we want.

The result is also easy to read, so labeling is really helpful in many cases.
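The steps above select ranges with slice(). When we want a single grid point and the exact coordinate is not on the grid, sel() also accepts method="nearest". The sketch below rebuilds the same kind of grid as in the example; the query point (41.9, 12.5) and the seeded random generator are assumptions added for illustration:

```python
import numpy as np
import xarray as xr

# Same kind of synthetic grid as in the example above (seeded for repeatability)
rng = np.random.default_rng(0)
temperature = rng.random((100, 100)) * 20 + 10
latitudes = np.linspace(-90, 90, 100)
longitudes = np.linspace(-180, 180, 100)

da = xr.DataArray(
    temperature,
    dims=["latitude", "longitude"],
    coords={"latitude": latitudes, "longitude": longitudes},
)

# Hypothetical query point that does not fall exactly on the grid
point = da.sel(latitude=41.9, longitude=12.5, method="nearest")

# The result is a 0-dimensional DataArray that still carries its coordinates
print(float(point.latitude), float(point.longitude), float(point))
```

Without method="nearest", sel() requires an exact match and raises a KeyError for values that are not in the coordinate array.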

Suppose we’re collecting temperature data throughout the year. We want to know whether we have any null values in our array. Here’s how we can do that:

import xarray as xr
import numpy as np
import pandas as pd

# Create temperature data with missing values
temperature = np.random.rand(365, 50, 50) * 20 + 10
temperature[0:10, :, :] = np.nan  # Set the first 10 days as missing values

# Create time, latitude, and longitude coordinate arrays
times = pd.date_range('2023-01-01', periods=365, freq='D')
latitudes = np.linspace(-90, 90, 50)
longitudes = np.linspace(-180, 180, 50)

# Create an Xarray data array with missing values
da = xr.DataArray(
    temperature,
    dims=['time', 'latitude', 'longitude'],
    coords={'time': times, 'latitude': latitudes, 'longitude': longitudes}
)

# Count the number of missing values along the time dimension
missing_count = da.isnull().sum(dim='time')

# Print missing values
print(missing_count)

>>>


array([[10, 10, 10, ..., 10, 10, 10],
[10, 10, 10, ..., 10, 10, 10],
[10, 10, 10, ..., 10, 10, 10],
...,
[10, 10, 10, ..., 10, 10, 10],
[10, 10, 10, ..., 10, 10, 10],
[10, 10, 10, ..., 10, 10, 10]])
Coordinates:
* latitude (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
* longitude (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

So we find 10 null values along the time dimension at every grid point.

Also, if we look closely at the code, we can see that Xarray objects offer pandas-style methods such as isnull() and sum(); chained as isnull().sum(dim='time'), as in this case, they count the missing values along the time dimension.

Handling and analyzing multi-dimensional data becomes very tempting once we can label our arrays. So, why not try it?
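Once the missing values are located, a natural next step is to drop or fill them. The following is a minimal sketch under stated assumptions: a smaller 5x5 grid and a seeded generator are used for brevity, and filling with the per-cell time mean is just one possible strategy, not a recommendation from the original example:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Smaller synthetic grid than in the example above, to keep the sketch light
rng = np.random.default_rng(0)
temperature = rng.random((365, 5, 5)) * 20 + 10
temperature[0:10, :, :] = np.nan  # first 10 days missing

da = xr.DataArray(
    temperature,
    dims=["time", "latitude", "longitude"],
    coords={
        "time": pd.date_range("2023-01-01", periods=365, freq="D"),
        "latitude": np.linspace(-90, 90, 5),
        "longitude": np.linspace(-180, 180, 5),
    },
)

# Total number of missing values over the whole array: 10 days * 5 * 5 cells
total_missing = int(da.isnull().sum())

# Drop the days on which every grid cell is missing
cleaned = da.dropna(dim="time", how="all")

# Alternatively, fill the gaps with each cell's mean over time (NaNs are skipped)
filled = da.fillna(da.mean(dim="time"))

print(total_missing, cleaned.sizes["time"], int(filled.isnull().sum()))  # 250 355 0
```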

For instance, suppose we’re still collecting data related to temperatures at certain latitudes and longitudes.

We may want to calculate the mean, the max, and the min temperatures. We can do it like so:

import xarray as xr
import numpy as np
import pandas as pd

# Create synthetic temperature data
temperature = np.random.rand(365, 50, 50) * 20 + 10

# Create time, latitude, and longitude coordinate arrays
times = pd.date_range('2023-01-01', periods=365, freq='D')
latitudes = np.linspace(-90, 90, 50)
longitudes = np.linspace(-180, 180, 50)

# Create an Xarray dataset
ds = xr.Dataset(
    {
        'temperature': (['time', 'latitude', 'longitude'], temperature),
    },
    coords={
        'time': times,
        'latitude': latitudes,
        'longitude': longitudes,
    }
)

# Perform statistical analysis on the temperature data
mean_temperature = ds['temperature'].mean(dim='time')
max_temperature = ds['temperature'].max(dim='time')
min_temperature = ds['temperature'].min(dim='time')

# Print values
print(f"mean temperature:\n {mean_temperature}\n")
print(f"max temperature:\n {max_temperature}\n")
print(f"min temperature:\n {min_temperature}\n")

>>>

mean temperature:

array([[19.99931701, 20.36395016, 20.04110699, ..., 19.98811842,
20.08895803, 19.86064693],
[19.84016491, 19.87077812, 20.27445405, ..., 19.8071972 ,
19.62665953, 19.58231185],
[19.63911165, 19.62051976, 19.61247548, ..., 19.85043831,
20.13086891, 19.80267099],
...,
[20.18590514, 20.05931149, 20.17133483, ..., 20.52858247,
19.83882433, 20.66808513],
[19.56455575, 19.90091128, 20.32566232, ..., 19.88689221,
19.78811145, 19.91205212],
[19.82268297, 20.14242279, 19.60842148, ..., 19.68290006,
20.00327294, 19.68955107]])
Coordinates:
* latitude (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
* longitude (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

max temperature:

array([[29.98465531, 29.97609171, 29.96821276, ..., 29.86639343,
29.95069558, 29.98807808],
[29.91802049, 29.92870312, 29.87625447, ..., 29.92519055,
29.9964299 , 29.99792388],
[29.96647016, 29.7934891 , 29.89731136, ..., 29.99174546,
29.97267052, 29.96058079],
...,
[29.91699117, 29.98920555, 29.83798369, ..., 29.90271746,
29.93747041, 29.97244906],
[29.99171911, 29.99051943, 29.92706773, ..., 29.90578739,
29.99433847, 29.94506567],
[29.99438621, 29.98798699, 29.97664488, ..., 29.98669576,
29.91296382, 29.93100249]])
Coordinates:
* latitude (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
* longitude (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

min temperature:

array([[10.0326431 , 10.07666029, 10.02795524, ..., 10.17215336,
10.00264909, 10.05387097],
[10.00355858, 10.00610942, 10.02567816, ..., 10.29100316,
10.00861792, 10.16955806],
[10.01636216, 10.02856619, 10.00389027, ..., 10.0929342 ,
10.01504103, 10.06219179],
...,
[10.00477003, 10.0303088 , 10.04494723, ..., 10.05720692,
10.122994 , 10.04947012],
[10.00422182, 10.0211205 , 10.00183528, ..., 10.03818058,
10.02632697, 10.06722953],
[10.10994581, 10.12445222, 10.03002468, ..., 10.06937041,
10.04924046, 10.00645499]])
Coordinates:
* latitude (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
* longitude (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

And we got what we wanted, again in a clearly readable form.

And again, as before, to calculate the max, min, and mean temperatures we’ve used pandas-style methods, applied here to an Xarray array.
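Other reductions follow the same pattern. As a minimal sketch, assuming the same kind of synthetic dataset on a smaller 5x5 grid with a seeded generator, the median over time and a per-month mean can be computed like this:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Smaller synthetic dataset than in the example above, seeded for repeatability
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"temperature": (["time", "latitude", "longitude"],
                     rng.random((365, 5, 5)) * 20 + 10)},
    coords={
        "time": pd.date_range("2023-01-01", periods=365, freq="D"),
        "latitude": np.linspace(-90, 90, 5),
        "longitude": np.linspace(-180, 180, 5),
    },
)

temp = ds["temperature"]

# Median over the time dimension, one value per grid cell
median_temperature = temp.median(dim="time")

# Mean per calendar month: group by the month of each timestamp, then reduce
monthly_mean = temp.groupby("time.month").mean(dim="time")

print(median_temperature.dims)
print(monthly_mean.sizes["month"])  # 12
```

The "time.month" string is Xarray's shorthand for grouping on a datetime component of the time coordinate, so no manual month column is needed.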
