Pandas 2.0: A Game-Changer for Data Scientists?
1. Performance, Speed, and Memory-Efficiency
2. Arrow Data Types and Numpy Indices
3. Easier Handling of Missing Values
4. Copy-On-Write Optimization
5. Optional Dependencies
Taking it for a spin!
The Verdict: Performance, Flexibility, Interoperability!
About me

Being built on top of numpy made it hard for pandas to handle missing values in a hassle-free, flexible way, since numpy does not support null values for some data types.

As an example, integers are automatically converted to floats when a missing value is introduced, which just isn't ideal:

Missing Values: Conversion to float. Snippet by Author.

Note how the points column automatically changes from int64 to float64 after the introduction of a single None value.
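A minimal sketch of that behavior (the column name points follows the example above; the values are made up for illustration):

```python
import pandas as pd

# Without missing values, the column is stored as int64
df = pd.DataFrame({"points": [25, 12, 15, 14]})
print(df["points"].dtype)  # int64

# A single None forces the whole column to float64 under the default NumPy backend
df = pd.DataFrame({"points": [25, 12, None, 14]})
print(df["points"].dtype)  # float64
```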

There's nothing worse for a data flow than wrong typesets, especially within a data-centric AI paradigm.

Erroneous typesets directly impact data preparation decisions, cause incompatibilities between different chunks of data, and, even when passing silently, they can compromise operations and produce nonsensical results.

For instance, in the Data-Centric AI Community, we're currently working on a project around synthetic data for data privacy. One of the features, NOC (number of children), has missing values and is therefore automatically converted to float when the data is loaded. Then, when passing the data into a generative model as a float, we might get decimal output values such as 2.5. Unless you're a mathematician with 2 kids, a newborn, and a weird sense of humor, having 2.5 children is not OK.

In pandas 2.0, we can load the data using the nullable backend (dtype_backend='numpy_nullable'), so we can keep our original data types (int64 in this case):

Leveraging the 'numpy_nullable' backend, pandas 2.0 can handle missing values without changing the original data types. Snippet by Author.
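A minimal sketch of how this might look (the inline CSV is made up for illustration; the dtype_backend argument is available in pandas 2.0's readers):

```python
import io
import pandas as pd

csv_data = "points\n25\n12\n\n14\n"  # one missing entry in the points column

# With the nullable backend, the column keeps an integer dtype (Int64)
df = pd.read_csv(io.StringIO(csv_data), dtype_backend="numpy_nullable")
print(df["points"].dtype)  # Int64
print(df["points"])        # the missing value is shown as <NA>
```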

This might seem like a subtle change, but under the hood it means that pandas can now natively leverage Arrow's handling of missing values. This makes operations much more efficient, since pandas doesn't need to implement its own version of null handling for every data type.
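For illustration, a short sketch of Arrow-backed missing values (assumes pyarrow is installed):

```python
import pandas as pd

# Arrow-backed dtypes keep missing values as <NA> for any data type,
# without falling back to float64
s = pd.Series([1, 2, None], dtype="int64[pyarrow]")
print(s.dtype)  # int64[pyarrow]
print(s)        # the missing entry appears as <NA>, the dtype stays integer
```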
