Do More with NumPy Array Type Hints: Annotate & Validate Shape & Dtype

The NumPy array object can take many concrete forms. It might be a one-dimensional (1D) array of Booleans, or a three-dimensional (3D) array of 8-bit unsigned integers. As the built-in function isinstance() will show, every array is an instance of np.ndarray, regardless of shape or the type of elements stored in the array, i.e., the dtype. Similarly, many type-annotated interfaces still only specify np.ndarray:

import numpy as np

def process(
    x: np.ndarray,
    y: np.ndarray,
    ) -> np.ndarray: ...

Such type annotations are insufficient: most interfaces have strong expectations of the shape or dtype of passed arrays. Most code will fail if a 3D array is passed where a 1D array is expected, or an array of dates is passed where an array of floats is expected.
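As a quick illustration of the isinstance() behavior described above, the following sketch (the array names are arbitrary) shows that two arrays with very different shapes and dtypes are indistinguishable by class; only their run-time attributes reveal the difference:

```python
import numpy as np

# Two arrays with very different shapes and element types.
flags = np.zeros(4, dtype=np.bool_)           # 1D array of Booleans
pixels = np.zeros((2, 3, 4), dtype=np.uint8)  # 3D array of 8-bit unsigned integers

# Both are plain np.ndarray instances as far as isinstance() is concerned.
both_ndarray = isinstance(flags, np.ndarray) and isinstance(pixels, np.ndarray)

# Only the run-time attributes expose the difference.
print(both_ndarray)               # True
print(flags.ndim, flags.dtype)    # 1 bool
print(pixels.ndim, pixels.dtype)  # 3 uint8
```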

Because np.ndarray is generic, array shape and dtype characteristics can now be fully specified:

def process(
    x: np.ndarray[tuple[int], np.dtype[np.bool_]],
    y: np.ndarray[tuple[int, int, int], np.dtype[np.uint8]],
    ) -> np.ndarray[tuple[int], np.dtype[np.float64]]: ...

With such detail, recent versions of static analysis tools like mypy and pyright can find issues before code is even run. Further, run-time validators specialized for NumPy, like StaticFrame's sf.CallGuard, can re-use the same annotations for run-time validation.

Generic Types in Python

Generic built-in containers such as list and dict can be made concrete by specifying, for each interface, the contained types. A function can declare that it takes a list of str with list[str]; a dict of str to bool can be specified with dict[str, bool].
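A minimal sketch of such concrete generics follows. The function names count_long and enabled are invented for illustration, and the built-in subscript syntax assumes Python 3.9 or later:

```python
def count_long(words: list[str], min_len: int) -> int:
    """Count the strings in words that have at least min_len characters."""
    return sum(1 for word in words if len(word) >= min_len)


def enabled(flags: dict[str, bool]) -> list[str]:
    """Return the keys whose value is True, in insertion order."""
    return [name for name, on in flags.items() if on]


print(count_long(["a", "bcd", "efgh"], 3))       # 2
print(enabled({"cache": True, "debug": False}))  # ['cache']
```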

The Generic `np.ndarray`

An np.ndarray is an N-dimensional array of a single element type (or dtype). The np.ndarray generic takes two type parameters: the first defines the shape with a tuple, the second defines the element type with the generic np.dtype. While np.ndarray has taken two type parameters for some time, the definition of the first parameter, shape, was not fully specified until NumPy 2.1.
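The two type parameters can be inspected at run time with the standard typing module. The sketch below assumes a NumPy version in which np.ndarray and np.dtype are subscriptable at run time; the alias name Bool1D is invented for illustration:

```python
import typing

import numpy as np

# A concrete np.ndarray: the first parameter is shape, the second is dtype.
Bool1D = np.ndarray[tuple[int], np.dtype[np.bool_]]

shape_param, dtype_param = typing.get_args(Bool1D)

print(typing.get_origin(Bool1D) is np.ndarray)      # True
print(shape_param == tuple[int])                    # True
print(typing.get_args(dtype_param) == (np.bool_,))  # True
```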

The Shape Type Parameter

When creating an array with interfaces like np.empty or np.full, a shape argument is given as a tuple. The length of the tuple defines the array's dimensionality; the magnitude of each position defines the size of that dimension. Thus a shape (10,) is a 1D array of 10 elements; a shape (10, 100, 1000) is a 3D array of size 10 by 100 by 1000.
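The relationship between the shape tuple and the resulting array can be checked directly. In this short sketch the variable names are arbitrary:

```python
import numpy as np

vector = np.empty((10,))                            # 1D array of 10 elements
cube = np.full((10, 100, 1000), 0, dtype=np.uint8)  # 3D array, 10 by 100 by 1000

# The length of the shape tuple is the dimensionality.
print(vector.ndim, vector.shape)  # 1 (10,)
print(cube.ndim, cube.shape)      # 3 (10, 100, 1000)

# The product of the per-dimension sizes is the total element count.
print(cube.size)                  # 1000000
```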

When using a tuple to define shape in the np.ndarray generic, at present only the number of dimensions can generally be used for type checking. Thus, a tuple[int] can specify a 1D array; a tuple[int, int, int] can specify a 3D array; a tuple[int, ...], specifying a tuple of zero or more integers, denotes an N-dimensional array. It might be possible in the future to type-check an np.ndarray with specific magnitudes per dimension (using Literal), but this is not yet broadly supported.
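To make the link between the shape tuple and dimensionality concrete, the sketch below defines three aliases and a hypothetical helper, expected_ndim, that reads the number of dimensions implied by each. The helper and alias names are invented for illustration and are not part of NumPy:

```python
import typing

import numpy as np

Float1D = np.ndarray[tuple[int], np.dtype[np.float64]]
Float3D = np.ndarray[tuple[int, int, int], np.dtype[np.float64]]
FloatND = np.ndarray[tuple[int, ...], np.dtype[np.float64]]


def expected_ndim(alias) -> typing.Optional[int]:
    """Hypothetical helper: dimensionality implied by an alias, or None for any."""
    shape_param = typing.get_args(alias)[0]
    dims = typing.get_args(shape_param)
    if Ellipsis in dims:
        return None  # tuple[int, ...] places no constraint on dimensionality
    return len(dims)


print(expected_ndim(Float1D))  # 1
print(expected_ndim(Float3D))  # 3
print(expected_ndim(FloatND))  # None
```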

The `dtype` Type Parameter

The NumPy dtype object defines element types and, for some types, other characteristics such as size (for Unicode and string types) or unit (for np.datetime64 types). The dtype itself is generic, taking a NumPy "generic" type as a type parameter. The most narrow types specify specific element characteristics, for example np.uint8, np.float64, or np.bool_. Beyond these narrow types, NumPy provides more general types, such as np.integer, np.inexact, or np.number.
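The relationship between narrow and general types, and the extra dtype characteristics mentioned above, can be explored at run time with np.issubdtype and related attributes. A brief sketch, with arbitrary variable names:

```python
import numpy as np

# Narrow types are subtypes of the more general NumPy types.
uint8_is_integer = np.issubdtype(np.uint8, np.integer)
uint8_is_signed = np.issubdtype(np.uint8, np.signedinteger)
float64_is_inexact = np.issubdtype(np.float64, np.inexact)
bool_is_number = np.issubdtype(np.bool_, np.number)

print(uint8_is_integer)    # True
print(uint8_is_signed)     # False
print(float64_is_inexact)  # True
print(bool_is_number)      # False

# Some dtypes carry additional characteristics: size for Unicode, unit for dates.
unicode_size = np.dtype("U5").itemsize  # 5 characters at 4 bytes each
date_unit = np.datetime_data(np.dtype("datetime64[D]"))

print(unicode_size)  # 20
print(date_unit)     # ('D', 1)
```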

Making `np.ndarray` Concrete

The following examples illustrate concrete np.ndarray definitions:

A 1D array of Booleans:

np.ndarray[tuple[int], np.dtype[np.bool_]]

A 3D array of unsigned 8-bit integers:

np.ndarray[tuple[int, int, int], np.dtype[np.uint8]]

A two-dimensional (2D) array of Unicode strings:

np.ndarray[tuple[int, int], np.dtype[np.str_]]

A 1D array of any numeric type:

np.ndarray[tuple[int], np.dtype[np.number]]
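For reference, the sketch below builds one array that would satisfy each of the four annotations above and confirms the relevant run-time characteristics. The variable names are arbitrary:

```python
import numpy as np

bools_1d = np.array([True, False, True])         # 1D array of Booleans
uint8_3d = np.zeros((2, 3, 4), dtype=np.uint8)   # 3D array of unsigned 8-bit integers
strs_2d = np.array([["a", "bc"], ["def", "g"]])  # 2D array of Unicode strings
nums_1d = np.arange(5)                           # 1D array of a numeric type

# Dimensionality corresponds to the length of the shape tuple in the annotation.
print(bools_1d.ndim, uint8_3d.ndim, strs_2d.ndim, nums_1d.ndim)  # 1 3 2 1

# Element types correspond to the np.dtype type parameter.
print(bools_1d.dtype.type is np.bool_)          # True
print(uint8_3d.dtype.type is np.uint8)          # True
print(strs_2d.dtype.type is np.str_)            # True
print(np.issubdtype(nums_1d.dtype, np.number))  # True
```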

Static Type Checking with Mypy

Once the generic np.ndarray is made concrete, mypy or similar type checkers can, for some code paths, identify values that are incompatible with an interface.

For example, the function below requires a 1D array of signed integers. As shown below, unsigned integers, or dimensionalities other than one, fail mypy checks.

def process1(x: np.ndarray[tuple[int], np.dtype[np.signedinteger]]): ...

a1 = np.empty(100, dtype=np.int16)
process1(a1) # mypy passes

a2 = np.empty(100, dtype=np.uint8)
process1(a2) # mypy fails
# error: Argument 1 to "process1" has incompatible type
# "ndarray[tuple[int], dtype[unsignedinteger[_8Bit]]]";
# expected "ndarray[tuple[int], dtype[signedinteger[Any]]]"  [arg-type]

a3 = np.empty((100, 100, 100), dtype=np.int64)
process1(a3) # mypy fails
# error: Argument 1 to "process1" has incompatible type
# "ndarray[tuple[int, int, int], dtype[signedinteger[_64Bit]]]";
# expected "ndarray[tuple[int], dtype[signedinteger[Any]]]"

Runtime Validation with `sf.CallGuard`

Not all array operations can statically define the shape or dtype of a resulting array. For this reason, static analysis will not catch all mismatched interfaces. Rather than creating redundant validation code across many functions, type annotations can be re-used for run-time validation with tools specialized for NumPy types.

The StaticFrame CallGuard interface offers two decorators, check and warn, which raise exceptions or warnings, respectively, on validation errors. These decorators validate type annotations against the characteristics of run-time objects.
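To show what such redundant validation code tends to look like, here is a sketch of a hypothetical function, mean_1d_signed, that checks shape and dtype by hand. Every function with similar expectations would need to repeat these checks:

```python
import numpy as np


def mean_1d_signed(x: np.ndarray) -> float:
    """Hypothetical function with hand-written shape and dtype checks."""
    if x.ndim != 1:
        raise TypeError(f"expected a 1D array, got {x.ndim}D")
    if not np.issubdtype(x.dtype, np.signedinteger):
        raise TypeError(f"expected signed integers, got {x.dtype}")
    return float(x.mean())


print(mean_1d_signed(np.array([1, 2, 3], dtype=np.int16)))  # 2.0

try:
    mean_1d_signed(np.zeros((2, 2), dtype=np.int16))
except TypeError as error:
    message = str(error)
print(message)  # expected a 1D array, got 2D
```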

For instance, by adding sf.CallGuard.check to the function below, the arrays fail validation with expressive CallGuard exceptions:

import static_frame as sf

@sf.CallGuard.check
def process2(x: np.ndarray[tuple[int], np.dtype[np.signedinteger]]): ...

b1 = np.empty(100, dtype=np.uint8)
process2(b1)
# static_frame.core.type_clinic.ClinicError:
# In args of (x: ndarray[tuple[int], dtype[signedinteger]]) -> Any
# └── In arg x
#     └── ndarray[tuple[int], dtype[signedinteger]]
#         └── dtype[signedinteger]
#             └── Expected signedinteger, provided uint8 invalid

b2 = np.empty((10, 100), dtype=np.int8)
process2(b2)
# static_frame.core.type_clinic.ClinicError:
# In args of (x: ndarray[tuple[int], dtype[signedinteger]]) -> Any
# └── In arg x
#     └── ndarray[tuple[int], dtype[signedinteger]]
#         └── tuple[int]
#             └── Expected tuple length of 1, provided tuple length of 2
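To illustrate the general idea of re-using annotations at run time, the following is a simplified sketch built only on the standard library and NumPy. It is not how sf.CallGuard is implemented: the decorator check_arrays and the function process3 are hypothetical, only dimensionality and dtype are checked, and arguments annotated as arrays are assumed to actually be arrays.

```python
import functools
import inspect
import typing

import numpy as np


def check_arrays(func):
    """Hypothetical decorator: check np.ndarray arguments against annotations."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if typing.get_origin(hint) is not np.ndarray:
                continue  # only fully specified np.ndarray hints are checked
            shape_param, dtype_param = typing.get_args(hint)
            dims = typing.get_args(shape_param)
            # tuple[int, ...] accepts any dimensionality, so skip the check.
            if Ellipsis not in dims and value.ndim != len(dims):
                raise TypeError(f"{name}: expected {len(dims)}D, got {value.ndim}D")
            (scalar_type,) = typing.get_args(dtype_param)
            if not np.issubdtype(value.dtype, scalar_type):
                raise TypeError(
                    f"{name}: expected {scalar_type.__name__}, got {value.dtype}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_arrays
def process3(x: np.ndarray[tuple[int], np.dtype[np.signedinteger]]) -> int:
    return int(x.sum())


print(process3(np.arange(4, dtype=np.int16)))  # 6

# Collect the messages raised for a wrong dtype and a wrong dimensionality.
failures = []
for bad in (np.zeros(4, dtype=np.uint8), np.zeros((2, 2), dtype=np.int8)):
    try:
        process3(bad)
    except TypeError as error:
        failures.append(str(error))
print(failures)
# ['x: expected signedinteger, got uint8', 'x: expected 1D, got 2D']
```

A dedicated validator reports far richer diagnostics than this sketch, as the ClinicError output above shows, but the principle is the same: the annotation is the single source of truth for both static and run-time checks.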

Conclusion

More can be done to improve NumPy typing. For example, the np.object_ type could be made generic such that Python types contained in an object array could be defined. A 1D object array of pairs of integers could then be annotated as:

np.ndarray[tuple[int], np.dtype[np.object_[tuple[int, int]]]]

Further, units of np.datetime64 cannot yet be statically specified. For example, date units could be distinguished from nanosecond units with annotations like np.dtype[np.datetime64[Literal['D']]] or np.dtype[np.datetime64[Literal['ns']]].

Even with limitations, fully specified NumPy type annotations catch errors and improve code quality. As shown, static analysis can identify mismatched shape or dtype, and validation with sf.CallGuard can provide strong run-time guarantees.
