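We should move from plain NumPy and pandas to Dask. Dask splits arrays and DataFrames into chunks and processes them in parallel, so we can work on datasets that are too large or too slow to handle as a single in-memory NumPy array or pandas DataFrame.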
numpy
pandas
dask
https://dask.org/