You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "raulcd (via GitHub)" <gi...@apache.org> on 2023/04/25 08:37:30 UTC
[GitHub] [arrow] raulcd opened a new issue, #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension>`
raulcd opened a new issue, #35321:
URL: https://github.com/apache/arrow/issues/35321
### Describe the bug, including details regarding any error messages, version, and platform.
The pandas nightly tests and release verification jobs are failing:
- [test-conda-python-3.8-pandas-latest](https://github.com/ursacomputing/crossbow/actions/runs/4792360361/jobs/8523791672)
- [test-conda-python-3.8-pandas-nightly](https://github.com/ursacomputing/crossbow/actions/runs/4792357427/jobs/8523785745)
- [test-conda-python-3.9-pandas-upstream_devel](https://github.com/ursacomputing/crossbow/actions/runs/4792356978/jobs/8523784707)
- [verify-rc-source-python-linux-almalinux-8-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786502455/jobs/8510514881)
- [verify-rc-source-python-linux-conda-latest-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786507381/jobs/8510525045)
- [verify-rc-source-python-linux-ubuntu-20.04-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786509036/jobs/8510528619)
- [verify-rc-source-python-linux-ubuntu-22.04-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786508280/jobs/8510527050)
- [verify-rc-source-python-macos-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786510209/jobs/8510531034)
- [verify-rc-source-python-macos-arm64](https://github.com/ursacomputing/crossbow/actions/runs/4786502769/jobs/8510515284)
- [verify-rc-source-python-macos-conda-amd64](https://github.com/ursacomputing/crossbow/actions/runs/4786513371/jobs/8510537552)
Due to the following test failing:
```
test_extension_to_pandas_storage_type[registered_period_type0]
test_extension_to_pandas_storage_type[registered_period_type1]
test_extension_to_pandas_storage_type[registered_period_type2]
```
This started happening since the new pandas 2.0.1 was released: https://pypi.org/project/pandas/#history
The full error:
```
=================================== FAILURES ===================================
________ test_extension_to_pandas_storage_type[registered_period_type0] ________
registered_period_type = (PeriodType(DataType(int64)), <class 'pyarrow.lib.ExtensionArray'>)
@pytest.mark.pandas
def test_extension_to_pandas_storage_type(registered_period_type):
period_type, _ = registered_period_type
np_arr = np.array([1, 2, 3, 4], dtype='i8')
storage = pa.array([1, 2, 3, 4], pa.int64())
arr = pa.ExtensionArray.from_storage(period_type, storage)
if isinstance(period_type, PeriodTypeWithToPandasDtype):
pandas_dtype = period_type.to_pandas_dtype()
else:
pandas_dtype = np_arr.dtype
# Test arrays
result = arr.to_pandas()
assert result.dtype == pandas_dtype
# Test chunked arrays
chunked_arr = pa.chunked_array([arr])
result = chunked_arr.to_numpy()
assert result.dtype == np_arr.dtype
result = chunked_arr.to_pandas()
assert result.dtype == pandas_dtype
# Test Table.to_pandas
data = [
pa.array([1, 2, 3, 4]),
pa.array(['foo', 'bar', None, None]),
pa.array([True, None, True, False]),
arr
]
my_schema = pa.schema([('f0', pa.int8()),
('f1', pa.string()),
('f2', pa.bool_()),
('ext', period_type)])
table = pa.Table.from_arrays(data, schema=my_schema)
result = table.to_pandas()
assert result["ext"].dtype == pandas_dtype
import pandas as pd
if Version(pd.__version__) > Version("2.0.0"):
# Check the usage of types_mapper
> result = table.to_pandas(types_mapper=pd.ArrowDtype)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_extension_type.py:1302:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/array.pxi:852: in pyarrow.lib._PandasConvertible.to_pandas
???
pyarrow/table.pxi:4114: in pyarrow.lib.Table._to_pandas
???
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:820: in table_to_blockmanager
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in _table_to_blocks
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in <listcomp>
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:781: in _reconstruct_block
block = _int.make_block(pd_ext_arr, placement=placement)
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/internals/api.py:73: in make_block
if is_datetime64tz_dtype(values.dtype) or is_period_dtype(values.dtype):
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/dtypes/common.py:415: in is_period_dtype
return arr_or_dtype.type is Period
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = extension<test.period<PeriodType>>[pyarrow]
@property
def type(self):
"""
Returns associated scalar type.
"""
pa_type = self.pyarrow_dtype
if pa.types.is_integer(pa_type):
return int
elif pa.types.is_floating(pa_type):
return float
elif pa.types.is_string(pa_type) or pa.types.is_large_string(pa_type):
return str
elif (
pa.types.is_binary(pa_type)
or pa.types.is_fixed_size_binary(pa_type)
or pa.types.is_large_binary(pa_type)
):
return bytes
elif pa.types.is_boolean(pa_type):
return bool
elif pa.types.is_duration(pa_type):
if pa_type.unit == "ns":
return Timedelta
else:
return timedelta
elif pa.types.is_timestamp(pa_type):
if pa_type.unit == "ns":
return Timestamp
else:
return datetime
elif pa.types.is_date(pa_type):
return date
elif pa.types.is_time(pa_type):
return time
elif pa.types.is_decimal(pa_type):
return Decimal
elif pa.types.is_dictionary(pa_type):
# TODO: Potentially change this & CategoricalDtype.type to
# something more representative of the scalar
return CategoricalDtypeType
elif pa.types.is_list(pa_type) or pa.types.is_large_list(pa_type):
return list
elif pa.types.is_map(pa_type):
return dict
elif pa.types.is_null(pa_type):
# TODO: None? pd.NA? pa.null?
return type(pa_type)
else:
> raise NotImplementedError(pa_type)
E NotImplementedError: extension<test.period<PeriodType>>
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/arrays/arrow/dtype.py:148: NotImplementedError
________ test_extension_to_pandas_storage_type[registered_period_type1] ________
registered_period_type = (PeriodTypeWithClass(DataType(int64)), <class 'pyarrow.tests.test_extension_type.PeriodArray'>)
@pytest.mark.pandas
def test_extension_to_pandas_storage_type(registered_period_type):
period_type, _ = registered_period_type
np_arr = np.array([1, 2, 3, 4], dtype='i8')
storage = pa.array([1, 2, 3, 4], pa.int64())
arr = pa.ExtensionArray.from_storage(period_type, storage)
if isinstance(period_type, PeriodTypeWithToPandasDtype):
pandas_dtype = period_type.to_pandas_dtype()
else:
pandas_dtype = np_arr.dtype
# Test arrays
result = arr.to_pandas()
assert result.dtype == pandas_dtype
# Test chunked arrays
chunked_arr = pa.chunked_array([arr])
result = chunked_arr.to_numpy()
assert result.dtype == np_arr.dtype
result = chunked_arr.to_pandas()
assert result.dtype == pandas_dtype
# Test Table.to_pandas
data = [
pa.array([1, 2, 3, 4]),
pa.array(['foo', 'bar', None, None]),
pa.array([True, None, True, False]),
arr
]
my_schema = pa.schema([('f0', pa.int8()),
('f1', pa.string()),
('f2', pa.bool_()),
('ext', period_type)])
table = pa.Table.from_arrays(data, schema=my_schema)
result = table.to_pandas()
assert result["ext"].dtype == pandas_dtype
import pandas as pd
if Version(pd.__version__) > Version("2.0.0"):
# Check the usage of types_mapper
> result = table.to_pandas(types_mapper=pd.ArrowDtype)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_extension_type.py:1302:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/array.pxi:852: in pyarrow.lib._PandasConvertible.to_pandas
???
pyarrow/table.pxi:4114: in pyarrow.lib.Table._to_pandas
???
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:820: in table_to_blockmanager
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in _table_to_blocks
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in <listcomp>
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:781: in _reconstruct_block
block = _int.make_block(pd_ext_arr, placement=placement)
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/internals/api.py:73: in make_block
if is_datetime64tz_dtype(values.dtype) or is_period_dtype(values.dtype):
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/dtypes/common.py:415: in is_period_dtype
return arr_or_dtype.type is Period
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = extension<test.period<PeriodTypeWithClass>>[pyarrow]
@property
def type(self):
"""
Returns associated scalar type.
"""
pa_type = self.pyarrow_dtype
if pa.types.is_integer(pa_type):
return int
elif pa.types.is_floating(pa_type):
return float
elif pa.types.is_string(pa_type) or pa.types.is_large_string(pa_type):
return str
elif (
pa.types.is_binary(pa_type)
or pa.types.is_fixed_size_binary(pa_type)
or pa.types.is_large_binary(pa_type)
):
return bytes
elif pa.types.is_boolean(pa_type):
return bool
elif pa.types.is_duration(pa_type):
if pa_type.unit == "ns":
return Timedelta
else:
return timedelta
elif pa.types.is_timestamp(pa_type):
if pa_type.unit == "ns":
return Timestamp
else:
return datetime
elif pa.types.is_date(pa_type):
return date
elif pa.types.is_time(pa_type):
return time
elif pa.types.is_decimal(pa_type):
return Decimal
elif pa.types.is_dictionary(pa_type):
# TODO: Potentially change this & CategoricalDtype.type to
# something more representative of the scalar
return CategoricalDtypeType
elif pa.types.is_list(pa_type) or pa.types.is_large_list(pa_type):
return list
elif pa.types.is_map(pa_type):
return dict
elif pa.types.is_null(pa_type):
# TODO: None? pd.NA? pa.null?
return type(pa_type)
else:
> raise NotImplementedError(pa_type)
E NotImplementedError: extension<test.period<PeriodTypeWithClass>>
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/arrays/arrow/dtype.py:148: NotImplementedError
________ test_extension_to_pandas_storage_type[registered_period_type2] ________
registered_period_type = (PeriodTypeWithToPandasDtype(DataType(int64)), <class 'pyarrow.lib.ExtensionArray'>)
@pytest.mark.pandas
def test_extension_to_pandas_storage_type(registered_period_type):
period_type, _ = registered_period_type
np_arr = np.array([1, 2, 3, 4], dtype='i8')
storage = pa.array([1, 2, 3, 4], pa.int64())
arr = pa.ExtensionArray.from_storage(period_type, storage)
if isinstance(period_type, PeriodTypeWithToPandasDtype):
pandas_dtype = period_type.to_pandas_dtype()
else:
pandas_dtype = np_arr.dtype
# Test arrays
result = arr.to_pandas()
assert result.dtype == pandas_dtype
# Test chunked arrays
chunked_arr = pa.chunked_array([arr])
result = chunked_arr.to_numpy()
assert result.dtype == np_arr.dtype
result = chunked_arr.to_pandas()
assert result.dtype == pandas_dtype
# Test Table.to_pandas
data = [
pa.array([1, 2, 3, 4]),
pa.array(['foo', 'bar', None, None]),
pa.array([True, None, True, False]),
arr
]
my_schema = pa.schema([('f0', pa.int8()),
('f1', pa.string()),
('f2', pa.bool_()),
('ext', period_type)])
table = pa.Table.from_arrays(data, schema=my_schema)
result = table.to_pandas()
assert result["ext"].dtype == pandas_dtype
import pandas as pd
if Version(pd.__version__) > Version("2.0.0"):
# Check the usage of types_mapper
> result = table.to_pandas(types_mapper=pd.ArrowDtype)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_extension_type.py:1302:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/array.pxi:852: in pyarrow.lib._PandasConvertible.to_pandas
???
pyarrow/table.pxi:4114: in pyarrow.lib.Table._to_pandas
???
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:820: in table_to_blockmanager
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in _table_to_blocks
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1170: in <listcomp>
return [_reconstruct_block(item, columns, extension_columns)
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:781: in _reconstruct_block
block = _int.make_block(pd_ext_arr, placement=placement)
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/internals/api.py:73: in make_block
if is_datetime64tz_dtype(values.dtype) or is_period_dtype(values.dtype):
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/dtypes/common.py:415: in is_period_dtype
return arr_or_dtype.type is Period
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = extension<test.period<PeriodTypeWithToPandasDtype>>[pyarrow]
@property
def type(self):
"""
Returns associated scalar type.
"""
pa_type = self.pyarrow_dtype
if pa.types.is_integer(pa_type):
return int
elif pa.types.is_floating(pa_type):
return float
elif pa.types.is_string(pa_type) or pa.types.is_large_string(pa_type):
return str
elif (
pa.types.is_binary(pa_type)
or pa.types.is_fixed_size_binary(pa_type)
or pa.types.is_large_binary(pa_type)
):
return bytes
elif pa.types.is_boolean(pa_type):
return bool
elif pa.types.is_duration(pa_type):
if pa_type.unit == "ns":
return Timedelta
else:
return timedelta
elif pa.types.is_timestamp(pa_type):
if pa_type.unit == "ns":
return Timestamp
else:
return datetime
elif pa.types.is_date(pa_type):
return date
elif pa.types.is_time(pa_type):
return time
elif pa.types.is_decimal(pa_type):
return Decimal
elif pa.types.is_dictionary(pa_type):
# TODO: Potentially change this & CategoricalDtype.type to
# something more representative of the scalar
return CategoricalDtypeType
elif pa.types.is_list(pa_type) or pa.types.is_large_list(pa_type):
return list
elif pa.types.is_map(pa_type):
return dict
elif pa.types.is_null(pa_type):
# TODO: None? pd.NA? pa.null?
return type(pa_type)
else:
> raise NotImplementedError(pa_type)
E NotImplementedError: extension<test.period<PeriodTypeWithToPandasDtype>>
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/arrays/arrow/dtype.py:148: NotImplementedError
```
### Component(s)
Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] raulcd commented on issue #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension>`
Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on issue #35321:
URL: https://github.com/apache/arrow/issues/35321#issuecomment-1521399192
@jorisvandenbossche @AlenkaF This is also happening for our 12.0.0 - RC 0 and is failing on verification at the moment. I am unsure whether we require a new RC with a fix for this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on issue #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension>`
Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35321:
URL: https://github.com/apache/arrow/issues/35321#issuecomment-1521469578
Actually, suppose this is a regression in pandas, so I am going to skip the test on pandas 2.0.1. The question about impact on release process stays the same though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on issue #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension>`
Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35321:
URL: https://github.com/apache/arrow/issues/35321#issuecomment-1521413500
I think this just needs a small correction in our test (how we check to skip, similarly as my comment at https://github.com/apache/arrow/pull/35248#discussion_r1172414341), and so this doesn't actually impact any runtime behaviour. If we are OK with the fact that those tests fail when using pandas 2.0.1, we might not need a new RC.
Doing a PR to fix it right now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche closed issue #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension>`
Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche closed issue #35321: [Python] test_extension_to_pandas_storage_type fails with `NotImplementedError: extension<test.period<PeriodTypeWithToPandasDtype>>`
URL: https://github.com/apache/arrow/issues/35321
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org