
Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/04 12:29:45 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #34880: [Python][CI] Windows tests are failing with latest pandas 2.0

jorisvandenbossche opened a new issue, #34880:
URL: https://github.com/apache/arrow/issues/34880

   See e.g. https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/46696219. The failures seem to be related to int32 vs. int64 dtypes being created.
   
   <details>
   
   ```
   ================================== FAILURES ===================================
   _____________ TestZeroCopyConversion.test_zero_copy_dictionaries ______________
   self = <pyarrow.tests.test_pandas.TestZeroCopyConversion object at 0x000001947D9150E0>
       def test_zero_copy_dictionaries(self):
           arr = pa.DictionaryArray.from_arrays(
               np.array([0, 0]),
               np.array([5]))
       
           result = arr.to_pandas(zero_copy_only=True)
           values = pd.Categorical([5, 5])
       
   >       tm.assert_series_equal(pd.Series(result), pd.Series(values),
                                  check_names=False)
   E       AssertionError: Attributes of Series are different
   E       
   E       Attribute "dtype" are different
   E       [left]:  CategoricalDtype(categories=[5], ordered=False)
   E       [right]: CategoricalDtype(categories=[5], ordered=False)
   pyarrow\tests\test_pandas.py:2578: AssertionError
   _______ test_dataset_read_pandas_common_metadata[_metadata-False-True] ________
   tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_dataset_read_pandas_commo2')
   use_legacy_dataset = True, preserve_index = False, metadata_fname = '_metadata'
       @pytest.mark.pandas
       @parametrize_legacy_dataset
       @pytest.mark.parametrize('preserve_index', [True, False, None])
       @pytest.mark.parametrize('metadata_fname', ["_metadata", "_common_metadata"])
       def test_dataset_read_pandas_common_metadata(
           tempdir, use_legacy_dataset, preserve_index, metadata_fname
       ):
           # ARROW-1103
           nfiles = 5
           size = 5
       
           dirpath = tempdir / guid()
           dirpath.mkdir()
       
           test_data = []
           frames = []
           paths = []
           for i in range(nfiles):
               df = _test_dataframe(size, seed=i)
               df.index = pd.Index(np.arange(i * size, (i + 1) * size), name='index')
       
               path = dirpath / '{}.parquet'.format(i)
       
               table = pa.Table.from_pandas(df, preserve_index=preserve_index)
       
               # Obliterate metadata
               table = table.replace_schema_metadata(None)
               assert table.schema.metadata is None
       
               _write_table(table, path)
               test_data.append(table)
               frames.append(df)
               paths.append(path)
       
           # Write _metadata common file
           table_for_metadata = pa.Table.from_pandas(
               df, preserve_index=preserve_index
           )
           pq.write_metadata(table_for_metadata.schema, dirpath / metadata_fname)
       
           dataset = pq.ParquetDataset(dirpath, use_legacy_dataset=use_legacy_dataset)
           columns = ['uint8', 'strings']
           result = dataset.read_pandas(columns=columns).to_pandas()
           expected = pd.concat([x[columns] for x in frames])
           expected.index.name = (
               df.index.name if preserve_index is not False else None)
   >       tm.assert_frame_equal(result, expected)
   pyarrow\tests\parquet\test_pandas.py:698: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   left = RangeIndex(start=0, stop=25, step=1)
   right = Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
          18, 19, 20, 21, 22, 23, 24],
         dtype='int32')
   obj = 'DataFrame.index'
       def _check_types(left, right, obj: str = "Index") -> None:
           if not exact:
               return
       
           assert_class_equal(left, right, exact=exact, obj=obj)
           assert_attr_equal("inferred_type", left, right, obj=obj)
       
           # Skip exact dtype checking when `check_categorical` is False
           if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
               if check_categorical:
                   assert_attr_equal("dtype", left, right, obj=obj)
                   assert_index_equal(left.categories, right.categories, exact=exact)
               return
       
   >       assert_attr_equal("dtype", left, right, obj=obj)
   E       AssertionError: DataFrame.index are different
   E       
   E       Attribute "dtype" are different
   E       [left]:  int64
   E       [right]: int32
   C:\Miniconda38-x64\envs\arrow\lib\site-packages\pandas\_testing\asserters.py:247: AssertionError
   _______ test_dataset_read_pandas_common_metadata[_metadata-False-False] _______
   tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_dataset_read_pandas_commo3')
   use_legacy_dataset = False, preserve_index = False, metadata_fname = '_metadata'
       @pytest.mark.pandas
       @parametrize_legacy_dataset
       @pytest.mark.parametrize('preserve_index', [True, False, None])
       @pytest.mark.parametrize('metadata_fname', ["_metadata", "_common_metadata"])
       def test_dataset_read_pandas_common_metadata(
           tempdir, use_legacy_dataset, preserve_index, metadata_fname
       ):
           # ARROW-1103
           nfiles = 5
           size = 5
       
           dirpath = tempdir / guid()
           dirpath.mkdir()
       
           test_data = []
           frames = []
           paths = []
           for i in range(nfiles):
               df = _test_dataframe(size, seed=i)
               df.index = pd.Index(np.arange(i * size, (i + 1) * size), name='index')
       
               path = dirpath / '{}.parquet'.format(i)
       
               table = pa.Table.from_pandas(df, preserve_index=preserve_index)
       
               # Obliterate metadata
               table = table.replace_schema_metadata(None)
               assert table.schema.metadata is None
       
               _write_table(table, path)
               test_data.append(table)
               frames.append(df)
               paths.append(path)
       
           # Write _metadata common file
           table_for_metadata = pa.Table.from_pandas(
               df, preserve_index=preserve_index
           )
           pq.write_metadata(table_for_metadata.schema, dirpath / metadata_fname)
       
           dataset = pq.ParquetDataset(dirpath, use_legacy_dataset=use_legacy_dataset)
           columns = ['uint8', 'strings']
           result = dataset.read_pandas(columns=columns).to_pandas()
           expected = pd.concat([x[columns] for x in frames])
           expected.index.name = (
               df.index.name if preserve_index is not False else None)
   >       tm.assert_frame_equal(result, expected)
   pyarrow\tests\parquet\test_pandas.py:698: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   left = RangeIndex(start=0, stop=25, step=1)
   right = Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
          18, 19, 20, 21, 22, 23, 24],
         dtype='int32')
   obj = 'DataFrame.index'
       def _check_types(left, right, obj: str = "Index") -> None:
           if not exact:
               return
       
           assert_class_equal(left, right, exact=exact, obj=obj)
           assert_attr_equal("inferred_type", left, right, obj=obj)
       
           # Skip exact dtype checking when `check_categorical` is False
           if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
               if check_categorical:
                   assert_attr_equal("dtype", left, right, obj=obj)
                   assert_index_equal(left.categories, right.categories, exact=exact)
               return
       
   >       assert_attr_equal("dtype", left, right, obj=obj)
   E       AssertionError: DataFrame.index are different
   E       
   E       Attribute "dtype" are different
   E       [left]:  int64
   E       [right]: int32
   C:\Miniconda38-x64\envs\arrow\lib\site-packages\pandas\_testing\asserters.py:247: AssertionError
   ____ test_dataset_read_pandas_common_metadata[_common_metadata-False-True] ____
   tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_dataset_read_pandas_commo8')
   use_legacy_dataset = True, preserve_index = False
   metadata_fname = '_common_metadata'
       @pytest.mark.pandas
       @parametrize_legacy_dataset
       @pytest.mark.parametrize('preserve_index', [True, False, None])
       @pytest.mark.parametrize('metadata_fname', ["_metadata", "_common_metadata"])
       def test_dataset_read_pandas_common_metadata(
           tempdir, use_legacy_dataset, preserve_index, metadata_fname
       ):
           # ARROW-1103
           nfiles = 5
           size = 5
       
           dirpath = tempdir / guid()
           dirpath.mkdir()
       
           test_data = []
           frames = []
           paths = []
           for i in range(nfiles):
               df = _test_dataframe(size, seed=i)
               df.index = pd.Index(np.arange(i * size, (i + 1) * size), name='index')
       
               path = dirpath / '{}.parquet'.format(i)
       
               table = pa.Table.from_pandas(df, preserve_index=preserve_index)
       
               # Obliterate metadata
               table = table.replace_schema_metadata(None)
               assert table.schema.metadata is None
       
               _write_table(table, path)
               test_data.append(table)
               frames.append(df)
               paths.append(path)
       
           # Write _metadata common file
           table_for_metadata = pa.Table.from_pandas(
               df, preserve_index=preserve_index
           )
           pq.write_metadata(table_for_metadata.schema, dirpath / metadata_fname)
       
           dataset = pq.ParquetDataset(dirpath, use_legacy_dataset=use_legacy_dataset)
           columns = ['uint8', 'strings']
           result = dataset.read_pandas(columns=columns).to_pandas()
           expected = pd.concat([x[columns] for x in frames])
           expected.index.name = (
               df.index.name if preserve_index is not False else None)
   >       tm.assert_frame_equal(result, expected)
   pyarrow\tests\parquet\test_pandas.py:698: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   left = RangeIndex(start=0, stop=25, step=1)
   right = Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
          18, 19, 20, 21, 22, 23, 24],
         dtype='int32')
   obj = 'DataFrame.index'
       def _check_types(left, right, obj: str = "Index") -> None:
           if not exact:
               return
       
           assert_class_equal(left, right, exact=exact, obj=obj)
           assert_attr_equal("inferred_type", left, right, obj=obj)
       
           # Skip exact dtype checking when `check_categorical` is False
           if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
               if check_categorical:
                   assert_attr_equal("dtype", left, right, obj=obj)
                   assert_index_equal(left.categories, right.categories, exact=exact)
               return
       
   >       assert_attr_equal("dtype", left, right, obj=obj)
   E       AssertionError: DataFrame.index are different
   E       
   E       Attribute "dtype" are different
   E       [left]:  int64
   E       [right]: int32
   C:\Miniconda38-x64\envs\arrow\lib\site-packages\pandas\_testing\asserters.py:247: AssertionError
   ___ test_dataset_read_pandas_common_metadata[_common_metadata-False-False] ____
   tempdir = WindowsPath('C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_dataset_read_pandas_commo9')
   use_legacy_dataset = False, preserve_index = False
   metadata_fname = '_common_metadata'
       @pytest.mark.pandas
       @parametrize_legacy_dataset
       @pytest.mark.parametrize('preserve_index', [True, False, None])
       @pytest.mark.parametrize('metadata_fname', ["_metadata", "_common_metadata"])
       def test_dataset_read_pandas_common_metadata(
           tempdir, use_legacy_dataset, preserve_index, metadata_fname
       ):
           # ARROW-1103
           nfiles = 5
           size = 5
       
           dirpath = tempdir / guid()
           dirpath.mkdir()
       
           test_data = []
           frames = []
           paths = []
           for i in range(nfiles):
               df = _test_dataframe(size, seed=i)
               df.index = pd.Index(np.arange(i * size, (i + 1) * size), name='index')
       
               path = dirpath / '{}.parquet'.format(i)
       
               table = pa.Table.from_pandas(df, preserve_index=preserve_index)
       
               # Obliterate metadata
               table = table.replace_schema_metadata(None)
               assert table.schema.metadata is None
       
               _write_table(table, path)
               test_data.append(table)
               frames.append(df)
               paths.append(path)
       
           # Write _metadata common file
           table_for_metadata = pa.Table.from_pandas(
               df, preserve_index=preserve_index
           )
           pq.write_metadata(table_for_metadata.schema, dirpath / metadata_fname)
       
           dataset = pq.ParquetDataset(dirpath, use_legacy_dataset=use_legacy_dataset)
           columns = ['uint8', 'strings']
           result = dataset.read_pandas(columns=columns).to_pandas()
           expected = pd.concat([x[columns] for x in frames])
           expected.index.name = (
               df.index.name if preserve_index is not False else None)
   >       tm.assert_frame_equal(result, expected)
   pyarrow\tests\parquet\test_pandas.py:698: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   left = RangeIndex(start=0, stop=25, step=1)
   right = Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
          18, 19, 20, 21, 22, 23, 24],
         dtype='int32')
   obj = 'DataFrame.index'
       def _check_types(left, right, obj: str = "Index") -> None:
           if not exact:
               return
       
           assert_class_equal(left, right, exact=exact, obj=obj)
           assert_attr_equal("inferred_type", left, right, obj=obj)
       
           # Skip exact dtype checking when `check_categorical` is False
           if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
               if check_categorical:
                   assert_attr_equal("dtype", left, right, obj=obj)
                   assert_index_equal(left.categories, right.categories, exact=exact)
               return
       
   >       assert_attr_equal("dtype", left, right, obj=obj)
   E       AssertionError: DataFrame.index are different
   E       
   E       Attribute "dtype" are different
   E       [left]:  int64
   E       [right]: int32
   C:\Miniconda38-x64\envs\arrow\lib\site-packages\pandas\_testing\asserters.py:247: AssertionError
   ```
   
   </details>
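
The dtype mismatch in the last four tracebacks can be illustrated with a small sketch that runs on any platform. It rests on two assumptions that are not stated in the report itself: that `np.arange` defaults to a 32-bit integer on Windows with NumPy 1.x (made explicit here with `dtype=np.int32`), and that pandas 2.0 keeps that dtype in `pd.Index` instead of upcasting to int64. The helper `indexes_match` and the int64 variant are hypothetical illustrations, not necessarily the change applied upstream.

```python
import numpy as np
import pandas as pd


def indexes_match(left, right):
    """Return True if pandas treats the two indexes as equal (default exact='equiv')."""
    try:
        pd.testing.assert_index_equal(left, right)
        return True
    except AssertionError:
        return False


# The index the failing tests get back when no index was stored (preserve_index=False).
result_index = pd.RangeIndex(start=0, stop=25, step=1)

# Assumption: explicit int32 stands in for np.arange's default on Windows with NumPy 1.x.
expected_int32 = pd.Index(np.arange(25, dtype=np.int32))

# Hypothetical test-side adjustment: build the expected index as int64 explicitly.
expected_int64 = pd.Index(np.arange(25, dtype=np.int64))

# With pandas >= 2.0 the int32 dtype is preserved, so the first comparison fails.
print(expected_int32.dtype, indexes_match(result_index, expected_int32))
print(expected_int64.dtype, indexes_match(result_index, expected_int64))
```

With older pandas versions the int32 input would be upcast to int64 when the index is built, which would explain why the tests only started failing with pandas 2.0.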


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] assignUser closed issue #34880: [Python][CI] Windows tests are failing with latest pandas 2.0

Posted by "assignUser (via GitHub)" <gi...@apache.org>.

assignUser closed issue #34880: [Python][CI] Windows tests are failing with latest pandas 2.0
URL: https://github.com/apache/arrow/issues/34880



[GitHub] [arrow] jorisvandenbossche commented on issue #34880: [Python][CI] Windows tests are failing with latest pandas 2.0

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.

jorisvandenbossche commented on issue #34880:
URL: https://github.com/apache/arrow/issues/34880#issuecomment-1495906767

   I suppose this might need some additional fixes along the same lines as https://github.com/apache/arrow/pull/34498 (there we only fixed the failures that appeared on non-Windows builds).

