Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/11 17:56:39 UTC

[GitHub] [arrow-rs] ghuls opened a new issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

ghuls opened a new issue #286:
URL: https://github.com/apache/arrow-rs/issues/286


   **Describe the bug**
   
   Original bug report is here (against polars, which was using arrow-rs to parse Feather v2 (IPC) files):
   https://github.com/pola-rs/polars/issues/623
   
   Unable to load Feather v2 files created by pyarrow and pandas.
   
   Those files can be loaded fine by pyarrow and pandas itself.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   Try to load the attached Feather files:
   [test_feather_file.zip](https://github.com/apache/arrow-rs/files/6461057/test_feather_file.zip)
   
   ```
   test_pandas.feather: Original Feather file
   test_arrow.feather: loading test_pandas.feather with pyarrow and saving with pyarrow: df_pa = pa.feather.read_feather('test_pandas.feather')
   test_polars.feather:  Loading test_pandas.feather with pyarrow and saving with polars (this one can be read by arrow-rs)
   test_pandas_from_polars.feather: Loading test_polars.feather with polars and using the to_pandas option.
   ```
   
   **Expected behavior**
   
   Feather v2 files can be opened by arrow-rs.
   
   **Additional context**
   
   ```python
   import polars as pl
   import pyarrow as pa
   import pyarrow.feather  # make sure the pa.feather submodule is loaded
   import pandas as pd
   
   # Reading the Feather file created with pandas works fine with pyarrow
   # (read_feather returns a pandas DataFrame).
   df_pa = pa.feather.read_feather('test_pandas.feather')
   
   # Write that DataFrame back to a Feather file.
   df_pa.to_feather('test_arrow.feather')
   
   # Convert it to a polars DataFrame.
   df_pl = pl.DataFrame(df_pa)
   
   # Convert the polars DataFrame to a pandas DataFrame.
   df_pd = df_pl.to_pandas()
   
   # Write the pandas DataFrame to a Feather file.
   df_pd.to_feather('test_pandas_from_polars.feather')
   
   
   In [88]: df_pa
   Out[88]: 
      motif1  motif2  motif3  motif4 regions
   0     1.2     3.0     0.3     5.6    reg1
   1     6.7     3.0     4.3     5.6    reg2
   2     3.5     3.0     0.0     0.0    reg3
   3     0.0     3.0     0.0     5.6    reg4
   4     2.4     3.0     7.8     1.2    reg5
   5     2.4     3.0     0.6     0.0    reg6
   6     2.4     3.0     7.7     0.0    reg7
   
   In [89]: df_pl
   Out[89]: 
   shape: (7, 5)
   ╭────────┬────────┬────────┬────────┬─────────╮
   │ motif1 ┆ motif2 ┆ motif3 ┆ motif4 ┆ regions │
   │ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---     │
   │ f64    ┆ f64    ┆ f64    ┆ f64    ┆ str     │
   ╞════════╪════════╪════════╪════════╪═════════╡
   │ 1.2    ┆ 3      ┆ 0.3    ┆ 5.6    ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.7    ┆ 3      ┆ 4.3    ┆ 5.6    ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5    ┆ 3      ┆ 0.0    ┆ 0.0    ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0    ┆ 3      ┆ 0.0    ┆ 5.6    ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 7.8    ┆ 1.2    ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 0.6    ┆ 0.0    ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 7.7    ┆ 0.0    ┆ "reg7"  │
   ╰────────┴────────┴────────┴────────┴─────────╯
   
   In [90]: df_pd
   Out[90]: 
      motif1  motif2  motif3  motif4 regions
   0     1.2     3.0     0.3     5.6    reg1
   1     6.7     3.0     4.3     5.6    reg2
   2     3.5     3.0     0.0     0.0    reg3
   3     0.0     3.0     0.0     5.6    reg4
   4     2.4     3.0     7.8     1.2    reg5
   5     2.4     3.0     0.6     0.0    reg6
   6     2.4     3.0     7.7     0.0    reg7
   
   
   
   In [103]: pl.read_ipc('test_polars.feather')
   Out[103]: 
   shape: (7, 5)
   ╭────────┬────────┬────────┬────────┬─────────╮
   │ motif1 ┆ motif2 ┆ motif3 ┆ motif4 ┆ regions │
   │ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---     │
   │ f64    ┆ f64    ┆ f64    ┆ f64    ┆ str     │
   ╞════════╪════════╪════════╪════════╪═════════╡
   │ 1.2    ┆ 3      ┆ 0.3    ┆ 5.6    ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.7    ┆ 3      ┆ 4.3    ┆ 5.6    ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5    ┆ 3      ┆ 0.0    ┆ 0.0    ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0    ┆ 3      ┆ 0.0    ┆ 5.6    ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 7.8    ┆ 1.2    ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 0.6    ┆ 0.0    ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4    ┆ 3      ┆ 7.7    ┆ 0.0    ┆ "reg7"  │
   ╰────────┴────────┴────────┴────────┴─────────╯
   
   In [104]: pl.read_ipc('test_arrow.feather')
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/d008f31/arrow/src/buffer/immutable.rs:179:9
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-104-f9a22f9a0eb1> in <module>
   ----> 1 pl.read_ipc('test_arrow.feather')
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/functions.py in read_ipc(file)
       278     """
       279     file = _prepare_file_arg(file)
   --> 280     return DataFrame.read_ipc(file)
       281 
       282 
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/frame.py in read_ipc(file)
       235         """
       236         self = DataFrame.__new__(DataFrame)
   --> 237         self._df = PyDataFrame.read_ipc(file)
       238         return self
       239 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   In [105]: pl.read_ipc('test_pandas.feather')
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/d008f31/arrow/src/buffer/immutable.rs:179:9
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-105-35809d9ae65f> in <module>
   ----> 1 pl.read_ipc('test_pandas.feather')
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/functions.py in read_ipc(file)
       278     """
       279     file = _prepare_file_arg(file)
   --> 280     return DataFrame.read_ipc(file)
       281 
       282 
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/frame.py in read_ipc(file)
       235         """
       236         self = DataFrame.__new__(DataFrame)
   --> 237         self._df = PyDataFrame.read_ipc(file)
       238         return self
       239 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   In [106]: pl.read_ipc('test_pandas_from_polars.feather')
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/d008f31/arrow/src/buffer/immutable.rs:179:9
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-107-d0a17f51c6ac> in <module>
   ----> 1 pl.read_ipc('test_pandas_from_polars.feather')
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/functions.py in read_ipc(file)
       278     """
       279     file = _prepare_file_arg(file)
   --> 280     return DataFrame.read_ipc(file)
       281 
       282 
   
   ~/software/anaconda3/envs/create_cistarget_databases/lib/python3.8/site-packages/polars/frame.py in read_ipc(file)
       235         """
       236         self = DataFrame.__new__(DataFrame)
   --> 237         self._df = PyDataFrame.read_ipc(file)
       238         return self
       239 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ghuls edited a comment on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls edited a comment on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-865384731


   @jorgecarleitao I think I might have figured out the problem.
   
   ```python
   import polars as pl
   import pyarrow as pa
   import pandas as pd
   
   # Read the Feather file written with pandas into a Polars dataframe, using pa.feather.read_feather (wrapped inside pl.read_ipc).
   df_pl = pl.read_ipc('test_pandas.feather', use_pyarrow=True)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_lz4.feather', compression='lz4', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_lz4.feather', compression='lz4', version=2)
   
   
   # Now try to read all those files with polars without using the pyarrow Feather reading code, but the arrow-rs code instead.
   
   # Reading Feather v2 file without compression containing saved arrow table data, works.
   In [9]: pl.read_ipc('test_polars_to_arrow_uncompressed.feather', use_pyarrow=False)
   Out[9]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file without compression containing saved pandas dataframe, works.
   In [10]: pl.read_ipc('test_polars_to_arrow_to_pandas_uncompressed.feather', use_pyarrow=False)
   Out[10]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file with lz4 compression containing saved pandas dataframe, gives the error from the first post.
   In [11]: pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/9f56afb/arrow/src/buffer/immutable.rs:179:9
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-11-04613b1d0975> in <module>
   ----> 1 pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/functions.py in read_ipc(file, use_pyarrow)
       337     """
       338     file = _prepare_file_arg(file)
   --> 339     return DataFrame.read_ipc(file, use_pyarrow)
       340 
       341 
   
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/frame.py in read_ipc(file, use_pyarrow)
       302 
       303         self = DataFrame.__new__(DataFrame)
   --> 304         self._df = PyDataFrame.read_ipc(file)
       305         return self
       306 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   
   # Reading Feather v2 file with lz4 compression containing a saved pyarrow table kills IPython, because it tries to allocate a far too large buffer.
   In [12]: pl.read_ipc('test_polars_to_arrow_lz4.feather', use_pyarrow=False)
   Out[12]: memory allocation of 2702793507844465093 bytes failed
   Aborted
   ```
   
   So to me it looks like arrow-rs is not detecting that pyarrow saved the Feather file with lz4 compression, and I guess it is reading data (or offsets) from the wrong locations.
   
   ```python
   In [6]: ?pa.feather.write_feather
   Signature:
   pa.feather.write_feather(
       df,
       dest,
       compression=None,
       compression_level=None,
       chunksize=None,
       version=2,
   )
   Docstring:
   Write a pandas.DataFrame to Feather format.
   
   Parameters
   ----------
   df : pandas.DataFrame or pyarrow.Table
       Data to write out as Feather format.
   dest : str
       Local destination path.
   compression : string, default None
       Can be one of {"zstd", "lz4", "uncompressed"}. The default of None uses
       LZ4 for V2 files if it is available, otherwise uncompressed.
   compression_level : int, default None
       Use a compression level particular to the chosen compressor. If None
       use the default compression level
   chunksize : int, default None
       For V2 files, the internal maximum size of Arrow RecordBatch chunks
       when writing the Arrow IPC file format. None means use the default,
       which is currently 64K
   version : int, default 2
       Feather file version. Version 2 is the current. Version 1 is the more
       limited legacy format
   File:      /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/feather.py
   Type:      function
   ```
   
   Feather files are attached:
   [test_feather_polars_to_pyarrow.zip](https://github.com/apache/arrow-rs/files/6689794/test_feather_polars_to_pyarrow.zip)
   





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-843068396


   > 
   > IPC File Format
   > 
   > We define a “file format” supporting random access that is build with the stream format. The file starts and ends with a magic string ARROW1 (plus padding). What follows in the file is identical to the stream format. At the end of the file, we write a footer containing a redundant copy of the schema (which is a part of the streaming format) plus memory offsets and sizes for each of the data blocks in the file. This enables random access any record batch in the file. See [File.fbs](https://github.com/apache/arrow/blob/master/format/File.fbs) for the precise details of the file footer.
   > 
   > Schematically we have:
   > 
   ```
   <magic number "ARROW1">
   <empty padding bytes [to 8 byte boundary]>
   <STREAMING FORMAT with EOS>
   <FOOTER>
   <FOOTER SIZE: int32>
   <magic number "ARROW1">
   ```
   > 
   > In the file format, there is no requirement that dictionary keys should be defined in a DictionaryBatch before they are used in a RecordBatch, as long as the keys are defined somewhere in the file. Further more, it is invalid to have more than one non-delta dictionary batch per dictionary ID (i.e. dictionary replacement is not supported). Delta dictionaries are applied in the order they appear in the file footer.
   > 
   
   https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
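
The layout quoted above can be checked mechanically. The following is a minimal sketch, not arrow-rs or pyarrow code: `footer_size` and the `fake_file` bytes are hypothetical names and placeholder data made up for illustration, and the footer size is assumed to be stored little-endian, as Arrow does for its length fields.

```python
import struct

MAGIC = b"ARROW1"


def footer_size(data: bytes) -> int:
    """Return the footer size of an Arrow IPC file held in memory.

    Checks the leading and trailing magic strings from the schematic,
    then reads the int32 stored just before the trailing magic.
    """
    if not data.startswith(MAGIC):
        raise ValueError("missing leading ARROW1 magic")
    if not data.endswith(MAGIC):
        raise ValueError("missing trailing ARROW1 magic")
    size_end = len(data) - len(MAGIC)
    if size_end - 4 < 8:  # magic plus padding occupy the first 8 bytes
        raise ValueError("file too short to hold a footer size")
    (size,) = struct.unpack("<i", data[size_end - 4:size_end])
    return size


# Hypothetical stand-in for a real file: placeholder bytes, not valid Arrow data.
fake_footer = b"\x00" * 12
fake_file = (
    MAGIC + b"\x00\x00"                    # magic padded to an 8-byte boundary
    + b"stream.."                          # placeholder for the streaming-format body
    + fake_footer                          # placeholder footer
    + struct.pack("<i", len(fake_footer))  # footer size as int32
    + MAGIC                                # trailing magic
)
print(footer_size(fake_file))
```

A check like this only validates the framing of the file; it says nothing about whether the record batch bodies are compressed.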
   





[GitHub] [arrow-rs] ghuls edited a comment on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls edited a comment on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839462080


   It should be IPC on disk with optional compression with lz4 or zstd:
   
   https://arrow.apache.org/docs/python/feather.html
   https://ursalabs.org/blog/2020-feather-v2/
   
   Feather v1 is indeed a totally different format (header bytes: `FEA1` instead of `ARROW1`).
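
The two header signatures mentioned above are enough to tell the formats apart before handing a file to a reader. A minimal sketch, assuming only those two magic strings; `feather_version` is a hypothetical helper name, not part of pyarrow, polars or arrow-rs.

```python
def feather_version(header: bytes) -> int:
    """Guess the Feather version from the first bytes of a file."""
    if header.startswith(b"ARROW1"):
        return 2  # Feather v2 is the Arrow IPC file format
    if header.startswith(b"FEA1"):
        return 1  # legacy Feather v1
    raise ValueError("not a Feather file: unknown header bytes")


print(feather_version(b"ARROW1\x00\x00"))
print(feather_version(b"FEA1\x00\x00\x00\x00"))
```

With a real file, the header would come from something like `open(path, "rb").read(8)`.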





[GitHub] [arrow-rs] jorgecarleitao edited a comment on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839484860


   Nice, learnt something new today. Thanks for the explanation
   
   This is indeed a bug, and a dangerous one because that prefix and suffix imply that we allowed misaligned bytes to go to the `MutableBuffer` (that check is like the last line of defense against UB).
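
To make the prefix/suffix remark concrete: the assertion text suggests the buffer is split into an unaligned prefix, a run of whole items, and a leftover suffix, presumably in the style of Rust's `slice::align_to`. The sketch below is an idealized Python model of such a split, not the arrow-rs implementation (and `align_to` itself does not promise the largest possible middle part). `align_split` and the addresses are hypothetical; the 55-byte length is the offsets buffer length reported elsewhere in this thread.

```python
from typing import Tuple


def align_split(address: int, length: int, item_size: int) -> Tuple[int, int, int]:
    """Return (prefix_bytes, item_count, suffix_bytes) for a byte range.

    prefix_bytes: bytes skipped to reach an address divisible by item_size.
    suffix_bytes: bytes left over after the last whole item.
    """
    prefix = min((-address) % item_size, length)
    remaining = length - prefix
    return prefix, remaining // item_size, remaining % item_size


# An aligned 32-byte buffer holds exactly eight 4-byte offsets.
print(align_split(address=64, length=32, item_size=4))
# A 55-byte buffer cannot be a whole number of 4-byte offsets: suffix is non-empty.
print(align_split(address=64, length=55, item_size=4))
# A misaligned start address produces a non-empty prefix.
print(align_split(address=66, length=32, item_size=4))
```

In this model, a check that both the prefix and the suffix are empty fails for the second and third cases.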





[GitHub] [arrow-rs] jorgecarleitao commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839484860


   Nice, learnt something new today. Thanks for the explanation
   
   This is indeed a bug, and a dangerous one because that prefix and suffix imply that we allow misaligned bytes to go to the `MutableBuffer`.





[GitHub] [arrow-rs] jorgecarleitao commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839524898


   More details: in both files, I am getting the following:
   
   ```
   Reading Utf8
   field_node: FieldNode { length: 7, null_count: 0 }
   offset buffer: Buffer { offset: 200, length: 55 }
   offsets: [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, 201326592]
   values buffer: Buffer { offset: 256, length: 51 }
   ```
   
   * `offsets[0] != 0` indicates a problem: offsets are expected to start from zero on any array with offsets.
   * `offsets[i+1] < offsets[i]` for some `i`, which also indicates a problem: offsets are expected to be monotonically non-decreasing.
   
   I do not have a root cause yet; these are just observations.
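
The two observations above can be turned into a small sanity check. This is a sketch under the expectations stated in this comment, not arrow-rs code: `check_offsets` is a hypothetical helper, and the extra bound check against the values buffer length is an added assumption. The numbers are the ones printed above.

```python
from typing import List


def check_offsets(offsets: List[int], values_len: int) -> List[str]:
    """Return a list of problems found in a Utf8 offsets buffer."""
    if not offsets:
        return ["offsets buffer is empty"]
    problems = []
    if offsets[0] != 0:
        problems.append(f"offsets[0] is {offsets[0]}, expected 0")
    for i in range(len(offsets) - 1):
        if offsets[i + 1] < offsets[i]:
            problems.append(f"offsets[{i + 1}] < offsets[{i}]")
    if offsets[-1] > values_len:
        problems.append(
            f"last offset {offsets[-1]} exceeds values length {values_len}"
        )
    return problems


# Offsets and values buffer length as reported above.
reported = [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, 201326592]
for problem in check_offsets(reported, values_len=51):
    print(problem)

# What the offsets for "reg1".."reg7" (7 strings of 4 bytes) would look like.
print(check_offsets([0, 4, 8, 12, 16, 20, 24, 28], values_len=28))
```

Running such a check before building the array would turn the panic into a reportable error.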





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-865845189


   @nevi-me A pity it is not supported (yet) as Pandas and pyarrow will write Feather files with lz4 compression by default (at least when using the official packages). At least arrow-rs should detect that a compression codec is used that it does not support yet, instead of doing the wrong thing and reading compressed data as uncompressed data.
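
The offsets printed earlier in this thread look consistent with compressed bytes being read as raw int32 values. In the Arrow IPC format, a compressed buffer starts with its uncompressed length as an 8-byte little-endian signed integer, followed by the compressed bytes, and an LZ4 frame begins with the magic number `0x184D2204`. The sketch below reinterprets the reported offsets under that assumption; it is an observation that fits the compression explanation, not a confirmed root cause.

```python
import struct

# Offsets as printed earlier in this thread (read as little-endian int32).
reported = [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, 201326592]
raw = struct.pack("<8i", *reported)

# First 8 bytes read as one int64: the uncompressed buffer length.
(uncompressed_len,) = struct.unpack_from("<q", raw, 0)
# Next 4 bytes read as uint32: the start of the compressed payload.
(payload_start,) = struct.unpack_from("<I", raw, 8)

LZ4_FRAME_MAGIC = 0x184D2204

print(uncompressed_len)           # bytes needed for 7 + 1 int32 offsets
print(hex(payload_start))
print(payload_start == LZ4_FRAME_MAGIC)
```

The decoded length matches the 32 bytes needed for the 8 offsets of a 7-row Utf8 column, and the next value matches the LZ4 frame magic number.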





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839481639


   Here is the original commit that introduced Feather v2 support in Arrow: https://github.com/apache/arrow/commit/e03251c2408f75441844b3293904c5ea43343ba3





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839462080


   It should be IPC on disk with optional compression with lz4 or zstd:
   
   https://arrow.apache.org/docs/python/feather.html
   https://ursalabs.org/blog/2020-feather-v2/





[GitHub] [arrow-rs] nevi-me commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
nevi-me commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-865817529


   @ghuls compression isn't supported, see https://github.com/apache/arrow-rs/issues/70 and https://issues.apache.org/jira/browse/ARROW-8676. I had a PR for this, but struggled with getting integration tests to pass, so I abandoned it as I didn't have more time for it.
   
   Here's the PR: https://github.com/apache/arrow/pull/9137





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839578135


   Could it be that the difference you see is due to the streaming IPC vs. random access IPC format?
   
   > For most cases, it is most convenient to use the RecordBatchStreamReader or RecordBatchFileReader class, depending on which variant of the IPC format you want to read. The former requires a InputStream source, while the latter requires a RandomAccessFile.
   > 
   > Reading Arrow IPC data is inherently zero-copy if the source allows it. For example, a BufferReader or MemoryMappedFile can typically be zero-copy. Exceptions are when the data must be transformed on the fly, e.g. when buffer compression has been enabled on the IPC stream or file.
   
   https://arrow.apache.org/docs/cpp/ipc.html





   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/functions.py in read_ipc(file, use_pyarrow)
       337     """
       338     file = _prepare_file_arg(file)
   --> 339     return DataFrame.read_ipc(file, use_pyarrow)
       340 
       341 
   
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/frame.py in read_ipc(file, use_pyarrow)
       302 
       303         self = DataFrame.__new__(DataFrame)
   --> 304         self._df = PyDataFrame.read_ipc(file)
       305         return self
       306 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   
   # Reading Feather v2 file with lz4 compression containing saved pyarrow table kills IPython, because it tries to allocate an absurdly large buffer.
   In [12]: pl.read_ipc('test_polars_to_arrow_lz4.feather', use_pyarrow=False)
   Out[12]: memory allocation of 2702793507844465093 bytes failed
   Aborted
   ```
   
   It looks to me like arrow-rs is not detecting that pyarrow saved the Feather file with compression, so I guess it is reading data (or offsets) from the wrong locations.
   
   
   [test_feather_polars_to_pyarrow.zip](https://github.com/apache/arrow-rs/files/6689794/test_feather_polars_to_pyarrow.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-865384731


   @jorgecarleitao I think I might have figured out the problem.
   
   ```python
   import polars as pl
   import pyarrow as pa
   import pandas as pd
   
   # Read Feather file written with pandas, with pa.feather.read_feather (wrapped inside pl.read_ipc), into a Polars dataframe.
   df_pl = pl.read_ipc('test_pandas.feather', use_pyarrow=True)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_lz4.feather', compression='lz4', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_lz4.feather', compression='lz4', version=2)
   
   
   # Now try to read all those files with polars without using the pyarrow Feather reading code, but the arrow-rs code instead.
   
   # Reading Feather v2 file without compression containing saved arrow table data, works.
   In [9]: pl.read_ipc('test_polars_to_arrow_uncompressed.feather', use_pyarrow=False)
   Out[9]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file without compression containing saved pandas dataframe, works.
   In [10]: pl.read_ipc('test_polars_to_arrow_to_pandas_uncompressed.feather', use_pyarrow=False)
   Out[10]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file with lz4 compression containing saved pandas dataframe, gives the error from the first post.
   In [11]: pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/9f56afb/arrow/src/buffer/immutable.rs:179:9
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-11-04613b1d0975> in <module>
   ----> 1 pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/functions.py in read_ipc(file, use_pyarrow)
       337     """
       338     file = _prepare_file_arg(file)
   --> 339     return DataFrame.read_ipc(file, use_pyarrow)
       340 
       341 
   
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/frame.py in read_ipc(file, use_pyarrow)
       302 
       303         self = DataFrame.__new__(DataFrame)
   --> 304         self._df = PyDataFrame.read_ipc(file)
       305         return self
       306 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   
   # Reading Feather v2 file with lz4 compression containing saved pyarrow table kills IPython, because it tries to allocate an absurdly large buffer.
   In [12]: pl.read_ipc('test_polars_to_arrow_lz4.feather', use_pyarrow=False)
   Out[12]: memory allocation of 2702793507844465093 bytes failed
   Aborted
   ```
   
   It looks to me like arrow-rs is not detecting that pyarrow saved the Feather file with compression, so I guess it is reading data (or offsets) from the wrong locations.
   
   
   [test_feather_polars_to_pyarrow.zip](https://github.com/apache/arrow-rs/files/6689794/test_feather_polars_to_pyarrow.zip)
   





[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839538790


   It makes sense that you see the same thing in the Feather files created by pyarrow and by pandas, since pandas uses the same `pyarrow.feather` code: https://github.com/pandas-dev/pandas/blob/059c8bac51e47d6eaaa3e36d6a293a22312925e6/pandas/io/feather_format.py





[GitHub] [arrow-rs] jorgecarleitao commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839511211


   I investigated this and there is something funny going on: the file reports that there is an array whose buffer of type `u8` has `201326592` slots, but the buffers' total length is 51. This happens on the 5th column, which is a `Utf8`.
   
   This behavior is consistent between `test_pandas.feather` and `test_arrow.feather` in the zip.
   
   That number of slots seems incorrect. I need to check whether this is a problem in how those slots are read from the file or whether they are already written that way.
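
As a rough illustration of the mismatch described above (a sketch, not arrow-rs code), a reader can compare the length a buffer claims against the bytes actually available before trusting it. The helper name `check_claimed_length` is hypothetical. The arithmetic also shows that 201326592 equals 0x0C000000, that is, the value 12 shifted left by three bytes. This is at least consistent with an integer being decoded from the wrong position, although it does not by itself establish the cause.

```python
import struct


def check_claimed_length(claimed_slots: int, available_bytes: int, slot_size: int = 1) -> None:
    """Raise ValueError if a buffer claims more data than is available (hypothetical helper)."""
    needed = claimed_slots * slot_size
    if needed > available_bytes:
        raise ValueError(
            f"buffer claims {needed} bytes but only {available_bytes} are available"
        )


# Numbers reported in this comment: a u8 buffer claiming 201326592 slots, 51 bytes in total.
claimed, available = 201326592, 51

try:
    check_claimed_length(claimed, available)
    status = "ok"
except ValueError as exc:
    status = str(exc)

# The claimed count is exactly 12 << 24; packed as a little-endian i32 it is 00 00 00 0c.
as_le_bytes = struct.pack("<i", claimed)
print(status)
print(hex(claimed), as_le_bytes.hex())
```

A check of this kind would turn the situation into a recoverable error instead of a panic or a huge allocation.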





[GitHub] [arrow-rs] ghuls edited a comment on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
ghuls edited a comment on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-865384731


   @jorgecarleitao I think I might have figured out the problem.
   
   ```python
   import polars as pl
   import pyarrow as pa
   import pandas as pd
   
   # Read Feather file written with pandas, with pa.feather.read_feather (wrapped inside pl.read_ipc), into a Polars dataframe.
   df_pl = pl.read_ipc('test_pandas.feather', use_pyarrow=True)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow(), 'test_polars_to_arrow_lz4.feather', compression='lz4', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file without compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_uncompressed.feather', compression='uncompressed', version=2)
   
   # Convert Polars dataframe to arrow table and convert arrow table to pandas dataframe and write to Feather v2 file with lz4 compression (with pyarrow).
   pa.feather.write_feather(df_pl.to_arrow().to_pandas(), 'test_polars_to_arrow_to_pandas_lz4.feather', compression='lz4', version=2)
   
   
   # Now try to read all those files with polars without using the pyarrow Feather reading code, but the arrow-rs code instead.
   
   # Reading Feather v2 file without compression containing saved arrow table data, works.
   In [9]: pl.read_ipc('test_polars_to_arrow_uncompressed.feather', use_pyarrow=False)
   Out[9]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file without compression containing saved pandas dataframe, works.
   In [10]: pl.read_ipc('test_polars_to_arrow_to_pandas_uncompressed.feather', use_pyarrow=False)
   Out[10]: 
   shape: (7, 5)
   ╭────────────────────┬────────┬─────────────────────┬────────────────────┬─────────╮
   │ motif1             ┆ motif2 ┆ motif3              ┆ motif4             ┆ regions │
   │ ---                ┆ ---    ┆ ---                 ┆ ---                ┆ ---     │
   │ f32                ┆ f32    ┆ f32                 ┆ f32                ┆ str     │
   ╞════════════════════╪════════╪═════════════════════╪════════════════════╪═════════╡
   │ 1.2000000476837158 ┆ 3      ┆ 0.30000001192092896 ┆ 5.599999904632568  ┆ "reg1"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 6.699999809265137  ┆ 3      ┆ 4.300000190734863   ┆ 5.599999904632568  ┆ "reg2"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 3.5                ┆ 3      ┆ 0.0                 ┆ 0.0                ┆ "reg3"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 0.0                ┆ 3      ┆ 0.0                 ┆ 5.599999904632568  ┆ "reg4"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.800000190734863   ┆ 1.2000000476837158 ┆ "reg5"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 0.6000000238418579  ┆ 0.0                ┆ "reg6"  │
   ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
   │ 2.4000000953674316 ┆ 3      ┆ 7.699999809265137   ┆ 0.0                ┆ "reg7"  │
   ╰────────────────────┴────────┴─────────────────────┴────────────────────┴─────────╯
   
   
   # Reading Feather v2 file with lz4 compression containing saved pandas dataframe, gives the error from the first post.
   In [11]: pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   thread '<unnamed>' panicked at 'assertion failed: prefix.is_empty() && suffix.is_empty()', /github/home/.cargo/git/checkouts/arrow-rs-3b86e19e889d5acc/9f56afb/arrow/src/buffer/immutable.rs:179:9
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ---------------------------------------------------------------------------
   PanicException                            Traceback (most recent call last)
   <ipython-input-11-04613b1d0975> in <module>
   ----> 1 pl.read_ipc('test_polars_to_arrow_to_pandas_lz4.feather', use_pyarrow=False)
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/functions.py in read_ipc(file, use_pyarrow)
       337     """
       338     file = _prepare_file_arg(file)
   --> 339     return DataFrame.read_ipc(file, use_pyarrow)
       340 
       341 
   
   /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/polars/frame.py in read_ipc(file, use_pyarrow)
       302 
       303         self = DataFrame.__new__(DataFrame)
   --> 304         self._df = PyDataFrame.read_ipc(file)
       305         return self
       306 
   
   PanicException: assertion failed: prefix.is_empty() && suffix.is_empty()
   
   
   # Reading Feather v2 file with lz4 compression containing saved pyarrow table kills IPython, because it tries to allocate an absurdly large buffer.
   In [12]: pl.read_ipc('test_polars_to_arrow_lz4.feather', use_pyarrow=False)
   Out[12]: memory allocation of 2702793507844465093 bytes failed
   Aborted
   ```
   
   It looks to me like arrow-rs is not detecting that pyarrow saved the Feather file with lz4 compression, so I guess it is reading data (or offsets) from the wrong locations.
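
To make that guess concrete, here is a minimal sketch of how the Arrow IPC format frames a compressed buffer: the uncompressed length as an 8-byte little-endian signed integer, followed by the compressed bytes. `zlib` stands in for LZ4 only because LZ4 is not in the Python standard library, and `encode_buffer` / `decode_buffer` are hypothetical helpers, not pyarrow or arrow-rs functions. A reader that ignores the framing ends up interpreting the length prefix as data.

```python
import struct
import zlib  # stand-in for LZ4 (assumption: chosen only because it ships with Python)


def encode_buffer(raw: bytes) -> bytes:
    """Frame a buffer: 8-byte little-endian int64 uncompressed length, then compressed bytes."""
    return struct.pack("<q", len(raw)) + zlib.compress(raw)


def decode_buffer(framed: bytes) -> bytes:
    """Undo encode_buffer; a length of -1 marks a body that was stored uncompressed."""
    (length,) = struct.unpack_from("<q", framed, 0)
    body = framed[8:]
    if length == -1:
        return body
    raw = zlib.decompress(body)
    if len(raw) != length:
        raise ValueError("uncompressed length does not match the prefix")
    return raw


# Offsets buffer of a 7-row Utf8 column whose values are 4 bytes each ("reg1".."reg7").
offsets = struct.pack("<8i", *range(0, 32, 4))
framed = encode_buffer(offsets)

# A compression-aware reader recovers the real offsets.
aware = struct.unpack("<8i", decode_buffer(framed))

# A reader unaware of compression treats the first framed bytes as i32 offsets.
naive = struct.unpack_from("<2i", framed, 0)

print(aware)
print(naive)
```

In this sketch the naive reader sees the length prefix (32) as the first offset instead of 0, and everything after it is compressed bytes, which is the kind of misread that could plausibly lead to bogus lengths and allocations.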
   
   ```python
   In [6]: ?pa.feather.write_feather
   Signature:
   pa.feather.write_feather(
       df,
       dest,
       compression=None,
       compression_level=None,
       chunksize=None,
       version=2,
   )
   Docstring:
   Write a pandas.DataFrame to Feather format.
   
   Parameters
   ----------
   df : pandas.DataFrame or pyarrow.Table
       Data to write out as Feather format.
   dest : str
       Local destination path.
   compression : string, default None
       Can be one of {"zstd", "lz4", "uncompressed"}. The default of None uses
       LZ4 for V2 files if it is available, otherwise uncompressed.
   compression_level : int, default None
       Use a compression level particular to the chosen compressor. If None
       use the default compression level
   chunksize : int, default None
       For V2 files, the internal maximum size of Arrow RecordBatch chunks
       when writing the Arrow IPC file format. None means use the default,
       which is currently 64K
   version : int, default 2
       Feather file version. Version 2 is the current. Version 1 is the more
       limited legacy format
   File:      /software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/feather.py
   Type:      function
   ```
   
   Feather files are attached:
   [test_feather_polars_to_pyarrow.zip](https://github.com/apache/arrow-rs/files/6689794/test_feather_polars_to_pyarrow.zip)
   





[GitHub] [arrow-rs] jorgecarleitao commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-839446499


   I did not know this: is `feather` compatible with IPC?
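
For reference, pyarrow's documentation describes Feather V2 as the Arrow IPC file format on disk, while Feather V1 is a separate legacy format. A minimal sketch for telling the two apart by their leading magic bytes follows; `sniff_feather_version` is a hypothetical helper, and the magic strings `ARROW1` and `FEA1` are taken from the respective format definitions.

```python
def sniff_feather_version(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper, not a pyarrow API)."""
    if header.startswith(b"ARROW1"):
        return "v2 (Arrow IPC file)"
    if header.startswith(b"FEA1"):
        return "v1 (legacy Feather)"
    return "unknown"


# Typical use (the file name is taken from this issue's attachments):
#     with open("test_pandas.feather", "rb") as f:
#         print(sniff_feather_version(f.read(8)))
print(sniff_feather_version(b"ARROW1\x00\x00"))
print(sniff_feather_version(b"FEA1"))
```

Note that the magic bytes say nothing about whether the record batches inside are compressed; that is recorded in the message metadata.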

