Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/15 10:04:18 UTC

[GitHub] [arrow] thatlittleboy opened a new issue, #12899: [Python] Reading feather format file fails with "Ran out of field metadata, likely malformed"

thatlittleboy opened a new issue, #12899:
URL: https://github.com/apache/arrow/issues/12899

   Consider the following example with pandas:
   
   ```python
   [ins] In [11]: df = pd.DataFrame({
             ...:     "cat1": pd.Categorical(["a", "b", "a"]),
             ...:     "cat2": pd.cut(range(1, 10, 3), [-1, 5, 10]),
             ...: })
   
   [ins] In [14]: df['cat2'].cat.categories
   Out[14]: IntervalIndex([(-1, 5], (5, 10]], dtype='interval[int64, right]')
   ```
   
   I have a categorical column `cat2` whose categories are intervals (an `IntervalIndex`).
   
   I can write the dataframe to a feather file, no issues, but reading it throws an ArrowInvalid error:
   
   ```python
   [ins] In [19]: feather.write_feather(df, "test.feather")
   
   [ins] In [20]: feather.read_feather("test.feather")
   ---------------------------------------------------------------------------
   ArrowInvalid                              Traceback (most recent call last)
   Input In [20], in <cell line: 1>()
   ----> 1 feather.read_feather("test.feather")
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:220, in read_feather(source, columns, use_threads, memory_map)
       198 """
       199 Read a pandas.DataFrame from Feather format. To read as pyarrow.Table use
       200 feather.read_table.
      (...)
       217 df : pandas.DataFrame
       218 """
       219 _check_pandas_version()
   --> 220 return (read_table(
       221     source, columns=columns, memory_map=memory_map,
       222     use_threads=use_threads).to_pandas(use_threads=use_threads))
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:248, in read_table(source, columns, memory_map, use_threads)
       244 reader = _feather.FeatherReader(
       245     source, use_memory_map=memory_map, use_threads=use_threads)
       247 if columns is None:
   --> 248     return reader.read()
       250 column_types = [type(column) for column in columns]
       251 if all(map(lambda t: t == int, column_types)):
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/_feather.pyx:88, in pyarrow._feather.FeatherReader.read()
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/error.pxi:99, in pyarrow.lib.check_status()
   
   ArrowInvalid: Ran out of field metadata, likely malformed
   ```
   
   The error only occurs with the `cat2` (category[interval]) column. Ordinary categorical columns such as `cat1` in my example read back without issues.
   I note that Interval types are supposedly supported ([here](https://github.com/apache/arrow/blob/master/docs/source/status.rst)), so is this a bug, or am I misunderstanding something and the error is expected?
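To check the claim that only `cat2` is affected, each column can be round-tripped through Feather on its own. The following is a minimal sketch, not part of the original report: the helper names `failing_columns`, `feather_roundtrip`, and `report` are hypothetical, and it assumes `pandas` and `pyarrow` are installed. Only `feather.write_feather` and `feather.read_feather`, which appear in the session above, are used from pyarrow.

```python
import os
import tempfile


def failing_columns(names, roundtrip):
    """Round-trip each column on its own and collect the ones that raise.

    `roundtrip` is any callable that takes a column name, writes and
    re-reads that single column, and raises on failure.
    """
    failures = {}
    for name in names:
        try:
            roundtrip(name)
        except Exception as exc:  # record the failure and keep checking
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def feather_roundtrip(df, name):
    """Write df[[name]] to a temporary Feather file and read it back."""
    # Imported lazily so failing_columns() has no third-party dependency.
    from pyarrow import feather

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, f"{name}.feather")
        feather.write_feather(df[[name]], path)
        # memory_map=False so the temporary file can be removed afterwards.
        return feather.read_feather(path, memory_map=False)


def report(df):
    """Hypothetical entry point: check every column of a pandas DataFrame."""
    return failing_columns(
        list(df.columns), lambda name: feather_roundtrip(df, name)
    )
```

With the `df` defined above, `report(df)` would be expected to list only `cat2` on an affected version such as pyarrow 7.0.0. On a version where the problem does not occur, it should return an empty dict.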
   
   
   ## versions
   
   python 3.9.10
   pandas==1.4.2
   pyarrow==7.0.0
   macOS 12.2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #12899: [Python] Reading feather format file fails with "Ran out of field metadata, likely malformed"

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #12899:
URL: https://github.com/apache/arrow/issues/12899#issuecomment-1105465317

   I opened https://issues.apache.org/jira/browse/ARROW-16231 for this. We can track the bug there, so I am closing this issue. Thanks again for the report!




[GitHub] [arrow] jorisvandenbossche closed issue #12899: [Python] Reading feather format file fails with "Ran out of field metadata, likely malformed"

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche closed issue #12899: [Python] Reading feather format file fails with "Ran out of field metadata, likely malformed"
URL: https://github.com/apache/arrow/issues/12899




[GitHub] [arrow] jorisvandenbossche commented on issue #12899: [Python] Reading feather format file fails with "Ran out of field metadata, likely malformed"

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #12899:
URL: https://github.com/apache/arrow/issues/12899#issuecomment-1100058626

   @thatlittleboy thanks for the report! I can reproduce this, both on 7.0.0 and on the latest development version.

