Posted to issues@arrow.apache.org by "Gert Hulselmans (Jira)" <ji...@apache.org> on 2020/09/21 17:17:00 UTC
[jira] [Created] (ARROW-10056) PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.
Gert Hulselmans created ARROW-10056:
---------------------------------------
Summary: PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.
Key: ARROW-10056
URL: https://issues.apache.org/jira/browse/ARROW-10056
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 1.0.1
Environment: CentOS7
conda environment with pyarrow 1.0.1, numpy 1.19.1 and pandas 1.1.1
Reporter: Gert Hulselmans
pyarrow writes an invalid Feather v2 file, which it can't read afterwards.
{code:java}
OSError: Verification of flatbuffer-encoded Footer failed.
{code}
The following code reproduces the problem for me:
{code:python}
import pyarrow as pa
import pyarrow.feather as pf  # provides the write_feather / read_feather calls used below
import numpy as np
import pandas as pd
nbr_regions = 1223024
nbr_motifs = 4891
# Create (big) dataframe.
df = pd.DataFrame(
    np.arange(nbr_regions * nbr_motifs, dtype=np.float32).reshape((nbr_regions, nbr_motifs)),
    index=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
    columns=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs')
)
# Transpose dataframe
df_transposed = df.transpose()
# Write transposed dataframe to Feather v2 format.
pf.write_feather(df_transposed, 'df_transposed.feather')
# Reading the transposed dataframe back from Feather v2 format fails with the error below:
df_transposed_read = pf.read_feather('df_transposed.feather')
{code}
{code:python}
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-64-b41ad5157e77> in <module>
----> 1 df_transposed_read = pf.read_feather('df_transposed.feather')
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
213 """
214 _check_pandas_version()
--> 215 return (read_table(source, columns=columns, memory_map=memory_map)
216 .to_pandas(use_threads=use_threads))
217
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
235 """
236 reader = ext.FeatherReader()
--> 237 reader.open(source, use_memory_map=memory_map)
238
239 if columns is None:
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
OSError: Verification of flatbuffer-encoded Footer failed.
{code}
Later I discovered that it also happens if the dataframe is created directly in the transposed orientation, without calling transpose():
{code:python}
# Create (big) dataframe.
df_without_transpose = pd.DataFrame(
    np.arange(nbr_motifs * nbr_regions, dtype=np.float32).reshape((nbr_motifs, nbr_regions)),
    index=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs'),
    columns=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
)
pf.write_feather(df_without_transpose, 'df_without_transpose.feather')
df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
{code}
{code:python}
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-91-3cdad1d58c35> in <module>
----> 1 df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
213 """
214 _check_pandas_version()
--> 215 return (read_table(source, columns=columns, memory_map=memory_map)
216 .to_pandas(use_threads=use_threads))
217
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
235 """
236 reader = ext.FeatherReader()
--> 237 reader.open(source, use_memory_map=memory_map)
238
239 if columns is None:
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
OSError: Verification of flatbuffer-encoded Footer failed.
{code}
Writing to Feather v1 format works:
{code:python}
pf.write_feather(df_transposed, 'df_transposed.v1.feather', version=1)
df_transposed_read_v1 = pf.read_feather('df_transposed.v1.feather')
# Now do the same, but also save the index in the Feather v1 file.
df_transposed_reset_index = df_transposed.reset_index()
pf.write_feather(df_transposed_reset_index, 'df_transposed_reset_index.v1.feather', version=1)
df_transposed_reset_index_read_v1 = pf.read_feather('df_transposed_reset_index.v1.feather')
# Returns True: the round-tripped frame matches the frame that was written.
df_transposed_reset_index_read_v1.equals(df_transposed_reset_index)
{code}
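For a sense of scale, here is a rough sketch of the dimensions involved, using only the shape and dtype from the reproduction above. The helper name frame_scale is hypothetical and is not part of pyarrow. The sketch only counts cells and raw bytes and makes no claim about what triggers the Footer error.

```python
# Shape and dtype are taken from the reproduction above.
NBR_REGIONS = 1223024
NBR_MOTIFS = 4891
FLOAT32_BYTES = 4  # size of one np.float32 value


def frame_scale(n_rows, n_cols, itemsize=FLOAT32_BYTES):
    """Return (cell count, raw data bytes) for a dense n_rows x n_cols frame.

    Hypothetical helper for illustration only; it ignores index, column-name
    and file-format overhead.
    """
    cells = n_rows * n_cols
    return cells, cells * itemsize


# After the transpose, motifs are the rows and regions are the columns.
cells, raw_bytes = frame_scale(NBR_MOTIFS, NBR_REGIONS)
print(f"columns in the written frame: {NBR_REGIONS:,}")
print(f"cells: {cells:,}")
print(f"raw float32 data: {raw_bytes / 2**30:.1f} GiB")
```

The frame that fails therefore has more than 1.2 million columns, while the original orientation (before the transpose) has only 4891.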