Posted to dev@parquet.apache.org by "Justin Lewis (JIRA)" <ji...@apache.org> on 2018/11/20 16:42:00 UTC

[jira] [Created] (PARQUET-1459) preserve_index=False with empty dataframe fails to write

Justin Lewis created PARQUET-1459:
-------------------------------------

             Summary: preserve_index=False with empty dataframe fails to write
                 Key: PARQUET-1459
                 URL: https://issues.apache.org/jira/browse/PARQUET-1459
             Project: Parquet
          Issue Type: Bug
         Environment: conda list --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2018.10.15-ha4d7672_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-7.2.0-hdf63c60_3.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libgfortran-3.0.0-1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-7.2.0-hdf63c60_3.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.6-h470a237_2.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/icu-58.2-hfc679d8_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-hfc679d8_5.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.15-h470a237_3.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.1-hfc679d8_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/openblas-0.3.3-ha44fe06_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/openssl-1.0.2p-h470a237_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.4-h470a237_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/yaml-0.1.7-h470a237_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h470a237_3.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/blas-1.1-openblas.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/boost-cpp-1.68.0-h3a22d5f_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20170329-haf1bffa_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/readline-7.0-haf1bffa_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.9-ha92aebf_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/sqlite-3.25.3-hb1c47c0_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/unixodbc-2.3.7-h09ba92c_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/python-3.6.6-h5001a0f_3.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/atomicwrites-1.2.1-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/attrs-18.2.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/backcall-0.1.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/certifi-2018.10.15-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/click-7.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/decorator-4.3.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/ipython_genutils-0.2.0-py_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/numpy-1.15.4-py36_blas_openblashb06ca3d_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/parso-0.3.1-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pickleshare-0.7.5-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/pluggy-0.8.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ptyprocess-0.6.0-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/py-1.7.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/pytz-2018.7-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pyyaml-3.13-py36h470a237_1.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/wcwidth-0.1.7-py_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/arrow-cpp-0.11.1-py36h3bd774a_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/jedi-0.13.1-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/more-itertools-4.3.0-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pexpect-4.6.0-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.7.5-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/setuptools-40.6.2-py36_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/traitlets-4.3.2-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pandas-0.23.4-py36hf8a1672_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/parquet-cpp-1.5.1-2.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/pygments-2.2.0-py_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pytest-4.0.0-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/wheel-0.32.3-py36_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pip-18.1-py36_1000.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/prompt_toolkit-2.0.7-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pyarrow-0.11.1-py36hfc679d8_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ipython-7.1.1-py36h24bf2e0_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/turbodbc-3.0.0-py36h38e7a2c_0.tar.bz2
            Reporter: Justin Lewis


{code:python}
import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa


def test_write_empty_preserve_index():

    # passes

    df = pd.DataFrame()
    table = pa.Table.from_pandas(df, preserve_index=True)
    pq.write_table(table, 'test1.parquet')
    table2 = pq.read_table('test1.parquet')
    df2 = table2.to_pandas()
    pd.util.testing.assert_frame_equal(df, df2)


def test_write_empty_no_preserve_index():
    df = pd.DataFrame()
    table = pa.Table.from_pandas(df, preserve_index=False)

    # fails here
    pq.write_table(table, 'test2.parquet')

    table2 = pq.read_table('test2.parquet')
    df2 = table2.to_pandas()
    pd.util.testing.assert_frame_equal(df, df2)
{code}

The first test passes. The second one fails with:

{code:java}
___________________________________ test_write_empty_no_preserve_index ___________________________________

def test_write_empty_no_preserve_index():
df = pd.DataFrame()
table = pa.Table.from_pandas(df, preserve_index=False)

# fails here
> pq.write_table(table, 'test2.parquet')

test_empty.py:24: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:1125: in write_table
writer.write_table(table, row_group_size=row_group_size)
../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:361: in __exit__
self.close()
../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:380: in close
self.writer.close()
pyarrow/_parquet.pyx:916: in pyarrow._parquet.ParquetWriter.close
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

> ???
E pyarrow.lib.ArrowIOError: Root node did not have children

pyarrow/error.pxi:83: ArrowIOError
{code}
 

I haven't had a chance to investigate yet, but this does not look like the intended behavior.
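One likely explanation, not confirmed in this report: with preserve_index=False an empty DataFrame converts to an Arrow table with zero columns, so the Parquet schema being written has a root group with no child fields, which matches "Root node did not have children". With preserve_index=True the index is stored as a column, so the schema is non-empty and the write goes through. Until the writer handles this case, a caller-side guard can surface a clearer error. The sketch below assumes that diagnosis; write_table_checked is a hypothetical helper, not part of pyarrow.

{code:python}
import os
import tempfile

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def write_table_checked(table, where):
    """Hypothetical wrapper around pq.write_table.

    Assumes zero-column tables are what trigger
    "Root node did not have children" and rejects them up front,
    before the Parquet writer is ever called.
    """
    if table.num_columns == 0:
        raise ValueError(
            "refusing to write a table with no columns "
            "(try preserve_index=True or add a column)"
        )
    pq.write_table(table, where)


if __name__ == "__main__":
    # The table from the failing test: no columns, no rows.
    empty = pa.Table.from_pandas(pd.DataFrame(), preserve_index=False)
    print(empty.num_columns)

    try:
        write_table_checked(empty, "test2.parquet")
    except ValueError as exc:
        print("skipped:", exc)

    # A table with at least one column is passed through to pq.write_table.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ok.parquet")
        df = pd.DataFrame({"a": [1, 2]})
        write_table_checked(pa.Table.from_pandas(df, preserve_index=False), path)
        print(pq.read_table(path).num_rows)
{code}

Whether to raise, skip the write, or add a placeholder column is up to the caller; raising keeps the failure explicit instead of leaving behind a missing or half-written file.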

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)