Posted to issues@arrow.apache.org by "Francois Saint-Jacques (JIRA)" <ji...@apache.org> on 2019/04/08 14:44:00 UTC

[jira] [Updated] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly

     [ https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Saint-Jacques updated ARROW-5138:
------------------------------------------
    Labels: parquet  (was: )

> [Python/C++] Row group retrieval doesn't restore index properly
> ---------------------------------------------------------------
>
>                 Key: ARROW-5138
>                 URL: https://issues.apache.org/jira/browse/ARROW-5138
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Florian Jetter
>            Priority: Minor
>              Labels: parquet
>
> When retrieving row groups, the index is no longer restored to its original values; it is always set to a range index starting at zero. Version 0.12.1 restored an int64 index with the correct index values.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> print(pa.__version__)
> df = pd.DataFrame(
>     {"a": [1, 2, 3, 4]}
> )
> print("total DF")
> print(df.index)
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf, chunk_size=2)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(reader)
> rg = parquet_file.read_row_group(1)
> df_restored = rg.to_pandas()
> print("Row group")
> print(df_restored.index)
> {code}
> Previous behavior
> {code:python}
> 0.12.1
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> Int64Index([2, 3], dtype='int64')
> {code}
> Behavior now
> {code:python}
> 0.13.0
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> RangeIndex(start=0, stop=2, step=1)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)