Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/09 22:22:21 UTC
[GitHub] [arrow] quazzuk opened a new issue #8420: Is this expected behaviour?
quazzuk opened a new issue #8420:
URL: https://github.com/apache/arrow/issues/8420
```
import os

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

df = pd.DataFrame(dict(symbol=["A", "B", "C", "D"], year=[2017, 2018, 2019, 2020], close=np.arange(4)))
root_path = "test"
os.makedirs(root_path, exist_ok=True)

table1 = pa.Table.from_pandas(df)
print(f"\nbefore:\n{table1.schema.to_string(show_field_metadata=False)}")

# Write one directory per (symbol, year) pair, e.g. test/symbol=A/year=2017/
pq.write_to_dataset(table1, root_path=root_path, partition_cols=["symbol", "year"])

# Discover the dataset only after the files exist, so the written fragments are found
dataset = ds.dataset(root_path, format="parquet", partitioning="hive")
table2 = dataset.to_table()
print(f"\nafter:\n{table2.schema.to_string(show_field_metadata=False)}")
```
before:
symbol: string
year: int64
close: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 582
after:
close: int64
symbol: string
year: int32
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [{"name": null, "field_n' + 300
i.e. the column ordering and the type of `year` (int64 before, int32 after) both change after the round trip. I suspect this is due to partitioning. Should I store additional metadata when writing and apply it when reading the dataset back?
Thanks
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] quazzuk closed issue #8420: Is this expected behaviour?
Posted by GitBox <gi...@apache.org>.
quazzuk closed issue #8420:
URL: https://github.com/apache/arrow/issues/8420