Posted to jira@arrow.apache.org by "Alan Snow (Jira)" <ji...@apache.org> on 2021/05/18 14:48:00 UTC

[jira] [Commented] (ARROW-12823) [Parquet][Python] Read and write file/column metadata using pandas attrs

    [ https://issues.apache.org/jira/browse/ARROW-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346973#comment-17346973 ] 

Alan Snow commented on ARROW-12823:
-----------------------------------

It seems like the attrs could be written in [get_column_metadata|https://github.com/apache/arrow/blob/aa37d197a63a7efbc0660f9cea2f75cc08c30587/python/pyarrow/pandas_compat.py#L139].

One option is to add a separate "attrs" item to each column entry so that it doesn't conflict with the existing "metadata" item.
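
As a rough illustration of that idea (a sketch only, not the actual pyarrow implementation), a per-column entry could carry the attrs under their own key. The helper name column_entry_with_attrs and the "attrs" key are hypothetical; the other keys follow the documented pandas metadata layout for columns. It also assumes the attrs values are JSON-serializable, since the pandas metadata is stored as JSON.

```python
import json


def column_entry_with_attrs(name, pandas_type, numpy_type, metadata=None, attrs=None):
    """Build a per-column metadata entry with an optional "attrs" item.

    Hypothetical helper: the "attrs" key is the proposal from this comment,
    not something pyarrow writes today.
    """
    entry = {
        "name": name,
        "field_name": name,
        "pandas_type": pandas_type,
        "numpy_type": numpy_type,
        # Existing item, left untouched so the two cannot collide.
        "metadata": metadata,
    }
    if attrs:
        # Copy so later changes to Series.attrs do not leak into the entry.
        entry["attrs"] = dict(attrs)
    return entry


# Example using the attrs from the issue description.
entry = column_entry_with_attrs(
    "a",
    "int64",
    "int64",
    attrs={"long_name": "Description about data", "nodata": -1, "units": "metre"},
)

# The entry has to survive the JSON encoding used for the pandas metadata.
encoded = json.dumps(entry)
decoded = json.loads(encoded)
```

On the read side, the "attrs" item of each decoded entry could then be assigned back to the matching Series.attrs, and columns without the item would simply keep empty attrs.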

> [Parquet][Python] Read and write file/column metadata using pandas attrs
> ------------------------------------------------------------------------
>
>                 Key: ARROW-12823
>                 URL: https://issues.apache.org/jira/browse/ARROW-12823
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Parquet, Python
>            Reporter: Alan Snow
>            Priority: Minor
>
> Related: https://github.com/pandas-dev/pandas/issues/20521
> What are the general thoughts on using [DataFrame.attrs|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.attrs.html#pandas-dataframe-attrs] and [Series.attrs|https://pandas.pydata.org/pandas-docs/stable//reference/api/pandas.Series.attrs.html#pandas-series-attrs] for reading and writing metadata to/from Parquet?
> For example, here is how the metadata would be written:
> {code:python}
> pdf = pandas.DataFrame({"a": [1]})
> pdf.attrs = {"name": "my custom dataset"}
> pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
> pdf.to_parquet("file.parquet"){code}
> Then, when loading in the data:
> {code:python}
> pdf = pandas.read_parquet("file.parquet")
> pdf.attrs{code}
> {"name": "my custom dataset"}
> {code:python}
> pdf.a.attrs{code}
> {"long_name": "Description about data", "nodata": -1, "units": "metre"}


