Posted to issues@arrow.apache.org by "Mathieu Dutour Sikiric (Jira)" <ji...@apache.org> on 2020/08/17 20:12:01 UTC
[jira] [Created] (ARROW-9774) Document metadata
Mathieu Dutour Sikiric created ARROW-9774:
---------------------------------------------
Summary: Document metadata
Key: ARROW-9774
URL: https://issues.apache.org/jira/browse/ARROW-9774
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.0.0
Environment: Linux
Reporter: Mathieu Dutour Sikiric
I would like to write a dataframe to a Parquet file.
The problem is that the output dataframe shows up as
```
0 {'field0': 5, 'field1': 8}
1 {'field0': 5, 'field1': 8}
2 {'field0': 4, 'field1': 7}
```
while what I want is
```
0 {'A': 5, 'B': 8}
1 {'A': 5, 'B': 8}
2 {'A': 4, 'B': 7}
```
As I understand it, the discrepancy arises because I did not pass the right metadata when creating the table. This is what I did:
schema_metadata = ::arrow::key_value_metadata({{"pandas", metadata.data()}});
schema = std::make_shared<arrow::Schema>(schema_vector, schema_metadata);
arrow_table = arrow::Table::Make(schema, columns, row_group_size);
status = parquet::arrow::WriteTable( *arrow_table, pool, out_stream, row_group_size, writer_properties, ...)
The problem is that I could not find any documentation on how the metadata is to be built. Adding such documentation would be very helpful.
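For illustration, here is a minimal sketch of how the JSON string stored under the "pandas" key might be assembled. It assumes the layout described in the pandas developer notes on storing dataframes in Parquet (top-level keys "index_columns", "column_indexes", "columns" and "pandas_version", with each column entry carrying "name", "field_name", "pandas_type", "numpy_type" and "metadata"). The names ColumnInfo, JsonEscape and BuildPandasMetadata are hypothetical helpers, not part of Arrow. The sketch covers top-level columns only, omits the optional "creator" entry, and has not been checked against a pandas reader.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: describes one top-level dataframe column.
struct ColumnInfo {
  std::string name;         // column name as pandas should display it
  std::string pandas_type;  // e.g. "int64", "float64", "unicode", "object"
  std::string numpy_type;   // e.g. "int64", "float64", "object"
};

// Escapes double quotes and backslashes so the text can sit inside a JSON
// string literal. Control characters are not handled in this sketch.
std::string JsonEscape(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out;
}

// Builds the JSON text intended for the "pandas" key of the schema metadata.
// Assumption: empty "index_columns" and "column_indexes" lists are used,
// i.e. no index information is recorded.
std::string BuildPandasMetadata(const std::vector<ColumnInfo>& columns,
                                const std::string& pandas_version) {
  std::string json =
      "{\"index_columns\": [], \"column_indexes\": [], \"columns\": [";
  for (std::size_t i = 0; i < columns.size(); ++i) {
    if (i > 0) json += ", ";
    const std::string name = JsonEscape(columns[i].name);
    json += "{\"name\": \"" + name + "\", ";
    json += "\"field_name\": \"" + name + "\", ";
    json += "\"pandas_type\": \"" + JsonEscape(columns[i].pandas_type) + "\", ";
    json += "\"numpy_type\": \"" + JsonEscape(columns[i].numpy_type) + "\", ";
    json += "\"metadata\": null}";
  }
  json += "], \"pandas_version\": \"" + JsonEscape(pandas_version) + "\"}";
  return json;
}
```

The resulting string would take the place of metadata.data() in the key_value_metadata call above, for example BuildPandasMetadata({{"A", "int64", "int64"}, {"B", "int64", "int64"}}, "1.1.0"), where the version string is a placeholder.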
--
This message was sent by Atlassian Jira
(v8.3.4#803005)