Posted to issues@arrow.apache.org by "Mathieu Dutour Sikiric (Jira)" <ji...@apache.org> on 2020/08/17 20:12:01 UTC

[jira] [Created] (ARROW-9774) Document metadata

Mathieu Dutour Sikiric created ARROW-9774:
---------------------------------------------

             Summary: Document metadata
                 Key: ARROW-9774
                 URL: https://issues.apache.org/jira/browse/ARROW-9774
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.0.0
         Environment: Linux
            Reporter: Mathieu Dutour Sikiric


I would like to write a dataframe to a Parquet file.

The problem is that the output dataframe shows up as

```
0 {'field0': 5, 'field1': 8}
1 {'field0': 5, 'field1': 8}
2 {'field0': 4, 'field1': 7}
```

while what I want is

```
0 {'A': 5, 'B': 8}
1 {'A': 5, 'B': 8}
2 {'A': 4, 'B': 7}
```

As I understand it, the discrepancy arises because I did not pass the metadata when creating the table. That is, I did:

schema_metadata = ::arrow::key_value_metadata({{"pandas", metadata.data()}});

schema = std::make_shared<arrow::Schema>(schema_vector, schema_metadata);

arrow_table = arrow::Table::Make(schema, columns, row_group_size);

status = parquet::arrow::WriteTable( *arrow_table, pool, out_stream, row_group_size, writer_properties, ...)

The problem is that I could not find any documentation on how the metadata should be built. Adding such documentation would be very helpful.
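
For reference, here is a minimal sketch of how the string stored under the "pandas" key might be assembled. It uses only the C++ standard library. `PandasColumn` and `BuildPandasMetadata` are hypothetical helpers, not part of Arrow. The key names (`index_columns`, `column_indexes`, `columns`, `name`, `field_name`, `pandas_type`, `numpy_type`, `metadata`, `pandas_version`) follow the pandas metadata layout as understood here and should be checked against the pandas developer documentation. The sketch also assumes that column names need no JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow type): describes one top-level column.
struct PandasColumn {
  std::string name;         // column label as pandas should display it
  std::string pandas_type;  // e.g. "int64" or "object"
  std::string numpy_type;   // e.g. "int64" or "object"
};

// Builds a JSON string intended for the "pandas" schema metadata key.
// Assumption: names and type strings contain no characters that need escaping.
std::string BuildPandasMetadata(const std::vector<PandasColumn>& columns,
                                const std::string& pandas_version) {
  std::string json =
      "{\"index_columns\": [], \"column_indexes\": [], \"columns\": [";
  for (std::size_t i = 0; i < columns.size(); ++i) {
    const PandasColumn& c = columns[i];
    if (i > 0) json += ", ";
    json += "{\"name\": \"" + c.name + "\", \"field_name\": \"" + c.name +
            "\", \"pandas_type\": \"" + c.pandas_type +
            "\", \"numpy_type\": \"" + c.numpy_type +
            "\", \"metadata\": null}";
  }
  json += "], \"pandas_version\": \"" + pandas_version + "\"}";
  return json;
}
```

The resulting string could then be passed where `metadata.data()` appears above. It is not established here that this metadata controls the `field0`/`field1` labels: they look like struct field names, which may come from the child fields of the Arrow struct type rather than from schema metadata.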



--
This message was sent by Atlassian Jira
(v8.3.4#803005)