Posted to jira@arrow.apache.org by "Yue Ni (Jira)" <ji...@apache.org> on 2022/04/06 13:09:00 UTC

[jira] [Commented] (ARROW-16131) [C++] Record batch specific metadata is not saved in IPC file

    [ https://issues.apache.org/jira/browse/ARROW-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518113#comment-17518113 ] 

Yue Ni commented on ARROW-16131:
--------------------------------

[~lidavidm] this seems to be a duplicate of ARROW-6940. I hadn't seen the latest reply on the mailing list, so I created this ticket to track the issue. After creating it, I went on to fix it by submitting PR [https://github.com/apache/arrow/pull/12812], and only then saw your comment here, haha.

ARROW-6940 also mentions a change to Arrow Flight, which I didn't touch here.

> [C++] Record batch specific metadata is not saved in IPC file
> -------------------------------------------------------------
>
>                 Key: ARROW-16131
>                 URL: https://issues.apache.org/jira/browse/ARROW-16131
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Yue Ni
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When writing an IPC file with multiple record batches, the schema provided to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch being written has batch-specific metadata associated with it, that metadata is not written.
> This can be reproduced with the following test case (using pyarrow):
> {code:python}
> import pyarrow as pa
>
> def test_chunked_record_batch_meta():
>     num_batches = 2
>     chunk_size = 10  # rows per batch; any small value works
>     ipc_file = "/tmp/batches_with_metadata.arrow"
>     int_array = pa.array([i for i in range(chunk_size)])
>     schema = pa.schema(
>         [
>             ("values", pa.int64()),
>         ],
>         metadata={"foo": "bar"},
>     )
>     writer = pa.RecordBatchFileWriter(
>         ipc_file, schema
>     )
>     for i in range(num_batches):
>         # follow examples here:
>         # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
>         batch = pa.record_batch(
>             [int_array],
>             names=["values"],
>             metadata={"batch_id": str(i)},
>         )
>         writer.write_batch(batch)
>     writer.close()
>     mmapped_file = pa.memory_map(ipc_file)
>     reader = pa.ipc.open_file(mmapped_file)
>     batch_0 = reader.get_record_batch(0)
>     # The batch-specific metadata should survive the round trip
>     assert b"batch_id" in batch_0.schema.metadata
> {code}


