Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/04/25 15:57:00 UTC
[jira] [Resolved] (ARROW-16131) [C++] Record batch specific metadata is not saved in IPC file
[ https://issues.apache.org/jira/browse/ARROW-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-16131.
------------------------------------
Fix Version/s: 8.0.0
Resolution: Fixed
Issue resolved by pull request 12812
[https://github.com/apache/arrow/pull/12812]
> [C++] Record batch specific metadata is not saved in IPC file
> -------------------------------------------------------------
>
> Key: ARROW-16131
> URL: https://issues.apache.org/jira/browse/ARROW-16131
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 7.0.0
> Reporter: Yue Ni
> Assignee: Yue Ni
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> When writing an IPC file that contains multiple record batches, the schema provided to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch being written has batch-specific metadata associated with it, that metadata is not written.
> This can be reproduced with the following test case (using pyarrow):
> {code:java}
> import pyarrow as pa
>
> def test_chunked_record_batch_meta():
>     num_batches = 2
>     chunk_size = 10  # assumed value; undefined in the original report
>     ipc_file = "/tmp/batches_with_metadata.arrow"
>     int_array = pa.array([i for i in range(chunk_size)])
>     # schema-level metadata, written to the IPC file footer
>     schema = pa.schema(
>         [
>             ("values", pa.int64()),
>         ],
>         metadata={"foo": "bar"},
>     )
>     writer = pa.RecordBatchFileWriter(
>         ipc_file, schema
>     )
>     for i in range(num_batches):
>         # follow examples here:
>         # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
>         batch = pa.record_batch(
>             [int_array],
>             names=["values"],
>             metadata={"batch_id": str(i)},  # batch-specific metadata
>         )
>         writer.write_batch(batch)
>     writer.close()
>     mmapped_file = pa.memory_map(ipc_file)
>     reader = pa.ipc.open_file(mmapped_file)
>     batch_0 = reader.get_record_batch(0)
>     assert batch_0.schema.metadata
> {code}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)