Posted to issues@arrow.apache.org by "Yue Ni (Jira)" <ji...@apache.org> on 2022/04/06 05:22:00 UTC

[jira] [Created] (ARROW-16131) record batch specific metadata is not saved in IPC file

Yue Ni created ARROW-16131:
------------------------------

             Summary: record batch specific metadata is not saved in IPC file
                 Key: ARROW-16131
                 URL: https://issues.apache.org/jira/browse/ARROW-16131
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 7.0.0
            Reporter: Yue Ni


When writing an IPC file with multiple record batches, the schema provided to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch being written has batch-specific metadata associated with it, that metadata is not written.

This can be reproduced with the following test case (using pyarrow):

```python

def test_chunked_record_batch_meta():
    num_batches = 2
    ipc_file = "/tmp/batches_with_metadata.arrow"

    int_array = pa.array([i for i in range(chunk_size)])
    schema = pa.schema(
        [
            ("values", pa.int64()),
        ],
        metadata={"foo": "bar"},
    )

    writer = pa.RecordBatchFileWriter(
        ipc_file, schema
    )

    for i in range(num_batches):
        # follow examples here:
        # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
        batch = pa.record_batch(
            [int_array],
            names=["values"],
            metadata={"batch_id": str(i)},
        )
        writer.write_batch(batch)

    writer.close()

    mmapped_file = pa.memory_map(ipc_file)
    reader = pa.ipc.open_file(mmapped_file)
    batch_0 = reader.get_record_batch(0)
    # The batch read back is expected to still carry its own metadata ("batch_id").
    assert batch_0.schema.metadata
```



--
This message was sent by Atlassian Jira
(v8.20.1#820001)