Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/04/05 08:43:00 UTC

[jira] [Created] (ARROW-16118) [C++] Reduce memory usage when writing to IPC

Jorge Leitão created ARROW-16118:
------------------------------------

             Summary: [C++] Reduce memory usage when writing to IPC
                 Key: ARROW-16118
                 URL: https://issues.apache.org/jira/browse/ARROW-16118
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Jorge Leitão


Writing a record batch to IPC ([header][buffers]) currently requires O(N*B) memory, where N is the average buffer size and B is the number of buffers.

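This is because the header of the record batch needs each buffer's location and the total number of bytes, which are only known after the buffers have been processed (e.g. compressed). Since the header comes first, all processed buffers have to be held in memory until it has been written.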

When the writer supports seeking, this memory usage can be reduced to O(N), where N is the average size of a primitive buffer over all fields. The idea is to write a placeholder header, write each buffer out as soon as it is ready, and then seek back to fill in the header. In pseudo-code:


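{code:java}
start = writer.position()                 // remember where the header begins
empty_locations = create_empty_header(schema)
write_header(writer, empty_locations)     // placeholder; must have the same size as the final header
locations = write_buffers(writer, batch)  // write the buffers, recording their locations
end = writer.position()
writer.seek(start)
write_header(writer, locations)           // overwrite the placeholder
writer.seek(end)                          // continue writing after the buffers
{code}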
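
For illustration, the seek-and-patch idea can be sketched in Python against any seekable file-like object. This is a minimal sketch, not the Arrow IPC format and not an Arrow API: the fixed-size header layout (one offset/length pair per buffer), the use of zlib, and the names write_header, write_buffers, write_batch and read_batch are all assumptions made for the example.

```python
import io
import struct
import zlib

# Hypothetical header entry: (absolute offset, length) as two unsigned
# 64-bit little-endian integers. This is NOT the Arrow IPC layout.
ENTRY = struct.Struct("<QQ")


def write_header(writer, locations):
    """Write one fixed-size (offset, length) entry per buffer."""
    for offset, length in locations:
        writer.write(ENTRY.pack(offset, length))


def write_buffers(writer, buffers):
    """Compress and write buffers one at a time, recording where each landed."""
    locations = []
    for buf in buffers:
        compressed = zlib.compress(buf)  # only one compressed buffer is held here
        locations.append((writer.tell(), len(compressed)))
        writer.write(compressed)
    return locations


def write_batch(writer, buffers):
    """Write [header][buffers], patching the header once locations are known."""
    start = writer.tell()
    # Placeholder header: same byte size as the final one, so patching it
    # later does not overwrite any buffer data.
    write_header(writer, [(0, 0)] * len(buffers))
    locations = write_buffers(writer, buffers)
    end = writer.tell()
    writer.seek(start)
    write_header(writer, locations)
    writer.seek(end)  # restore the position so later writes append
    return locations


def read_batch(reader, start, num_buffers):
    """Read the header at `start`, then fetch and decompress each buffer."""
    reader.seek(start)
    entries = [ENTRY.unpack(reader.read(ENTRY.size)) for _ in range(num_buffers)]
    result = []
    for offset, length in entries:
        reader.seek(offset)
        result.append(zlib.decompress(reader.read(length)))
    return result
```

The sketch only works because the placeholder and the final header occupy the same number of bytes; a variable-size header would need its space reserved up front.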

This has a significantly lower memory footprint: O(N) vs O(N*B).
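
To make the difference concrete, here is a simplified model in Python that counts only the processed buffer bytes held in memory at once. The buffer sizes are hypothetical, and the function names peak_buffered and peak_streaming are made up for this example.

```python
def peak_buffered(sizes):
    """All processed buffers are held until the header can be written."""
    return sum(sizes)


def peak_streaming(sizes):
    """Each processed buffer is written out before the next one is produced."""
    return max(sizes, default=0)


# Hypothetical batch: 100 buffers of 4 MiB each.
sizes = [4 * 2**20] * 100
print(peak_buffered(sizes) // 2**20, "MiB")   # 400 MiB
print(peak_streaming(sizes) // 2**20, "MiB")  # 4 MiB
```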

It could be interesting for the C++ implementation to support this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)