Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/04/05 08:43:00 UTC
[jira] [Created] (ARROW-16118) [C++] Reduce memory usage when writing to IPC
Jorge Leitão created ARROW-16118:
------------------------------------
Summary: [C++] Reduce memory usage when writing to IPC
Key: ARROW-16118
URL: https://issues.apache.org/jira/browse/ARROW-16118
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Jorge Leitão
Writing a record batch to IPC ([header][buffers]) currently requires O(N*B) memory, where N is the average size of a buffer and B is the number of buffers.
This is because writing the header of the record requires each buffer's location and the total number of bytes, which are only known after the buffers have been processed (e.g. compressed). All processed buffers are therefore held in memory before the header can be written.
When the writer supports seeking, this memory usage can be reduced to O(N), where N is the average size of a primitive buffer over all fields. This can be done with the following pseudo-code:
{code:java}
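// remember where the header starts
start = writer.seek(current)
// reserve space with a placeholder header of the final size
empty_locations = create_empty_header(schema)
write_header(writer, empty_locations)
// write the buffers one at a time, recording where each one landed
locations = write_buffers(writer, batch)
// go back and overwrite the placeholder with the real locations
writer.seek(start)
write_header(writer, locations)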
{code}
This has a significantly lower memory footprint: O(N) vs. O(N*B).
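As a rough illustration of the seek-and-patch idea, here is a minimal C++ sketch. It does not use the Arrow APIs or the real IPC format: the layout (a fixed-size header of (offset, length) pairs followed by the buffers), the toy Encode function standing in for compression, and the names Location, WriteHeader, Position and WriteBatchWithSeek are all assumptions made for the example. A std::ostream plays the role of the seekable writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified layout (NOT the Arrow IPC format):
// [header: one (offset, length) pair per buffer][encoded buffers]
struct Location {
  std::uint64_t offset;  // absolute position of the encoded buffer
  std::uint64_t length;  // encoded size in bytes
};

// Stand-in for compression: the encoded size is only known afterwards.
// This toy encoder simply drops zero bytes.
std::string Encode(const std::string& raw) {
  std::string encoded;
  for (char c : raw) {
    if (c != '\0') encoded.push_back(c);
  }
  return encoded;
}

// Fixed-size header: 16 bytes per buffer, in native byte order.
void WriteHeader(std::ostream& out, const std::vector<Location>& locations) {
  for (const Location& loc : locations) {
    out.write(reinterpret_cast<const char*>(&loc.offset), sizeof(loc.offset));
    out.write(reinterpret_cast<const char*>(&loc.length), sizeof(loc.length));
  }
}

// Current write position as an unsigned byte offset.
std::uint64_t Position(std::ostream& out) {
  return static_cast<std::uint64_t>(static_cast<std::streamoff>(out.tellp()));
}

std::vector<Location> WriteBatchWithSeek(std::ostream& out,
                                         const std::vector<std::string>& buffers) {
  const std::ostream::pos_type start = out.tellp();

  // 1. Reserve space with a placeholder header of the final size.
  std::vector<Location> locations(buffers.size(), Location{0, 0});
  WriteHeader(out, locations);
  const std::ostream::pos_type body_start = out.tellp();

  // 2. Encode and write one buffer at a time; only one encoded
  //    buffer is alive at any point.
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    const std::string encoded = Encode(buffers[i]);
    locations[i].offset = Position(out);
    locations[i].length = encoded.size();
    out.write(encoded.data(), static_cast<std::streamsize>(encoded.size()));
  }
  const std::ostream::pos_type end = out.tellp();

  // 3. Seek back and overwrite the placeholder with the real locations.
  out.seekp(start);
  WriteHeader(out, locations);
  assert(out.tellp() == body_start);  // header size must not change

  // 4. Return to the end so that later writes append after this batch.
  out.seekp(end);
  return locations;
}
```

In this sketch the writer holds at most one encoded buffer plus the small table of locations. The approach relies on the header having the same size before and after patching, which a real implementation would need to guarantee for its own header encoding.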
It could be interesting for the C++ implementation to support this.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)