Posted to dev@arrow.apache.org by SHI BEI <sh...@foxmail.com> on 2023/05/13 00:59:09 UTC
Reply: Re: Reusing RecordBatch objects and their memory space
Hi, I understand that these may be two separate issues. A memory pool effectively avoids system-call overhead during memory use, including allocation and release, fragmentation, and compaction, and I am already using one in my project. However, inside the RecordBatchReader::ReadNext interface, temporary objects such as ChunkedArray and Table are created on every call, and the Buffers inside each ChunkedArray are reallocated each time. When RecordBatchReader::ReadNext is called repeatedly, the destructors of these temporary objects show up very clearly in the CPU flame graph and cause significant performance overhead.
SHI BEI
shibei.lh@foxmail.com
Original message
From: "Will Jones" < will.jones127@gmail.com >;
Sent: 2023/5/13 1:04
To: "dev" < dev@arrow.apache.org >;
主题:Re: Reusing RecordBatch objects and their memory space
Hello,
I'm not sure if there are easy ways to avoid calling the destructors.
However, I would point out memory space reuse is handled through memory
pools; if you have one enabled it shouldn't be handing memory back to the
OS between each iteration.
Best,
Will Jones
On Fri, May 12, 2023 at 9:59 AM SHI BEI wrote:
> Hi community,
>
>
> I'm using the RecordBatchReader::ReadNext interface to read Parquet
> data in my project, and I've noticed that there are a lot of temporary
> object destructors being generated during usage. Has the community
> considered providing an interface to reuse RecordBatch objects
> and their memory space for storing data?
>
>
>
>
> SHI BEI
> shibei.lh@foxmail.com