Posted to dev@arrow.apache.org by SHI BEI <sh...@foxmail.com> on 2023/05/13 00:59:09 UTC

Re: Re: Reusing RecordBatch objects and their memory space

Hi, as I understand it, there are two separate issues here. A memory pool can effectively reduce the overhead of memory management, such as system calls for allocation and release, memory fragmentation, and memory compaction. In my use case, I already use a memory pool. However, each call to RecordBatchReader::ReadNext creates temporary objects such as ChunkedArray and Table, and reallocates space for the Buffers inside each ChunkedArray. When RecordBatchReader::ReadNext is called repeatedly, the destructors of these temporary objects show up clearly in the CPU flame graph and account for significant performance overhead.
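To make the distinction concrete, here is a minimal sketch in plain C++. It does not use the Arrow API: `ToyBatch`, `ReadNextFresh`, and `ReadNextInto` are hypothetical names invented for illustration, and a single `std::vector` stands in for the column buffers. The first function mimics a reader that hands back a new object on every call; the second refills a caller-owned object in place, which is the kind of reuse interface being asked about.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a record batch: a single int64 column.
struct ToyBatch {
  std::vector<int64_t> values;
};

// Pattern A: every call builds a new batch object. The caller drops it
// after each iteration, so a constructor and a destructor run per call.
std::shared_ptr<ToyBatch> ReadNextFresh(std::size_t rows, int64_t start) {
  auto batch = std::make_shared<ToyBatch>();
  batch->values.resize(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    batch->values[i] = start + static_cast<int64_t>(i);
  }
  return batch;
}

// Pattern B: the caller owns one batch and the reader overwrites it.
// Once the vector's capacity covers `rows`, resize() does not reallocate,
// and no per-call object is created or destroyed.
void ReadNextInto(ToyBatch* out, std::size_t rows, int64_t start) {
  out->values.resize(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    out->values[i] = start + static_cast<int64_t>(i);
  }
}
```

With pattern B, a loop would declare one `ToyBatch` before the loop and call `ReadNextInto(&batch, rows, start)` on each iteration. The trade-off is that the previous batch's contents are overwritten, so the caller must be finished with them before the next call.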

SHI BEI
shibei.lh@foxmail.com

Original message

From: "Will Jones" <will.jones127@gmail.com>
Sent: 2023/5/13 1:04
To: "dev" <dev@arrow.apache.org>
Subject: Re: Reusing RecordBatch objects and their memory space


Hello,

I'm not sure if there are easy ways to avoid calling the destructors.
However, I would point out memory space reuse is handled through memory
pools; if you have one enabled it shouldn't be handing memory back to the
OS between each iteration.
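
As a rough illustration of that point, the sketch below is a toy fixed-size-block pool in plain C++. It is not arrow::MemoryPool or any Arrow implementation; `ToyPool` and its members are hypothetical names. It only shows the general idea: freed blocks are kept on a free list and handed out again, so a repeated allocate/free cycle reaches the system allocator once rather than on every iteration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy pool for blocks of one fixed size (illustrative only).
class ToyPool {
 public:
  explicit ToyPool(std::size_t block_size) : block_size_(block_size) {}

  // Cached blocks go back to the system only when the pool is destroyed.
  ~ToyPool() {
    for (void* p : free_) std::free(p);
  }

  ToyPool(const ToyPool&) = delete;
  ToyPool& operator=(const ToyPool&) = delete;

  // Reuse a cached block if one exists; otherwise ask the system allocator.
  void* Allocate() {
    if (!free_.empty()) {
      void* p = free_.back();
      free_.pop_back();
      return p;
    }
    ++system_allocations_;
    return std::malloc(block_size_);
  }

  // Keep the block for later reuse instead of releasing it.
  void Free(void* p) { free_.push_back(p); }

  // Number of times the pool had to call the system allocator.
  std::size_t system_allocations() const { return system_allocations_; }

 private:
  std::size_t block_size_;
  std::vector<void*> free_;
  std::size_t system_allocations_ = 0;
};
```

Note that a pool like this removes allocator traffic but not the cost of constructing and destroying the objects that own the memory, which is the overhead the destructors question is about.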

Best,

Will Jones

On Fri, May 12, 2023 at 9:59 AM SHI BEI wrote:

> Hi community,
>
> I'm using the RecordBatchReader::ReadNext interface to read Parquet
> data in my project, and I've noticed that many temporary objects are
> created and destroyed during use. Has the community considered
> providing an interface to reuse RecordBatch objects and their memory
> space for storing data?
>
> SHI BEI
> shibei.lh@foxmail.com