Posted to dev@impala.apache.org by Amos Bird <am...@gmail.com> on 2016/09/01 03:27:12 UTC

Re: Why not use shared_ptr or other ref counting to handle RowBatch's auxiliary memory?

Thanks for the explanation. It now makes a lot of sense to me. This may
deserve to be added to the documentation so that newcomers like me can
understand the reasoning behind it.

Tim Armstrong writes:

> There are various reasons - mainly we want more control over memory usage
> and accounting than shared_ptr allows.
>
> Generally we avoid shared_ptr in Impala since it makes it harder to reason
> about when resources are released. E.g. we typically want to know/control
> exactly when memory is freed up.
>
> Using shared_ptr doesn't help with accounting memory accurately against
> different plan nodes. E.g. if you have multiple join nodes in the same
> pipeline, and each of them is processing a batch that references the same
> disk io buffer, how do you attribute the memory? The most sensible approach
> is to have the bottom-most node be the "owner" of the resource, then
> transfer that ownership up by attaching it to the last batch that
> references it. To do that we need to explicitly know which the last batch
> is, so we have to explicitly track that anyway, which means that shared_ptr
> doesn't really help us manage memory lifetime.
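
(A minimal sketch of the ownership transfer described above, in C++.
MemTracker, IoBuffer, Batch and all of their methods are hypothetical
names made up for illustration; they are not Impala's actual classes.
The buffer is charged to exactly one node's tracker at a time, and the
charge moves when the last batch referencing it is handed upstream.)

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-plan-node memory accounting.
struct MemTracker {
  std::size_t consumed = 0;
  void Consume(std::size_t n) { consumed += n; }
  void Release(std::size_t n) { consumed -= n; }
};

// Hypothetical I/O buffer, charged to exactly one tracker at a time.
class IoBuffer {
 public:
  IoBuffer(std::size_t len, MemTracker* owner) : data_(len), owner_(owner) {
    owner_->Consume(data_.size());
  }
  ~IoBuffer() { owner_->Release(data_.size()); }
  IoBuffer(const IoBuffer&) = delete;
  IoBuffer& operator=(const IoBuffer&) = delete;

  // Move the charge from the current owner to another node's tracker.
  void TransferTo(MemTracker* new_owner) {
    owner_->Release(data_.size());
    new_owner->Consume(data_.size());
    owner_ = new_owner;
  }

 private:
  std::vector<char> data_;
  MemTracker* owner_;
};

// Hypothetical batch. Rows may point into buffers the batch does not own;
// only buffers attached here are freed when the batch is reset.
class Batch {
 public:
  // The producer calls this only on the LAST batch that references buf.
  void AttachBuffer(std::unique_ptr<IoBuffer> buf) {
    attached_.push_back(std::move(buf));
  }
  // The consumer takes over the attached buffers and their accounting.
  void TransferResourcesTo(Batch* dst, MemTracker* dst_tracker) {
    for (auto& buf : attached_) {
      buf->TransferTo(dst_tracker);
      dst->attached_.push_back(std::move(buf));
    }
    attached_.clear();
  }
  // Frees every attached buffer at a known point in the code.
  void Reset() { attached_.clear(); }
  std::size_t num_attached() const { return attached_.size(); }

 private:
  std::vector<std::unique_ptr<IoBuffer>> attached_;
};

// Walks one buffer from a scan node up to a join node and checks the
// accounting at each step.
bool DemoTransfer() {
  MemTracker scan_tracker, join_tracker;
  Batch first, last, accumulated;
  std::unique_ptr<IoBuffer> buf(new IoBuffer(1024, &scan_tracker));
  last.AttachBuffer(std::move(buf));  // 'first' references it, 'last' owns it
  first.Reset();                      // safe: 'first' owns nothing
  bool ok = scan_tracker.consumed == 1024;
  last.TransferResourcesTo(&accumulated, &join_tracker);
  ok = ok && scan_tracker.consumed == 0 && join_tracker.consumed == 1024;
  accumulated.Reset();                // memory is freed exactly here
  return ok && join_tracker.consumed == 0;
}
```

(With shared_ptr the buffer would stay alive as long as any batch held
a reference, but nothing would say which node's tracker should carry
the 1024 bytes at a given moment.)
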
>
> I can see some advantages to tracking all the resources each batch
> references (e.g. having non-owning and owning references) - it would
> make memory transfer issues easier to debug, but I don't think shared_ptr
> helps with that accounting.
>
>
> I think there may be some advantages to explicitly reference counting
> resources for debugging memory issues.
>
> On Wed, Aug 31, 2016 at 5:06 AM, Amos Bird <am...@gmail.com> wrote:
>
>>
>> Hi there,
>>
>> I'm reading
>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches.
>> It says "If an operator is accumulating batches, this means that it must
>> be careful not to destroy or reset a batch if previous batches are still
>> in use, because this could release memory resources that are used by the
>> previous batches."
>>
>> This seems to be a good place to use shared_ptr. I'm curious why Impala
>> handles this problem with coding conventions instead. Is it
>> because we use MemPools?
>>
>> I may be very ignorant. Any explanation is highly appreciated!
>>
>> Regards,
>> Amos
>>