You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Timothy Farkas <tf...@mapr.com> on 2018/08/03 04:43:43 UTC

Ownership Of VectorContainers In HashJoin After Calling Next

Hi All,

What is the expected behavior for HashJoin when it calls next for its left
or right upstream record batches. Is ownership of an upstream
VectorContainer supposed to pass from from the left or right upstream
record batches to HashJoin immediately after a call to next? Or is
ownership of a VectorContainer supposed to stay with an upstream record
batch immediately after a call to next?

Thanks,
Tim

Re: Ownership Of VectorContainers In HashJoin After Calling Next

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Tim,

Here's a bit more background info from my experience.

As Sorabh said, the downstream operator has visibility to the incoming record batch regardless of ownership.

If the downstream operator will retain the incoming vectors (buffers), then it must take ownership of them. This is the case for pass-through operators such as filters, or for buffering operators such as sort.

If the downstream operator will observe, but not reuse, the incoming data, then ownership is (mostly) moot. Sounds like this is the case for HashJoin.

The key item I wish to add here is timing. Accepting ownership of the incoming buffer shifts memory accounting from the incoming to the downstream operator. As we have seem over and over, this can then cause the downstream operator to suffer OOM errors if, say, the downstream operator then wants to allocate an SV2 or some such, but finds itself over the memory limit. 

I can imagine that, to avoid this issue, downstream operators may delay taking ownership of the incoming vectors as long as possible. (The sort, for example, first checks the incoming batch size, then spills as needed, to ensure that there capacity in the operator allocator to accept ownership.)

Other operators may have decided to never take ownership if they will simply observe, then discard incoming batches (or vectors.) One very likely candidate for such an approach would be the SelectionVectorRemover since it copies data from incoming to a new set of outgoing batches. (Have not checked if, in fact, the SVR does avoid taking ownership.)

Would be great to adjust memory management so that we can use a more uniform approach. As we've discussed multiple times, there is no good reason for us to hold a gun to our own head with the current per-operator allocators, especially when we issue an OOM for memory already located.

Perhaps you have some ideas for how to improve this as part of your resource management work?

Thanks,
- Paul

 

    On Friday, August 3, 2018, 12:39:35 AM PDT, Timothy Farkas <tf...@mapr.com> wrote:  
 
 Thanks for the explanation Sorabh, that clears things up for me.

Tim

On Thu, Aug 2, 2018 at 11:38 PM, Sorabh Hamirwasia <sh...@mapr.com>
wrote:

> When next is called on upstream operator then 2 things can happen:
>
>    1. Current operator either work on incoming and produces/copy the
>    records to its outgoing batch. In this case there is no transfer of
>    ownership of incoming batch buffer from upstream to current operator
>    allocator. Current operator will allocate separate memory footprint for
> its
>    outgoing batch buffer. Also current operator is supposed to release the
>    incoming batch buffer once its done working on it.
>    2. Current operator does a transfer of buffers from incoming batch value
>    vectors to outgoing value vectors (like in Filter, limit (see [1]),
> etc).
>    In this case ownership of buffers in incoming batch is transferred to
>    current operator allocator.
>
> But I have seen different operator behaving differently. For Hash Join
> since join operators has to evaluate join condition for each probe side
> row, I don't think it will do any transfers. For build side it will build
> hash table on column involved in join condition but also has to store other
> columns if in projection list of query. So probably it might do transfer
> for those columns only (haven't looked into code though).
>
> [1]:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
> com_apache_drill_blob_006dc10a88c1708b793e3a38ac52a0
> 266bb07deb_exec_java-2Dexec_src_main_java_org_apache_
> drill_exec_physical_impl_limit_LimitRecordBatch.java-23L181&d=DwIBaQ&c=
> cskdkSMqhcnjZxdQVpwTXg&r=4eQVr8zB8ZBff-yxTimdOQ&m=
> 3PkPHZrX33YZhmngIGbjYtgGn_KYzXNVUyJPEUQVFyo&s=7YyEpHwWOp2I8zQjNEdFXLu-
> QnVCq7neR7lyg7bfPVs&e=
>
> Thanks,
> Sorabh
>
> On Thu, Aug 2, 2018 at 9:43 PM, Timothy Farkas <tf...@mapr.com> wrote:
>
> > Hi All,
> >
> > What is the expected behavior for HashJoin when it calls next for its
> left
> > or right upstream record batches. Is ownership of an upstream
> > VectorContainer supposed to pass from from the left or right upstream
> > record batches to HashJoin immediately after a call to next? Or is
> > ownership of a VectorContainer supposed to stay with an upstream record
> > batch immediately after a call to next?
> >
> > Thanks,
> > Tim
> >
>
  

Re: Ownership Of VectorContainers In HashJoin After Calling Next

Posted by Timothy Farkas <tf...@mapr.com>.
Thanks for the explanation Sorabh, that clears things up for me.

Tim

On Thu, Aug 2, 2018 at 11:38 PM, Sorabh Hamirwasia <sh...@mapr.com>
wrote:

> When next is called on upstream operator then 2 things can happen:
>
>    1. Current operator either work on incoming and produces/copy the
>    records to its outgoing batch. In this case there is no transfer of
>    ownership of incoming batch buffer from upstream to current operator
>    allocator. Current operator will allocate separate memory footprint for
> its
>    outgoing batch buffer. Also current operator is supposed to release the
>    incoming batch buffer once its done working on it.
>    2. Current operator does a transfer of buffers from incoming batch value
>    vectors to outgoing value vectors (like in Filter, limit (see [1]),
> etc).
>    In this case ownership of buffers in incoming batch is transferred to
>    current operator allocator.
>
> But I have seen different operator behaving differently. For Hash Join
> since join operators has to evaluate join condition for each probe side
> row, I don't think it will do any transfers. For build side it will build
> hash table on column involved in join condition but also has to store other
> columns if in projection list of query. So probably it might do transfer
> for those columns only (haven't looked into code though).
>
> [1]:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
> com_apache_drill_blob_006dc10a88c1708b793e3a38ac52a0
> 266bb07deb_exec_java-2Dexec_src_main_java_org_apache_
> drill_exec_physical_impl_limit_LimitRecordBatch.java-23L181&d=DwIBaQ&c=
> cskdkSMqhcnjZxdQVpwTXg&r=4eQVr8zB8ZBff-yxTimdOQ&m=
> 3PkPHZrX33YZhmngIGbjYtgGn_KYzXNVUyJPEUQVFyo&s=7YyEpHwWOp2I8zQjNEdFXLu-
> QnVCq7neR7lyg7bfPVs&e=
>
> Thanks,
> Sorabh
>
> On Thu, Aug 2, 2018 at 9:43 PM, Timothy Farkas <tf...@mapr.com> wrote:
>
> > Hi All,
> >
> > What is the expected behavior for HashJoin when it calls next for its
> left
> > or right upstream record batches. Is ownership of an upstream
> > VectorContainer supposed to pass from from the left or right upstream
> > record batches to HashJoin immediately after a call to next? Or is
> > ownership of a VectorContainer supposed to stay with an upstream record
> > batch immediately after a call to next?
> >
> > Thanks,
> > Tim
> >
>

Re: Ownership Of VectorContainers In HashJoin After Calling Next

Posted by Sorabh Hamirwasia <sh...@mapr.com>.
When next is called on upstream operator then 2 things can happen:

   1. Current operator either work on incoming and produces/copy the
   records to its outgoing batch. In this case there is no transfer of
   ownership of incoming batch buffer from upstream to current operator
   allocator. Current operator will allocate separate memory footprint for its
   outgoing batch buffer. Also current operator is supposed to release the
   incoming batch buffer once its done working on it.
   2. Current operator does a transfer of buffers from incoming batch value
   vectors to outgoing value vectors (like in Filter, limit (see [1]), etc).
   In this case ownership of buffers in incoming batch is transferred to
   current operator allocator.

But I have seen different operator behaving differently. For Hash Join
since join operators has to evaluate join condition for each probe side
row, I don't think it will do any transfers. For build side it will build
hash table on column involved in join condition but also has to store other
columns if in projection list of query. So probably it might do transfer
for those columns only (haven't looked into code though).

[1]:
https://github.com/apache/drill/blob/006dc10a88c1708b793e3a38ac52a0266bb07deb/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/limit/LimitRecordBatch.java#L181

Thanks,
Sorabh

On Thu, Aug 2, 2018 at 9:43 PM, Timothy Farkas <tf...@mapr.com> wrote:

> Hi All,
>
> What is the expected behavior for HashJoin when it calls next for its left
> or right upstream record batches. Is ownership of an upstream
> VectorContainer supposed to pass from from the left or right upstream
> record batches to HashJoin immediately after a call to next? Or is
> ownership of a VectorContainer supposed to stay with an upstream record
> batch immediately after a call to next?
>
> Thanks,
> Tim
>