Posted to dev@drill.apache.org by Abdel Hakim Deneche <ad...@maprtech.com> on 2015/12/15 20:23:47 UTC

Question about the RecordIterator

Amit,

I am looking at DRILL-4190, where one of the sort operators is hitting its
allocator limit while it is sending data downstream. This generally happens
when a downstream operator is holding those batches in memory (e.g. the
Window operator).

The same query runs fine on 1.2.0, which seems to suggest that the
recent changes to MergeJoinBatch "may" be causing the issue.

It looks like RecordIterator holds all incoming batches in a
TreeRangeMap and, if I'm not mistaken, doesn't release anything until it's
closed. Is this correct?
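
(For illustration only, a minimal sketch of the pattern I mean, using
hypothetical class names rather than the actual RecordIterator code:
batches accumulate in a range map keyed by record index and are only
released on close.)

import com.google.common.collect.Range;
import com.google.common.collect.TreeRangeMap;

// Hypothetical sketch: incoming batches are mapped to their record-index
// ranges and nothing is freed until the holder itself is closed.
class BatchHolderSketch implements AutoCloseable {
  interface Batch {           // stand-in for a record batch of value vectors
    int recordCount();
    void release();
  }

  private final TreeRangeMap<Integer, Batch> batches = TreeRangeMap.create();
  private int totalRecords = 0;

  void addBatch(Batch batch) {
    // Map [totalRecords, totalRecords + recordCount) to the new batch.
    batches.put(Range.closedOpen(totalRecords, totalRecords + batch.recordCount()), batch);
    totalRecords += batch.recordCount();
  }

  Batch batchFor(int recordIndex) {
    return batches.get(recordIndex);   // look up the batch owning this record
  }

  @Override
  public void close() {
    // Memory is only handed back here, when every buffered batch is released at once.
    batches.asMapOfRanges().values().forEach(Batch::release);
    batches.clear();
  }
}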

I am not familiar with how merge join used to work before RecordIterator.
Was it also the case that we held all incoming batches in memory?

Thanks

-- 

Abdelhakim Deneche

Software Engineer


Re: Question about the RecordIterator

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Ok, thanks. I will add a comment to the JIRA and assign it to you ;)


Re: Question about the RecordIterator

Posted by Amit Hadke <am...@gmail.com>.
Yup, that may be it. I'll add an option to not hold on to left-side iterator
batches.
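
(Purely as an illustration of the kind of option being proposed here: the
flag name below is hypothetical, not the actual patch.)

// Hypothetical sketch: a buffering iterator that can be told up front that its
// input is only ever read forward (the left side), so consumed batches are
// released immediately instead of being retained for mark()/reset().
class BufferingIteratorSketch {
  interface Batch { void release(); }

  private final boolean enableMarkAndReset;   // hypothetical flag; false for the left side

  BufferingIteratorSketch(boolean enableMarkAndReset) {
    this.enableMarkAndReset = enableMarkAndReset;
  }

  void onBatchConsumed(Batch batch) {
    if (!enableMarkAndReset) {
      batch.release();   // the join will never rewind into this input
    }
    // otherwise keep the batch until the mark is released, as before
  }
}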


Re: Question about the RecordIterator

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
RecordIterator.mark() is only called for the right side of the merge join.
What about the left side? Do we ever release the batches on the left side?
In DRILL-4190, the sort that runs out of memory is on the left side of the merge.
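
(As a self-contained illustration of why only the right input of a
sort-merge join ever needs mark()/reset(): the left side only moves forward,
while the right side is rewound whenever consecutive left keys repeat.
Hypothetical code, not MergeJoinBatch.)

import java.util.Arrays;
import java.util.List;

class MergeJoinSketch {
  static void join(List<Integer> left, List<Integer> right) {
    int l = 0, r = 0;
    while (l < left.size() && r < right.size()) {
      int cmp = left.get(l).compareTo(right.get(r));
      if (cmp < 0) {
        l++;                                   // left side only ever advances
      } else if (cmp > 0) {
        r++;
      } else {
        int mark = r;                          // mark() on the right side only
        while (r < right.size() && right.get(r).equals(left.get(l))) {
          System.out.println(left.get(l) + " x " + right.get(r));
          r++;
        }
        l++;
        if (l < left.size() && left.get(l).equals(left.get(l - 1))) {
          r = mark;                            // reset(): rewind the right side
        }
      }
    }
  }

  public static void main(String[] args) {
    join(Arrays.asList(1, 2, 2, 3), Arrays.asList(2, 2, 3));
  }
}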


Re: Question about the RecordIterator

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
I see, it's in RecordIterator.mark()


Re: Question about the RecordIterator

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Amit,

Thanks for the prompt answer. Can you point me to where in the code the
purge is done?




Re: Question about the RecordIterator

Posted by Amit Hadke <am...@gmail.com>.
Hi Hakim,
RecordIterator will not hold all batches in memory. It only holds batches
since the last mark() operation, and it purges batches as the join moves
along.

The worst case is when there are lots of repeating values on the right side,
which the iterator will hold in memory.

~ Amit.
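
(A minimal, hypothetical sketch of the buffering behaviour described above:
batches are retained only from the last mark() onward, and everything older
is released as the reader advances. This is not the RecordIterator code
itself.)

import java.util.ArrayDeque;
import java.util.Deque;

class MarkedBatchBufferSketch {
  interface Batch { void release(); }

  private final Deque<Batch> buffered = new ArrayDeque<>();
  private boolean marked = false;

  void onNewBatch(Batch batch) {
    if (!marked) {
      // No outstanding mark: batches already consumed can never be revisited,
      // so free them before buffering the new one.
      while (!buffered.isEmpty()) {
        buffered.removeFirst().release();
      }
    }
    // With a mark outstanding (e.g. a long run of equal keys on the right side),
    // every batch since the mark stays in memory; this is the worst case above.
    buffered.addLast(batch);
  }

  void mark() { marked = true; }

  void releaseMark() {
    marked = false;
    // Purge everything except the batch currently being read.
    while (buffered.size() > 1) {
      buffered.removeFirst().release();
    }
  }
}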
