Posted to dev@drill.apache.org by Abdel Hakim Deneche <ad...@maprtech.com> on 2015/07/07 18:13:41 UTC

connection allocator in rpc layer is using too much memory

Trying to investigate DRILL-3241
<https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if out of
memory in RPC layer), I see the following warning in the logs:

WARN: Failure allocating buffer on incoming stream due to memory limits.
Current Allocation: 1372678764.


This is happening in ProtobufLengthDecoder.decode() on the receiver side
(the data server).

Is it expected for the connection allocator to allocate more than 1 GB of
memory? Shouldn't the allocated batches be transferred to the receiving
fragment's allocator?

Thanks!

-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: connection allocator in rpc layer is using too much memory

Posted by Jacques Nadeau <ja...@apache.org>.
I understand that.  However, if the top-level allocator is out of memory,
child allocators aren't going to help us.  The one thing a child allocator
may allow you to do is make a child reservation so we hold back memory for
other uses.  Is that what you're proposing?

Do we even check allocation limits when we are doing ownership transfers?
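For illustration, here is a minimal sketch of what a limit check on ownership transfer could look like. The names and shape are invented for this example (this is not Drill's actual BufferAllocator API): the target allocator must accept the bytes against its own limit before the source releases them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT Drill's BufferAllocator API: an allocator
// that enforces its byte limit, including on incoming ownership transfers.
public class LimitedAllocator {
    private final long limit;
    private final AtomicLong allocated = new AtomicLong();

    public LimitedAllocator(long limit) { this.limit = limit; }

    /** Account for `size` bytes; roll back and report failure if over the limit. */
    public boolean reserve(long size) {
        if (allocated.addAndGet(size) > limit) {
            allocated.addAndGet(-size);  // roll back the optimistic add
            return false;
        }
        return true;
    }

    public void release(long size) { allocated.addAndGet(-size); }

    public long allocated() { return allocated.get(); }

    /** Ownership transfer: the target must accept the bytes before the source releases them. */
    public static boolean transfer(LimitedAllocator from, LimitedAllocator to, long size) {
        if (!to.reserve(size)) {
            return false;  // target over its limit: transfer rejected, source keeps ownership
        }
        from.release(size);
        return true;
    }
}
```

Under this scheme a transfer can fail even when the source already holds the memory, which is exactly the kind of limit check the question above is asking about.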

On Tue, Jul 7, 2015 at 12:10 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> The main issue I'm seeing in DRILL-3241 is that when we hit an
> out-of-memory condition in the RPC layer, it's too soon to figure out
> which fragment executor we should fail, so the query just hangs forever.
> I'm hoping that by using a sub-allocator for the RPC layer we will hit
> the out of memory later (e.g. when transferring the batch to the
> fragment's allocator) and we will be able to properly fail the query.
>
> On Tue, Jul 7, 2015 at 10:46 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > Yes, I believe it is using the TopLevelAllocator.  We could have a
> > suballocator but I can't really see how that would help with the JIRA
> > issue.  The one thing that might be a good idea is that we could then
> > have extra memory reservation for the RPC layer.  (In general, we
> > don't want to run out of memory inside the RPC layer.)
> >
> > On Tue, Jul 7, 2015 at 10:10 AM, Abdel Hakim Deneche <
> > adeneche@maprtech.com>
> > wrote:
> >
> > > My bad, I didn't explain the problem well. The value displayed in
> > > the log is the amount currently allocated by the
> > > ProtobufLengthDecoder.allocator and not the size we are trying to
> > > allocate. I will add the size we are trying to allocate to the log
> > > message and report back here.
> > >
> > > I was assuming the RPC layer uses its own child allocator, and it
> > > didn't make sense to me that this allocator reached > 1GB because it
> > > should transfer the batch to its corresponding fragment context (we
> > > are on the data server side). But then, while investigating further,
> > > I think the ProtobufLengthDecoder is actually using the drillbit
> > > top-level allocator. Am I right?
> > > This would explain why the allocator reached its limit. Any reason
> > > the RPC layer isn't using its own child allocator?
> > >
> > > Thanks!
> > >
> > > On Tue, Jul 7, 2015 at 10:02 AM, Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > >
> > > > There is a time where data is read off the socket before we know what
> > > type
> > > > of message it is.  This socket read buffer is outside the normal flow
> > and
> > > > could grow (although it shouldn't get this big).  However, the memory
> > > > you're talking about here is memory allocated due to the size of the
> > > > incoming message.  My guess would be either you have unusually large
> > > > records or the length of the message being sent was corrupted.
> > (Assuming
> > > > you are talking about the allocation at [1]).
> > > >
> > > > I would start logging unusually large record batches and see if
> > something
> > > > weird is going on.  A record batch shouldn't be larger than 65k
> records
> > > so
> > > > for the batch to be 1gb in size would require each record to be 16k
> in
> > > size
> > > > and for the batch to be the maximum number of records.  More
> > > realistically,
> > > > we generally target 4k records in a batch which would suggest records
> > > that
> > > > are 256k.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java#L87
> > > >
> > > > On Tue, Jul 7, 2015 at 9:13 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com
> > > > >
> > > > wrote:
> > > >
> > > > > Trying to investigate DRILL-3241
> > > > > <https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if
> > out
> > > > of
> > > > > memory in RPC layer), I see the following warning in the logs:
> > > > >
> > > > > WARN: Failure allocating buffer on incoming stream due to
> > > > >
> > > > >  memory limits.  Current Allocation: 1372678764.
> > > > >
> > > > >
> > > > > This is happening in ProtobufLengthDecoder.decode() on the
> > > > > receiver side (the data server).
> > > > >
> > > > > Is it expected for the connection allocator to allocate more than
> > > > > 1 GB of memory? Shouldn't the allocated batches be transferred to
> > > > > the receiving fragment's allocator?
> > > > >
> > > > > Thanks!
> > > > >

Re: connection allocator in rpc layer is using too much memory

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
The main issue I'm seeing in DRILL-3241 is that when we hit an
out-of-memory condition in the RPC layer, it's too soon to figure out which
fragment executor we should fail, so the query just hangs forever. I'm
hoping that by using a sub-allocator for the RPC layer we will hit the out
of memory later (e.g. when transferring the batch to the fragment's
allocator) and we will be able to properly fail the query.
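To illustrate the proposal, here is a hypothetical sketch (invented names, not Drill's actual API) of why moving the failure to transfer time helps: at that point the buffer is being handed to a specific fragment, so we know exactly which fragment executor to fail instead of hitting an anonymous OOM in the shared decode path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-fragment accounting so an out-of-memory at
// transfer time can be attributed to (and fail) the right fragment.
public class TransferTimeFailure {
    static final long FRAGMENT_LIMIT = 1_000_000;  // invented per-fragment byte limit
    static final Map<String, Long> fragmentUsage = new HashMap<>();

    /** Returns null on success, or the id of the fragment to fail on OOM. */
    static String transferToFragment(String fragmentId, long bytes) {
        long used = fragmentUsage.getOrDefault(fragmentId, 0L);
        if (used + bytes > FRAGMENT_LIMIT) {
            // We know exactly which fragment to fail -- unlike an OOM
            // raised inside the shared RPC decode path.
            return fragmentId;
        }
        fragmentUsage.put(fragmentId, used + bytes);
        return null;
    }
}
```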

On Tue, Jul 7, 2015 at 10:46 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Yes, I believe it is using the TopLevelAllocator.  We could have a
> suballocator but I can't really see how that would help with the JIRA
> issue.  The one thing that might be a good idea is that we could then have
> extra memory reservation for the RPC layer.  (In general, we don't want to
> run out of memory inside the RPC layer.)
>
> On Tue, Jul 7, 2015 at 10:10 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com>
> wrote:
>
> > My bad, I didn't explain the problem well. The value displayed in the
> > log is the amount currently allocated by the
> > ProtobufLengthDecoder.allocator and not the size we are trying to
> > allocate. I will add the size we are trying to allocate to the log
> > message and report back here.
> >
> > I was assuming the RPC layer uses its own child allocator, and it
> > didn't make sense to me that this allocator reached > 1GB because it
> > should transfer the batch to its corresponding fragment context (we are
> > on the data server side). But then, while investigating further, I
> > think the ProtobufLengthDecoder is actually using the drillbit
> > top-level allocator. Am I right?
> > This would explain why the allocator reached its limit. Any reason the
> > RPC layer isn't using its own child allocator?
> >
> > Thanks!
> >
> > On Tue, Jul 7, 2015 at 10:02 AM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> > > There is a time where data is read off the socket before we know what
> > type
> > > of message it is.  This socket read buffer is outside the normal flow
> and
> > > could grow (although it shouldn't get this big).  However, the memory
> > > you're talking about here is memory allocated due to the size of the
> > > incoming message.  My guess would be either you have unusually large
> > > records or the length of the message being sent was corrupted.
> (Assuming
> > > you are talking about the allocation at [1]).
> > >
> > > I would start logging unusually large record batches and see if
> something
> > > weird is going on.  A record batch shouldn't be larger than 65k records
> > so
> > > for the batch to be 1gb in size would require each record to be 16k in
> > size
> > > and for the batch to be the maximum number of records.  More
> > realistically,
> > > we generally target 4k records in a batch which would suggest records
> > that
> > > are 256k.
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java#L87
> > >
> > > On Tue, Jul 7, 2015 at 9:13 AM, Abdel Hakim Deneche <
> > adeneche@maprtech.com
> > > >
> > > wrote:
> > >
> > > > Trying to investigate DRILL-3241
> > > > <https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if
> out
> > > of
> > > > memory in RPC layer), I see the following warning in the logs:
> > > >
> > > > WARN: Failure allocating buffer on incoming stream due to
> > > >
> > > >  memory limits.  Current Allocation: 1372678764.
> > > >
> > > >
> > > > This is happening in ProtobufLengthDecoder.decode() on the
> > > > receiver side (the data server).
> > > >
> > > > Is it expected for the connection allocator to allocate more than
> > > > 1 GB of memory? Shouldn't the allocated batches be transferred to
> > > > the receiving fragment's allocator?
> > > >
> > > > Thanks!
> > > >

Re: connection allocator in rpc layer is using too much memory

Posted by Jacques Nadeau <ja...@apache.org>.
Yes, I believe it is using the TopLevelAllocator.  We could have a
suballocator but I can't really see how that would help with the JIRA
issue.  The one thing that might be a good idea is that we could then have
extra memory reservation for the RPC layer.  (In general, we don't want to
run out of memory inside the RPC layer.)
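A reservation of the kind described could be sketched like this (hypothetical API, invented names, not Drill's): the RPC layer sets aside a fixed slice of the parent budget up front, so other consumers can never starve the incoming-stream buffers.

```java
// Hypothetical sketch: a parent memory budget from which the RPC layer
// takes a fixed up-front reservation, invisible to other consumers.
public class ParentBudget {
    private long remaining;

    public ParentBudget(long total) { this.remaining = total; }

    /** Set aside `size` bytes for one consumer; they are no longer available to allocate(). */
    public synchronized long reserveFor(long size) {
        if (size > remaining) {
            throw new IllegalStateException("cannot satisfy reservation");
        }
        remaining -= size;
        return size;
    }

    /** Ordinary allocation out of whatever is left after reservations. */
    public synchronized boolean allocate(long size) {
        if (size > remaining) {
            return false;
        }
        remaining -= size;
        return true;
    }

    public synchronized long remaining() { return remaining; }
}
```

With a 1000-byte budget and a 200-byte RPC reservation, an ordinary 900-byte allocation fails even though the total would fit, because the reserved slice is held back.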

On Tue, Jul 7, 2015 at 10:10 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> My bad, I didn't explain the problem well. The value displayed in the log
> is the amount currently allocated by the ProtobufLengthDecoder.allocator
> and not the size we are trying to allocate. I will add the size we are
> trying to allocate to the log message and report back here.
>
> I was assuming the RPC layer uses its own child allocator, and it didn't
> make sense to me that this allocator reached > 1GB because it should
> transfer the batch to its corresponding fragment context (we are on the
> data server side). But then, while investigating further, I think the
> ProtobufLengthDecoder is actually using the drillbit top-level allocator.
> Am I right?
> This would explain why the allocator reached its limit. Any reason the RPC
> layer isn't using its own child allocator?
>
> Thanks!
>
> On Tue, Jul 7, 2015 at 10:02 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > There is a time where data is read off the socket before we know what
> type
> > of message it is.  This socket read buffer is outside the normal flow and
> > could grow (although it shouldn't get this big).  However, the memory
> > you're talking about here is memory allocated due to the size of the
> > incoming message.  My guess would be either you have unusually large
> > records or the length of the message being sent was corrupted.  (Assuming
> > you are talking about the allocation at [1]).
> >
> > I would start logging unusually large record batches and see if something
> > weird is going on.  A record batch shouldn't be larger than 65k records
> so
> > for the batch to be 1gb in size would require each record to be 16k in
> size
> > and for the batch to be the maximum number of records.  More
> realistically,
> > we generally target 4k records in a batch which would suggest records
> that
> > are 256k.
> >
> > [1]
> >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java#L87
> >
> > On Tue, Jul 7, 2015 at 9:13 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com
> > >
> > wrote:
> >
> > > Trying to investigate DRILL-3241
> > > <https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if out
> > of
> > > memory in RPC layer), I see the following warning in the logs:
> > >
> > > WARN: Failure allocating buffer on incoming stream due to
> > >
> > >  memory limits.  Current Allocation: 1372678764.
> > >
> > >
> > > This is happening in ProtobufLengthDecoder.decode() on the receiver
> > > side (the data server).
> > >
> > > Is it expected for the connection allocator to allocate more than
> > > 1 GB of memory? Shouldn't the allocated batches be transferred to
> > > the receiving fragment's allocator?
> > >
> > > Thanks!
> > >

Re: connection allocator in rpc layer is using too much memory

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
My bad, I didn't explain the problem well. The value displayed in the log
is the amount currently allocated by the ProtobufLengthDecoder.allocator
and not the size we are trying to allocate. I will add the size we are
trying to allocate to the log message and report back here.

I was assuming the RPC layer uses its own child allocator, and it didn't
make sense to me that this allocator reached > 1GB because it should
transfer the batch to its corresponding fragment context (we are on the
data server side). But then, while investigating further, I think the
ProtobufLengthDecoder is actually using the drillbit top-level allocator.
Am I right?
This would explain why the allocator reached its limit. Any reason the RPC
layer isn't using its own child allocator?

Thanks!

On Tue, Jul 7, 2015 at 10:02 AM, Jacques Nadeau <ja...@apache.org> wrote:

> There is a time where data is read off the socket before we know what type
> of message it is.  This socket read buffer is outside the normal flow and
> could grow (although it shouldn't get this big).  However, the memory
> you're talking about here is memory allocated due to the size of the
> incoming message.  My guess would be either you have unusually large
> records or the length of the message being sent was corrupted.  (Assuming
> you are talking about the allocation at [1]).
>
> I would start logging unusually large record batches and see if something
> weird is going on.  A record batch shouldn't be larger than 65k records so
> for the batch to be 1gb in size would require each record to be 16k in size
> and for the batch to be the maximum number of records.  More realistically,
> we generally target 4k records in a batch which would suggest records that
> are 256k.
>
> [1]
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java#L87
>
> On Tue, Jul 7, 2015 at 9:13 AM, Abdel Hakim Deneche <adeneche@maprtech.com
> >
> wrote:
>
> > Trying to investigate DRILL-3241
> > <https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if out
> of
> > memory in RPC layer), I see the following warning in the logs:
> >
> > WARN: Failure allocating buffer on incoming stream due to
> >
> >  memory limits.  Current Allocation: 1372678764.
> >
> >
> > This is happening in ProtobufLengthDecoder.decode() on the receiver
> > side (the data server).
> >
> > Is it expected for the connection allocator to allocate more than
> > 1 GB of memory? Shouldn't the allocated batches be transferred to the
> > receiving fragment's allocator?
> >
> > Thanks!
> >

Re: connection allocator in rpc layer is using too much memory

Posted by Jacques Nadeau <ja...@apache.org>.
There is a time where data is read off the socket before we know what type
of message it is.  This socket read buffer is outside the normal flow and
could grow (although it shouldn't get this big).  However, the memory
you're talking about here is memory allocated due to the size of the
incoming message.  My guess would be either you have unusually large
records or the length of the message being sent was corrupted.  (Assuming
you are talking about the allocation at [1]).

I would start logging unusually large record batches and see if something
weird is going on.  A record batch shouldn't be larger than 65k records, so
for the batch to be 1 GB in size each record would have to be 16k and the
batch would have to hold the maximum number of records.  More realistically,
we generally target 4k records in a batch, which would suggest records that
are 256k.

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/ProtobufLengthDecoder.java#L87
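The arithmetic above can be checked directly (sizes here assume a ~1 GiB batch, matching the allocation reported in the log):

```java
// Back-of-the-envelope check of the figures above: the record size implied
// by a ~1 GiB batch at the maximum (65k) and typical (4k) record counts.
public class BatchMath {
    static final long BATCH_BYTES = 1L << 30;  // 1 GiB

    static long bytesPerRecord(long recordsPerBatch) {
        return BATCH_BYTES / recordsPerBatch;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRecord(65_536));  // 16384 bytes = 16 KiB per record
        System.out.println(bytesPerRecord(4_096));   // 262144 bytes = 256 KiB per record
    }
}
```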

On Tue, Jul 7, 2015 at 9:13 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Trying to investigate DRILL-3241
> <https://issues.apache.org/jira/browse/DRILL-3241> (query hangs if out of
> memory in RPC layer), I see the following warning in the logs:
>
> WARN: Failure allocating buffer on incoming stream due to
>
>  memory limits.  Current Allocation: 1372678764.
>
>
> This is happening in ProtobufLengthDecoder.decode() on the receiver side
> (the data server).
>
> Is it expected for the connection allocator to allocate more than 1 GB of
> memory? Shouldn't the allocated batches be transferred to the receiving
> fragment's allocator?
>
> Thanks!
>