Posted to dev@drill.apache.org by François Méthot <fm...@gmail.com> on 2017/04/05 20:39:38 UTC

Memory was Leaked error when using "limit" in 1.10

Hi,

  I am still investigating this problem, but I will describe the symptoms in
case there is a known issue with Drill 1.10.

  We migrated our production system from Drill 1.9 to 1.10 just 5 days ago
(220-node cluster).

Our logs show some 900+ queries ran without problem in the first 4 days
(similar queries, none of which used the `limit` clause).

Yesterday we started running simple ad hoc `select * ... limit 10` queries
(as we often do; it was our first use of `limit` on 1.10), and we got the
`Memory was leaked` exception below.
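
For illustration, the queries had the following shape; the path here is a
made-up stand-in, not our actual data:

  -- hypothetical example of the ad hoc queries that triggered the error
  select * from dfs.`/data/example/1_0_0.parquet` limit 10;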

Also, once we get the error, most subsequent user queries fail with a
Channel Closed Exception, and we need to restart Drill to bring it back to
normal.

A day later, I ran a similar `select * ... limit 10` query and the same thing
happened; we had to restart Drill.

The exception referred to a file (1_0_0.parquet). I moved that file to a
smaller test cluster (12 nodes) and got the error on the first attempt, but
I am no longer able to reproduce the issue on that file. Between the 12-node
and 220-node clusters, a different Column name and Row Group Start were
listed in the error.
The parquet file was generated by Drill 1.10.

I tried the same file with local drill-embedded 1.9 and 1.10 and had no
issue.


Here is the error (manually typed); if you think of anything obvious, let
us know.


AsyncPageReader - User Error Occurred: Exception occurred while reading from
disk (can not read class o.a.parquet.format.PageHeader:
java.io.IOException: input stream is closed.)

File:..../1_0_0.parquet
Column: StringColXYZ
Row Group Start: 115215476

[Error Id: ....]
  at o.a.d.common.exceptions.UserException (UserException.java:544)
  at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:199)
  at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.access(AsyncPageReader.java:81)
  at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:483)
  at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
  at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
...
Caused by: java.io.IOException: can not read class
org.apache.parquet.format.PageHeader: java.io.IOException: Input Stream is
closed.
   at o.a.parquet.format.Util.read(Util.java:216)
   at o.a.parquet.format.Util.readPageHeader(Util.java:65)
   at o.a.drill.exec.store.parquet.columnreaders.AsyncPageReader(AsyncPageReaderTask:430)
Caused by: parquet.org.apache.thrift.transport.TTransportException: Input
stream is closed
   at ...read(TIOStreamTransport.java:129)
   at ....TTransport.readAll(TTransport.java:84)
   at ....TCompactProtocol.readByte(TCompactProtocol.java:474)
   at ....TCompactProtocol.readFieldBegin(TCompactProtocol.java:481)
   at ....InterningProtocol.readFieldBegin(InterningProtocol.java:158)
   at ....o.a.parquet.format.PageHeader.read(PageHeader.java:828)
   at ....o.a.parquet.format.Util.read(Util.java:213)


Fragment 0:0
[Error id: ...]
o.a.drill.common.exception.UserException: SYSTEM ERROR:
IllegalStateException: Memory was leaked by query. Memory leaked: (524288)
Allocator(op:0:0:4:ParquetRowGroupScan) 1000000/524288/39919616/10000000000
  at o.a.d.common.exceptions.UserException (UserException.java:544)
  at o.a.d.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
  at o.a.d.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
  at o.a.d.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
...
Caused by: IllegalStateException: Memory was leaked by query. Memory
leaked: (524288)
  at o.a.d.exec.memory.BaseAllocator.close(BaseAllocator.java:502)
  at o.a.d.exec.ops.OperatorContextImpl(OperatorContextImpl.java:149)
  at o.a.d.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:422)
  at o.a.d.exec.ops.FragmentContext.close(FragmentContext.java:411)
  at o.a.d.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:318)
  at o.a.d.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)

Francois

Re: Memory was Leaked error when using "limit" in 1.10

Posted by François Méthot <fm...@gmail.com>.
Yes it did; the problem is gone. Thanks.

I will share the details I have in a JIRA ticket now.
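
For anyone who finds this thread later: Kunal's workaround below is
session-level; my assumption is that the obvious system-level form would
persist the setting across sessions on the whole cluster:

  -- hypothetical cluster-wide equivalent of the session-level workaround
  alter system set `store.parquet.reader.pagereader.async` = false;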



On Tue, Apr 11, 2017 at 9:22 PM, Kunal Khatua <kk...@mapr.com> wrote:

> Did this help resolve the memory leak, Francois?
>
>
> Could you share the stack trace and other relevant logs on a JIRA?
>
>
> Thanks
>
> Kunal

Re: Memory was Leaked error when using "limit" in 1.10

Posted by Kunal Khatua <kk...@mapr.com>.
Did this help resolve the memory leak, Francois?


Could you share the stack trace and other relevant logs on a JIRA?


Thanks

Kunal

Re: Memory was Leaked error when using "limit" in 1.10

Posted by Kunal Khatua <kk...@mapr.com>.
Hi Francois

Could you try those queries with the AsyncPageReader turned off?

alter <session|system> set `store.parquet.reader.pagereader.async`=false;
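
If it helps, a quick way to confirm the change took effect, assuming the
sys.options columns as they exist in 1.10 (status, bool_val):

  -- should report status CHANGED and bool_val false after the alter
  select name, status, bool_val from sys.options
  where name = 'store.parquet.reader.pagereader.async';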

In Drill 1.9+, this feature is enabled by default. However, Drill 1.10 carried some performance-related improvements to it.

If the problem goes away, could you file a JIRA and share the sample query and data so we can reproduce it?

Thanks

Kunal