Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2016/06/21 03:55:57 UTC
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341019#comment-15341019 ]
Khurram Faraaz commented on DRILL-3714:
---------------------------------------
Tried to verify whether the problem still exists on Drill 1.7.0 and hit the RuntimeException below. Git commit ID: 6286c0a4
{noformat}
[root@centos-01 bin]# hadoop fs -ls /tpcdsParquet/SF100/store_sales/1_25_0.parquet
-rwxr-xr-x 3 root root 4 2016-06-20 23:12 /tpcdsParquet/SF100/store_sales/1_25_0.parquet
0: jdbc:drill:schema=dfs.tmp> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS TotalSpend FROM dfs.`/tpcdsParquet/SF100/store_sales` ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 LIMIT 10;
Error: SYSTEM ERROR: RuntimeException: maprfs:///tpcdsParquet/SF100/store_sales/1_25_0.parquet is not a Parquet file (too small)
[Error Id: 62aa5dc1-5d59-45a8-bf6c-297c883bf2aa on centos-01.qa.lab:31010]
Stack trace from drillbit.log:
(org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#1217:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, /tpcdsParquet/SF100/store_sales])]
org.apache.drill.exec.work.foreman.Foreman.run():271
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillTableRule, args [rel#1217:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, /tpcdsParquet/SF100/store_sales])]
org.apache.calcite.util.Util.newInternal():792
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():808
org.apache.calcite.tools.Programs$RuleSetProgram.run():303
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():400
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():339
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():237
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():286
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():94
org.apache.drill.exec.work.foreman.Foreman.runSQL():978
org.apache.drill.exec.work.foreman.Foreman.run():257
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) Failure creating scan.
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():91
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():69
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():62
org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():37
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():228
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():808
org.apache.calcite.tools.Programs$RuleSetProgram.run():303
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():400
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():339
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():237
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():286
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():94
org.apache.drill.exec.work.foreman.Foreman.runSQL():978
org.apache.drill.exec.work.foreman.Foreman.run():257
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (java.io.IOException) java.lang.RuntimeException: maprfs:///tpcdsParquet/SF100/store_sales/1_25_0.parquet is not a Parquet file (too small)
org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.convertToIOException():290
org.apache.drill.exec.store.TimedRunnable.getValue():79
org.apache.drill.exec.store.TimedRunnable.run():173
org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v2():244
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata():223
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata():208
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata():105
org.apache.drill.exec.store.parquet.ParquetGroupScan.init():640
org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():180
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():167
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():65
org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():145
org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan():103
org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():71
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():89
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():69
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():62
org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():37
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():228
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():808
org.apache.calcite.tools.Programs$RuleSetProgram.run():303
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():400
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():339
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():237
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():286
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():94
org.apache.drill.exec.work.foreman.Foreman.runSQL():978
org.apache.drill.exec.work.foreman.Foreman.run():257
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (java.lang.RuntimeException) maprfs:///tpcdsParquet/SF100/store_sales/1_25_0.parquet is not a Parquet file (too small)
org.apache.parquet.hadoop.ParquetFileReader.readFooter():433
org.apache.parquet.hadoop.ParquetFileReader.readFooter():412
org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v2():312
org.apache.drill.exec.store.parquet.Metadata.access$000():74
org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner():282
org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner():270
org.apache.drill.exec.store.TimedRunnable.run():56
org.apache.drill.exec.store.TimedRunnable$LatchedRunnable.run():98
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{noformat}
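For context on the "too small" message: the hadoop fs -ls output above shows 1_25_0.parquet is only 4 bytes, and a Parquet file must at minimum hold the 4-byte footer-length word plus the trailing "PAR1" magic, so parquet-mr's readFooter rejects it before Drill can plan the scan. A minimal sketch of that sanity check (the class and method names here are illustrative, not Drill or parquet-mr APIs):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the footer sanity check that makes parquet-mr report
// "is not a Parquet file (too small)". Class/method names are
// illustrative only.
public class ParquetSanityCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    private static final int FOOTER_LENGTH_SIZE = 4;

    static boolean looksLikeParquet(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();
            // A readable file needs at least the footer-length word plus
            // the trailing magic; a 4-byte file fails this immediately.
            if (len < MAGIC.length + FOOTER_LENGTH_SIZE) {
                return false;
            }
            // The file must end with the "PAR1" magic bytes.
            byte[] tail = new byte[MAGIC.length];
            f.seek(len - MAGIC.length);
            f.readFully(tail);
            return Arrays.equals(tail, MAGIC);
        }
    }
}
```

Running a check like this over the table directory before querying would flag the truncated file that causes the planning failure above.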
> Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
> ------------------------------------------------------------------------------------------------
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Reporter: Victoria Markman
> Assignee: Jacques Nadeau
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when hitting OOM condition.
> Query runs out of memory during execution and remains in "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO o.a.d.e.w.f.FragmentStatusReporter - 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 took longer than 500ms. Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_71]
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_71]
> at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:4.0.27.Final]
> at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> ... 11 common frames omitted
> 2015-08-26 16:56:31,087 [BitServer-5] INFO o.a.d.exec.rpc.ProtobufLengthDecoder - Channel is closed, discarding remaining 124958 byte(s) in buffer.
> 2015-08-26 16:56:31,087 [BitClient-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:54554 <--> /10.10.88.133:31012 (data client). Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-26 16:56:31,088 [BitClient-1] INFO o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:54554 <--> /10.10.88.133:31012.
> 2015-08-26 16:56:35,325 [BitServer-6] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:54555 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> Attached:
> query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
> drillbit.log
> jstack.txt
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)