Posted to issues@drill.apache.org by "Boaz Ben-Zvi (Jira)" <ji...@apache.org> on 2019/09/09 23:47:00 UTC

[jira] [Commented] (DRILL-7170) IllegalStateException: Record count not set for this vector container

    [ https://issues.apache.org/jira/browse/DRILL-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926184#comment-16926184 ] 

Boaz Ben-Zvi commented on DRILL-7170:
-------------------------------------

    This failure happens at the end of the Hash Join build phase. After all the build-side input has been read (and possibly some partitions have spilled), the hash tables (and HJ helpers) are created one at a time, one for each in-memory partition. Because these creations consume more memory, we may run out of memory before all the in-memory partitions get their hash tables. So before each such partition is handled, the available memory is checked (using the postBuildCalc); if too little memory is left, this in-memory partition is spilled (to free memory for the hash tables of the other partitions).

    It looks like in this case the check ( {{postBuildCalc.shouldSpill()}} ) erroneously passed, so instead of spilling, the hash-table build began, but later hit an OOM. This mid-work OOM left some vector container allocated but not initialized, so the error-message code that tried to print relevant information failed when it tried to get the record count from that (uninitialized) container.

   A possible *_work-around_*: Increase the "safety factor" of the memory calculator, which triggers spills sooner and makes {{postBuildCalc.shouldSpill()}} less likely to return {{false}}. The default setting is *1.0*; try values like *1.5* or *2.0* for the user configuration option, for example:

{{alter session set `exec.hashjoin.safety_factor` = 1.5}}
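
   To illustrate why a larger safety factor leads to earlier spills, here is a minimal sketch of the per-partition check described above. It is not Drill's code: the classes {{SketchPartition}} and {{BuildPhaseSketch}}, the byte figures, and the way the factor multiplies the estimate are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an in-memory hash-join partition; not Drill's HashPartition.
class SketchPartition {
    final int id;
    final long dataBytes;      // assumed memory held by the partition's in-memory batches
    final long hashTableBytes; // assumed memory needed to build its hash table
    boolean spilled;
    boolean hasHashTable;

    SketchPartition(int id, long dataBytes, long hashTableBytes) {
        this.id = id;
        this.dataBytes = dataBytes;
        this.hashTableBytes = hashTableBytes;
    }
}

class BuildPhaseSketch {
    // Simplified stand-in for the spill decision: spill when the estimated need,
    // scaled by the safety factor, exceeds the memory that is left.
    static boolean shouldSpill(long availableBytes, long neededBytes, double safetyFactor) {
        return neededBytes * safetyFactor > availableBytes;
    }

    // Handles the partitions one at a time and returns the ids of those that were spilled.
    static List<Integer> finishBuild(List<SketchPartition> partitions,
                                     long availableBytes, double safetyFactor) {
        List<Integer> spilledIds = new ArrayList<>();
        long available = availableBytes;
        for (SketchPartition p : partitions) {
            if (shouldSpill(available, p.hashTableBytes, safetyFactor)) {
                p.spilled = true;
                available += p.dataBytes; // spilling frees the partition's in-memory data
                spilledIds.add(p.id);
            } else {
                p.hasHashTable = true;
                available -= p.hashTableBytes; // the new hash table consumes memory
            }
        }
        return spilledIds;
    }
}
```

   In this toy model, three identical partitions with 100 bytes available spill only the last partition at a factor of 1.0, while a factor of 2.0 already spills the second one, leaving more headroom for the remaining hash tables.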

   The simplest *_code fix_*: catch this failure when the error message is prepared, and just print zero instead (around line 1105 in {{HashJoinBatch.java}}).
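
   A rough sketch of that idea, assuming the size query is the only part of the debug string that can throw. {{SizedTable}} and {{DebugStringSketch}} are hypothetical names, not Drill classes, and the message text is invented for the example.

```java
// Hypothetical stand-in for a hash table whose size query can fail; not Drill's HashTableTemplate.
interface SizedTable {
    long getActualSize();
}

class DebugStringSketch {
    // Builds the diagnostic text defensively, so that a failure while sizing
    // does not replace the original out-of-memory error with a new exception.
    static String makeDebugString(SizedTable table) {
        long size;
        try {
            size = table.getActualSize();
        } catch (IllegalStateException e) {
            size = 0; // the container was never initialized; report zero instead
        }
        return "hash table size: " + size;
    }
}
```

   The catch is deliberately narrow ({{IllegalStateException}} only), so other unexpected failures would still surface.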

  Another fix: in {{getActualSize()}} in {{HashTableTemplate.java}}, just return zero for any batchHolder whose VectorContainer is not initialized (i.e. {{false == hasRecordCount()}} ). (It seems that only the error-message code calls {{getActualSize()}}.)
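
   As a sketch of this guard, assuming a container that tracks whether its record count was ever set: {{SketchContainer}} and {{ActualSizeSketch}} below are simplified stand-ins rather than Drill's {{VectorContainer}} or {{HashTableTemplate}}, and the fixed bytes-per-record sizing is an assumption.

```java
import java.util.List;

// Hypothetical stand-in for a vector container; not Drill's VectorContainer.
class SketchContainer {
    private int recordCount = -1; // -1 means the count was never set
    private final int bytesPerRecord;

    SketchContainer(int bytesPerRecord) {
        this.bytesPerRecord = bytesPerRecord;
    }

    void setRecordCount(int n) {
        recordCount = n;
    }

    boolean hasRecordCount() {
        return recordCount >= 0;
    }

    int getRecordCount() {
        if (!hasRecordCount()) {
            throw new IllegalStateException("Record count not set for this vector container");
        }
        return recordCount;
    }

    long sizeInBytes() {
        return (long) getRecordCount() * bytesPerRecord;
    }
}

class ActualSizeSketch {
    // Sums the sizes of all batch holders, counting uninitialized containers as zero.
    static long getActualSize(List<SketchContainer> batchHolders) {
        long total = 0;
        for (SketchContainer c : batchHolders) {
            if (!c.hasRecordCount()) {
                continue; // allocated but not initialized: skip instead of throwing
            }
            total += c.sizeInBytes();
        }
        return total;
    }
}
```

   With this guard, a list holding one initialized container and one left half-built by an OOM yields the size of the initialized one alone.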

   A more advanced fix (in addition to the above): when the above OOM occurs, catch it, clean up the partially built hash table (and helper), and finally spill that whole partition (to free more memory). This works around the wrong choice made by {{postBuildCalc.shouldSpill()}}, but implementing it would require more testing.
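
   The control flow of that recovery might look roughly like the following. Everything here is hypothetical ({{SketchOomException}}, {{RecoverablePartition}}, {{OomRecoverySketch}}, and the byte-budget model); it only shows the catch, clean-up, then spill ordering, not how Drill allocates or spills.

```java
// Hypothetical stand-in for an allocator-level failure; not Drill's OutOfMemoryException.
class SketchOomException extends RuntimeException {
    SketchOomException(String message) {
        super(message);
    }
}

// Hypothetical partition whose hash-table build can fail part-way through.
class RecoverablePartition {
    private final long hashTableBytes; // assumed memory the full hash table needs
    private long allocatedBytes;       // memory held by a (possibly partial) hash table
    boolean spilled;
    boolean hasHashTable;

    RecoverablePartition(long hashTableBytes) {
        this.hashTableBytes = hashTableBytes;
    }

    // Allocates as much as the budget allows; fails mid-build if that is not enough.
    void buildHashTable(long budgetBytes) {
        allocatedBytes = Math.min(hashTableBytes, budgetBytes);
        if (allocatedBytes < hashTableBytes) {
            throw new SketchOomException("out of memory while building hash table");
        }
        hasHashTable = true;
    }

    void releasePartialHashTable() {
        allocatedBytes = 0;
        hasHashTable = false;
    }

    void spill() {
        spilled = true;
    }

    long allocatedBytes() {
        return allocatedBytes;
    }
}

class OomRecoverySketch {
    // Returns true if the hash table was built, false if the partition was spilled instead.
    static boolean buildOrSpill(RecoverablePartition p, long budgetBytes) {
        try {
            p.buildHashTable(budgetBytes);
            return true;
        } catch (SketchOomException e) {
            p.releasePartialHashTable(); // first drop the half-built structures
            p.spill();                   // then spill the whole partition to free memory
            return false;
        }
    }
}
```

   Releasing the partial hash table before spilling matters in this model: otherwise the half-built allocation would stay behind after the partition has left memory.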

 

 

> IllegalStateException: Record count not set for this vector container
> ---------------------------------------------------------------------
>
>                 Key: DRILL-7170
>                 URL: https://issues.apache.org/jira/browse/DRILL-7170
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Sorabh Hamirwasia
>            Priority: Major
>             Fix For: 1.17.0
>
>
> {code:java}
> Query: /root/drillAutomation/master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query95.sql
> WITH ws_wh AS
> (
> SELECT ws1.ws_order_number,
> ws1.ws_warehouse_sk wh1,
> ws2.ws_warehouse_sk wh2
> FROM   web_sales ws1,
> web_sales ws2
> WHERE  ws1.ws_order_number = ws2.ws_order_number
> AND    ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
> SELECT
> Count(DISTINCT ws_order_number) AS `order count` ,
> Sum(ws_ext_ship_cost)           AS `total shipping cost` ,
> Sum(ws_net_profit)              AS `total net profit`
> FROM     web_sales ws1 ,
> date_dim ,
> customer_address ,
> web_site
> WHERE    d_date BETWEEN '2000-04-01' AND      (
> Cast('2000-04-01' AS DATE) + INTERVAL '60' day)
> AND      ws1.ws_ship_date_sk = d_date_sk
> AND      ws1.ws_ship_addr_sk = ca_address_sk
> AND      ca_state = 'IN'
> AND      ws1.ws_web_site_sk = web_site_sk
> AND      web_company_name = 'pri'
> AND      ws1.ws_order_number IN
> (
> SELECT ws_order_number
> FROM   ws_wh)
> AND      ws1.ws_order_number IN
> (
> SELECT wr_order_number
> FROM   web_returns,
> ws_wh
> WHERE  wr_order_number = ws_wh.ws_order_number)
> ORDER BY count(DISTINCT ws_order_number)
> LIMIT 100
> Exception:
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Record count not set for this vector container
> Fragment 2:3
> Please, refer to logs for more information.
> [Error Id: 4ed92fce-505b-40ba-ac0e-4a302c28df47 on drill87:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector container
>     org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState():459
>     org.apache.drill.exec.record.VectorContainer.getRecordCount():394
>     org.apache.drill.exec.record.RecordBatchSizer.<init>():720
>     org.apache.drill.exec.record.RecordBatchSizer.<init>():704
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():462
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():964
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():973
>     org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():601
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():1313
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():1105
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():525
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.record.AbstractRecordBatch.next():116
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.test.generated.HashAggregatorGen1068899.doWork():642
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():296
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.record.AbstractRecordBatch.next():116
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1669
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
> 	at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
> 	at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:642)
> 	at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:148)
> 	at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:254)
> 	at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Record count not set for this vector container
> Fragment 2:3
> Please, refer to logs for more information.
> [Error Id: 4ed92fce-505b-40ba-ac0e-4a302c28df47 on drill87:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector container
>     org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState():459
>     org.apache.drill.exec.record.VectorContainer.getRecordCount():394
>     org.apache.drill.exec.record.RecordBatchSizer.<init>():720
>     org.apache.drill.exec.record.RecordBatchSizer.<init>():704
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():462
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():964
>     org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():973
>     org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():601
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():1313
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():1105
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():525
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.record.AbstractRecordBatch.next():116
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.test.generated.HashAggregatorGen1068899.doWork():642
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():296
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.record.AbstractRecordBatch.next():116
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1669
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
> 	at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
> 	at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
> 	at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
> 	at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
> 	at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
> 	at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> 	at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> 	at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> 	at oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
> 	at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> 	at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> 	at oadd.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> 	at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> 	at oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
> 	at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> 	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
> 	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> 	at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> 	at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> 	at oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
> 	... 1 more
> Caused by: java.lang.IllegalStateException: Record count not set for this vector container
> 	at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:459)
> 	at org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
> 	at org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:720)
> 	at org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:704)
> 	at org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize(HashTableTemplate.java:462)
> 	at org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize(HashTableTemplate.java:964)
> 	at org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString(HashTableTemplate.java:973)
> 	at org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString(HashPartition.java:601)
> 	at org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString(HashJoinBatch.java:1313)
> 	at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:1105)
> 	at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:525)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
> 	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
> 	at org.apache.drill.exec.test.generated.HashAggregatorGen1068899.doWork(HashAggTemplate.java:642)
> 	at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:296)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
> 	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
> 	at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:296)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:283)
> 	at .......(:0)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
> 	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> 	at .......(:0)
> {code}
> When HashJoinBatch hits an OOM condition while executing the build phase, it catches the OOMException and tries to generate a debugString, which internally causes this IllegalStateException.
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java#L1105



--
This message was sent by Atlassian Jira
(v8.3.2#803003)