Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2016/01/04 21:23:39 UTC

[jira] [Updated] (DRILL-4190) Don't hold on to batches from left side of merge join

     [ https://issues.apache.org/jira/browse/DRILL-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-4190:
------------------------------------
    Summary: Don't hold on to batches from left side of merge join  (was: TPCDS queries are running out of memory when hash join is disabled)

> Don't hold on to batches from left side of merge join
> -----------------------------------------------------
>
>                 Key: DRILL-4190
>                 URL: https://issues.apache.org/jira/browse/DRILL-4190
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0
>            Reporter: Victoria Markman
>            Assignee: amit hadke
>            Priority: Blocker
>         Attachments: 2990f5f8-ec64-1223-c1d8-97dd7e601cee.sys.drill, exception.log, query3.sql
>
>
> TPCDS results with the latest 1.4.0 release when hash join is disabled:
> - 22 queries fail with out-of-memory errors
> - 2 queries return wrong results (I have not yet validated the nature of the wrong results)
> Only query97.sql is a legitimate failure: full outer join is not supported with merge join.
> It is important to understand what changed between 1.2.0 and 1.4.0 that prevents these tests from running with the same configuration.
> The same tests pass with the same Drill configuration in the 1.2.0 release.
> (I hope I did not make a mistake somewhere in my cluster setup :))
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> |     version     |                 commit_id                 |                           commit_message                            |        commit_time         | build_email  |         build_time         |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 Add a precondition for size of merge join record batch.  | 08.12.2015 @ 01:25:34 UTC  | Unknown      | 08.12.2015 @ 03:36:25 UTC  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (2.211 seconds)
> Execution Failures:
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query50.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query33.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query74.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query68.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query34.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query21.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query46.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query91.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query59.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query3.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query84.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query97.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query19.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query96.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query43.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query2.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query60.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query79.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query73.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query45.sql
> Verification Failures
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query52.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query40.sql
> Timeout Failures
> ----------------------------------------------------------------------------------------------------------------
> Passing tests: 3
> Execution Failures: 22
> VerificationFailures: 2
> Timeouts: 0
> Canceled: 0
> {code}
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.version;
> +-----------+----------------+-------------+-------------+------------+
> | commit_id | commit_message | commit_time | build_email | build_time |
> +-----------+----------------+-------------+-------------+------------+
> | f1100a79b4e4fbb1b58b35b0230edff137588777 | DRILL-3947: Use setSafe() for date, time, timestamp types while populating pruning vector (other types were already using setSafe). | 19.10.2015 @ 16:02:00 UTC | Unknown | 19.10.2015 @ 16:25:21 UTC |
> +-----------+----------------+-------------+-------------+------------+
> 1 row selected (2.79 seconds)
> PASS (1.543 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query68.sql (connection: 1681915178)
> PASS (29.36 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query34.sql (connection: 1681915178)
> PASS (3.311 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query21.sql (connection: 1681915178)
> PASS (1.447 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query46.sql (connection: 1681915178)
> PASS (34.53 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query76.sql (connection: 1681915178)
> PASS (47.13 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query91.sql (connection: 1681915178)
> PASS (1.151 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query59.sql (connection: 1681915178)
> PASS (32.29 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query3.sql (connection: 1681915178)
> PASS (1.939 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql (connection: 1681915178)
> PASS (19.26 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query84.sql (connection: 1681915178)
> PASS (1.243 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query40.sql (connection: 1681915178)
> [#37] Query failed:
> oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalArgumentException: Full outer join not currently supported
> [Error Id: 9a400ac2-3f1d-428c-9dc6-5f556cb520aa on atsqa4-133.qa.lab:31010]
>         at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>         at oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110)
>         at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>         at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>         at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61)
>         at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233)
>         at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205)
>         at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>         at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>         at oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>         at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:745)
> EXECUTION_FAILURE (2.814 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query97.sql (connection: 1681915178)
> PASS (57.04 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query19.sql (connection: 1681915178)
> PASS (24.01 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query96.sql (connection: 1681915178)
> PASS (28.77 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query43.sql (connection: 1681915178)
> PASS (1.833 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query93.sql (connection: 1681915178)
> PASS (38.84 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql (connection: 1681915178)
> PASS (55.82 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query2.sql (connection: 1681915178)
> PASS (1.308 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query60.sql (connection: 1681915178)
> PASS (1.116 min) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query79.sql (connection: 1681915178)
> PASS (27.79 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query73.sql (connection: 1681915178)
> PASS (39.85 s) /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query45.sql (connection: 1681915178)
> {code}
> *Cluster configuration:*
> - 4 nodes
> - 48 GB direct memory
> - 10 GB memory allocated to sort
> - framework timeout = 600 seconds
> - queries were executed one at a time
> *System settings:*
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-------------------------------------------+----------+---------+----------+--------------+-------------+-----------+------------+
> |                   name                    |   kind   |  type   |  status  |   num_val    | string_val  | bool_val  | float_val  |
> +-------------------------------------------+----------+---------+----------+--------------+-------------+-----------+------------+
> | planner.enable_decimal_data_type          | BOOLEAN  | SYSTEM  | CHANGED  | null         | null        | true      | null       |
> | planner.enable_hashjoin                   | BOOLEAN  | SYSTEM  | CHANGED  | null         | null        | false     | null       |
> | planner.memory.max_query_memory_per_node  | LONG     | SYSTEM  | CHANGED  | 10737418240  | null        | null      | null       |
> +-------------------------------------------+----------+---------+----------+--------------+-------------+-----------+------------+
> 3 rows selected (3.464 seconds)
> {code}
> The TPCDS queries were executed from the public test framework:
> ./run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600
> More details shortly.
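
The two runs quoted above are reported in different shapes: the 1.4.0 run as lists of paths under "Execution Failures" / "Verification Failures" headers, and the earlier run as per-query "PASS (...)" lines. A rough way to line them up is sketched below. This is only an illustration that assumes the log shapes shown in this message; the function names and the status labels for the summary sections are made up for the sketch and are not part of the test framework.

```python
import re
from pathlib import PurePosixPath

# Per-query line, e.g.
#   PASS (29.36 s) /root/.../query34.sql (connection: 1681915178)
RESULT_LINE = re.compile(
    r"^([A-Z_]+) \([\d.]+ (?:s|min)\) (\S+\.sql) \(connection: \d+\)$"
)

# Section headers of the summary format, mapped to status labels.
# The labels are this sketch's own choice, not framework output.
SECTIONS = {
    "Execution Failures": "EXECUTION_FAILURE",
    "Verification Failures": "VERIFICATION_FAILURE",
    "Timeout Failures": "TIMEOUT",
}


def clean(line):
    """Strip whitespace and the '>' quoting prefix used in the e-mail."""
    return line.strip().lstrip(">").strip()


def parse_result_lines(text):
    """Return {query file name: status} from per-query result lines."""
    results = {}
    for line in map(clean, text.splitlines()):
        match = RESULT_LINE.match(line)
        if match:  # stack traces and other lines are skipped
            results[PurePosixPath(match.group(2)).name] = match.group(1)
    return results


def parse_summary(text):
    """Return {query file name: status} from the header-plus-paths summary."""
    results, current = {}, None
    for line in map(clean, text.splitlines()):
        header = line.rstrip(":")
        if header in SECTIONS:
            current = SECTIONS[header]
        elif current and line.endswith(".sql"):
            results[PurePosixPath(line).name] = current
    return results


def regressions(baseline, failures):
    """Map each query that passed in `baseline` to its status in `failures`."""
    return {
        name: failures[name]
        for name in sorted(baseline)
        if baseline[name] == "PASS" and name in failures
    }


if __name__ == "__main__":
    prefix = "/root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/"
    old_run = (
        f"PASS (29.36 s) {prefix}query34.sql (connection: 1681915178)\n"
        f"PASS (1.243 min) {prefix}query40.sql (connection: 1681915178)\n"
        f"EXECUTION_FAILURE (2.814 s) {prefix}query97.sql (connection: 1681915178)\n"
    )
    new_run = (
        "Execution Failures:\n"
        f"{prefix}query34.sql\n{prefix}query97.sql\n"
        "Verification Failures\n"
        f"{prefix}query40.sql\n"
        "Timeout Failures\n"
    )
    print(regressions(parse_result_lines(old_run), parse_summary(new_run)))
```

With the three sample lines taken from the output above, query34.sql and query40.sql come out as regressions, while query97.sql is left out because it did not pass in the earlier run either.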



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)