You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/20 03:33:00 UTC

[jira] [Commented] (DRILL-6861) Hash-Join: Spilled partitions are skipped following an empty probe side

    [ https://issues.apache.org/jira/browse/DRILL-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692603#comment-16692603 ] 

ASF GitHub Bot commented on DRILL-6861:
---------------------------------------

Ben-Zvi opened a new pull request #1546: DRILL-6861: Hash-Join should not exit after an empty probe-side spilled partition
URL: https://github.com/apache/drill/pull/1546
 
 
   (The DIFF below is excessive due to indentation; but only two "lines" where actually changed).
   
   The original code below handles the spilled partitions by getting the next one (as build+probe "incoming"), and then recursively calling *innerNext()*.  Each recursive call would end by (recursively) processing the next spilled partition.
   
      The DRILL-6755 change (line 556) caused an early exit when the probe side was empty. Thus in special cases when a probe-side of a spilled partition was empty, this change would terminate the recursive chain early, thus not processing the remaining partitions.
   
      The fix: When an empty probe-side spilled partition is seen - skip and process the next partition. A new check was added (line 624), and the code was altered into a "while" loop (line 618) to allow skipping. 
   
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Hash-Join: Spilled partitions are skipped following an empty probe side
> -----------------------------------------------------------------------
>
>                 Key: DRILL-6861
>                 URL: https://issues.apache.org/jira/browse/DRILL-6861
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Blocker
>             Fix For: 1.15.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
>      Following DRILL-6755 (_Avoid building a hash table when the probe side is empty_) - The special case of an empty spilled probe-partition was not handled.  When such a case happens, the Hash-Join terminates early (returns NONE) and the remaining partitions are not processed/returned (which may lead to incorrect results).
>   A test case - force tpcds/query95 to spill (sf1) :
> {code:java}
> 0: jdbc:drill:zk=local> alter system set `exec.hashjoin.max_batches_in_memory` = 40;
> +-------+-----------------------------------------------+
> |  ok   |                    summary                    |
> +-------+-----------------------------------------------+
> | true  | exec.hashjoin.max_batches_in_memory updated.  |
> +-------+-----------------------------------------------+
> 1 row selected (1.325 seconds)
> 0: jdbc:drill:zk=local> WITH ws_wh AS
> . . . . . . . . . . . > (
> . . . . . . . . . . . >        SELECT ws1.ws_order_number,
> . . . . . . . . . . . >               ws1.ws_warehouse_sk wh1,
> . . . . . . . . . . . >               ws2.ws_warehouse_sk wh2
> . . . . . . . . . . . >        FROM   dfs.`/data/tpcds/sf1/parquet/web_sales` ws1,
> . . . . . . . . . . . >               dfs.`/data/tpcds/sf1/parquet/web_sales` ws2
> . . . . . . . . . . . >        WHERE  ws1.ws_order_number = ws2.ws_order_number
> . . . . . . . . . . . >        AND    ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
> . . . . . . . . . . . > SELECT
> . . . . . . . . . . . >          Count(DISTINCT ws1.ws_order_number) AS `order count` ,
> . . . . . . . . . . . >          Sum(ws1.ws_ext_ship_cost)           AS `total shipping cost` ,
> . . . . . . . . . . . >          Sum(ws1.ws_net_profit)              AS `total net profit`
> . . . . . . . . . . . > FROM     dfs.`/data/tpcds/sf1/parquet/web_sales` ws1 ,
> . . . . . . . . . . . >          dfs.`/data/tpcds/sf1/parquet/date_dim` dd,
> . . . . . . . . . . . >          dfs.`/data/tpcds/sf1/parquet/customer_address` ca,
> . . . . . . . . . . . >          dfs.`/data/tpcds/sf1/parquet/web_site` wbst
> . . . . . . . . . . . > WHERE    dd.d_date BETWEEN '2000-04-01' AND      (
> . . . . . . . . . . . >                   Cast('2000-04-01' AS DATE) + INTERVAL '60' day)
> . . . . . . . . . . . > AND      ws1.ws_ship_date_sk = dd.d_date_sk
> . . . . . . . . . . . > AND      ws1.ws_ship_addr_sk = ca.ca_address_sk
> . . . . . . . . . . . > AND      ca.ca_state = 'IN'
> . . . . . . . . . . . > AND      ws1.ws_web_site_sk = wbst.web_site_sk
> . . . . . . . . . . . > AND      wbst.web_company_name = 'pri'
> . . . . . . . . . . . > AND      ws1.ws_order_number IN
> . . . . . . . . . . . >          (
> . . . . . . . . . . . >                 SELECT ws_wh.ws_order_number
> . . . . . . . . . . . >                 FROM   ws_wh)
> . . . . . . . . . . . > AND      ws1.ws_order_number IN
> . . . . . . . . . . . >          (
> . . . . . . . . . . . >                 SELECT wr.wr_order_number
> . . . . . . . . . . . >                 FROM   dfs.`/data/tpcds/sf1/parquet/web_returns` wr,
> . . . . . . . . . . . >                        ws_wh
> . . . . . . . . . . . >                 WHERE  wr.wr_order_number = ws_wh.ws_order_number)
> . . . . . . . . . . . > ORDER BY count(DISTINCT ws1.ws_order_number)
> . . . . . . . . . . . > LIMIT 100;
> +--------------+----------------------+---------------------+
> | order count  | total shipping cost  |  total net profit   |
> +--------------+----------------------+---------------------+
> | 17           | 38508.130000000005   | 20822.3           |
> +--------------+----------------------+---------------------+
> 1 row selected (105.621 seconds)
> {code}
> The correct results should be:
> {code:java}
> +--------------+----------------------+---------------------+
> | order count  | total shipping cost  |  total net profit   |
> +--------------+----------------------+---------------------+
> | 34           | 63754.72             | 15919.009999999998  |
> +--------------+----------------------+---------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)