You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2017/10/28 12:12:02 UTC

[jira] [Commented] (HIVE-17767) Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN

    [ https://issues.apache.org/jira/browse/HIVE-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223457#comment-16223457 ] 

Hive QA commented on HIVE-17767:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894448/HIVE-17767.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 55 failed/errored test(s), 11341 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_12] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_4] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[semijoin5] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists_having] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqualcolumnrefs] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_partitioner] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=173)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_corr_in_agg] (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_in_implicit_gby] (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query10] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query35] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query69] (batchId=244)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] (batchId=244)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query69] (batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] (batchId=242)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testConnection (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValid (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValidNeg (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeProxyAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeTokenAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testProxyAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testTokenAuth (batchId=243)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7532/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7532/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7532/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 55 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894448 - PreCommit-HIVE-Build

> Rewrite correlated EXISTS/IN subqueries into LEFT SEMI JOIN
> -----------------------------------------------------------
>
>                 Key: HIVE-17767
>                 URL: https://issues.apache.org/jira/browse/HIVE-17767
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>         Attachments: HIVE-17767.1.patch
>
>
> Currently such queries are written into group by + inner join with value generator and is inefficient. Value generator consists of join with outer query to fetch all correlated values. This value generator could be completely eliminated if such queries are instead rewritten into LEFT SEMI JOIN.
> Note that to do this first hive need to support LEFT SEMI JOIN with non-equi condition (HIVE-17766).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)