Posted to issues@hive.apache.org by "Andrew Sherman (JIRA)" <ji...@apache.org> on 2017/10/06 17:10:01 UTC

[jira] [Assigned] (HIVE-17572) Warnings from SparkCrossProductCheck for MapJoins are confusing

     [ https://issues.apache.org/jira/browse/HIVE-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Sherman reassigned HIVE-17572:
-------------------------------------

    Assignee: Andrew Sherman

> Warnings from SparkCrossProductCheck for MapJoins are confusing
> ---------------------------------------------------------------
>
>                 Key: HIVE-17572
>                 URL: https://issues.apache.org/jira/browse/HIVE-17572
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Andrew Sherman
>
> When {{SparkCrossProductCheck}} detects a cross product in a map join, it prints a confusing warning, e.g. {{Map Join MAPJOIN\[9\]\[bigTable=?\] in task 'Stage-1:MAPRED' is a cross product}}
> I see a few ways this can be improved:
> * {{bigTable}} should actually specify the big table
> * I'm not sure why the stage id is printed instead of the work id; when a cross product is detected in a shuffle join, the work id is shown (e.g. {{Warning: Shuffle Join JOIN\[13\]\[tables = \[$hdt$_1, $hdt$_2, $hdt$_0\]\] in Work 'Reducer 3' is a cross product}})
> * It shouldn't say {{MAPRED}}, which can be confusing to users
> * The {{MAPJOIN}} id doesn't need to be printed; it has no meaning to the user, and its value keeps increasing the longer a session lives
> On a somewhat related note, could we also put this warning in the explain plan? Otherwise users may not even notice it.
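
For illustration only, here is a minimal Java sketch of what a map-join warning could look like if the suggestions above were applied: the big table is named, the work id replaces the stage id, and the {{MAPRED}} label and the {{MAPJOIN}} operator id are dropped. The class name, the method name, and the work name 'Map 1' are hypothetical; this is not the actual SparkCrossProductCheck code, nor necessarily the wording the fix will use.

```java
// Hypothetical sketch; none of these names come from the Hive code base.
class CrossProductWarningSketch {

    // Builds a map-join warning shaped like the existing shuffle-join one:
    // it names the big table and the work, and omits the operator id.
    static String mapJoinWarning(String bigTable, String workName) {
        return "Warning: Map Join [bigTable = " + bigTable + "] in Work '"
                + workName + "' is a cross product";
    }

    public static void main(String[] args) {
        // 'Map 1' is an assumed work name, used only for this example.
        System.out.println(mapJoinWarning("$hdt$_0", "Map 1"));
    }
}
```

With these assumed inputs the message would read: Warning: Map Join [bigTable = $hdt$_0] in Work 'Map 1' is a cross product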



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)