Posted to dev@hive.apache.org by "Szehon Ho (JIRA)" <ji...@apache.org> on 2014/11/04 01:22:33 UTC

[jira] [Resolved] (HIVE-8702) Extra MapTask created but not connected [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho resolved HIVE-8702.
-----------------------------
    Resolution: Invalid

Took another look.  Suhas had wired up two resolvers, and both need to be enabled.  I had enabled only the first one (SparkMapJoinOptimizer).  There is a second one, SparkReduceSinkMapJoinProc, that also needs to be wired in.  Once it's wired in, the plan looks more appropriate.

> Extra MapTask created but not connected [Spark Branch]
> ------------------------------------------------------
>
>                 Key: HIVE-8702
>                 URL: https://issues.apache.org/jira/browse/HIVE-8702
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>
> Based on Szehon's observation, a strange extra MapTask is generated but not connected.  The following query demonstrates it:
> {code}
> select * FROM
> (SELECT avg(key) as x1, value as x2 FROM src group by value) x
> JOIN
> (SELECT avg(key) as y1, value as y2 FROM src group by value) y ON (x1 = y1)
> JOIN
> (SELECT avg(key) as z1, value as z2 FROM src group by value) z ON (x1 = z1);
> {code}
> We shouldn't generate it in the first place.


