Posted to issues@hive.apache.org by "Chinna Rao Lalam (JIRA)" <ji...@apache.org> on 2015/04/08 20:30:12 UTC

[jira] [Updated] (HIVE-8858) Visualize generated Spark plan [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-8858:
-----------------------------------
    Attachment: HIVE-8858.1-spark.patch

Hi [~xuefuz],[~csun],

I have reworked this patch. The output now looks like the examples below; please take a look.

{quote}
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
UNION ALL 
select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value
{quote}

!!!!!!!!!!!!!!!!!!!!!!!!!! Spark Plan !!!!!!!!!!!!!!!!!!!!!!!!!!! 

 	Reduce 3 <-- ( Shuffle 13 ( Partitions 1, SortBy, Cache OFF )  <-- ( Reduce 9,Reduce 11,Reduce 18,Reduce 14 ) 
 	Reduce 9 <-- ( Shuffle 7 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 17,MapTran 2 )  <-- ( MapInput 26 (cache off) ,MapInput 24 (cache off)  ) 
 	Reduce 11 <-- ( Shuffle 16 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 8,MapTran 6 )  <-- ( MapInput 19 (cache off) ,MapInput 23 (cache off)  ) 
 	Reduce 14 <-- ( Shuffle 4 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 1,MapTran 12 )  <-- ( MapInput 25 (cache off) ,MapInput 21 (cache off)  ) 
 	Reduce 18 <-- ( Shuffle 10 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 5,MapTran 15 )  <-- ( MapInput 20 (cache off) ,MapInput 22 (cache off)  ) 

!!!!!!!!!!!!!!!!!!!!!!!!!! Spark Plan !!!!!!!!!!!!!!!!!!!!!!!!!!!

{quote}
select * from	
( 
select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
union all 
select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
) subq1
ORDER BY key, val1, val2;
{quote}

!!!!!!!!!!!!!!!!!!!!!!!!!! Spark Plan !!!!!!!!!!!!!!!!!!!!!!!!!!! 

	Reduce 2 <-- ( Shuffle 8 ( Partitions 1, GroupBy, Cache ON )  <-- ( MapTran 3 )  <-- ( MapInput 11 (cache off)  ) 
	Reduce 4 <-- ( Shuffle 1 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 7,Reduce 2 )  <-- ( MapInput 12 (cache off)  ) 
	Reduce 5 <-- ( Shuffle 8 ( Partitions 1, GroupBy, Cache ON )  <-- ( MapTran 3 )  <-- ( MapInput 11 (cache off)  ) 
	Reduce 6 <-- ( Shuffle 9 ( Partitions 2, SortBy, Cache OFF )  <-- ( MapTran 10,Reduce 5 )  <-- ( MapInput 12 (cache off)  ) 

!!!!!!!!!!!!!!!!!!!!!!!!!! Spark Plan !!!!!!!!!!!!!!!!!!!!!!!!!!! 

The patch still needs cleanup. I will upload the final patch once that is done.
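
For anyone who wants to inspect this output programmatically (for example, to see which shuffles have caching turned on), here is a minimal sketch of a parser for a single plan line. It assumes the line format stays exactly as printed above. The names `Shuffle`, `PlanLine`, and `parse_plan_line` are hypothetical helpers for illustration and are not part of the patch.

```python
import re
from typing import List, NamedTuple


class Shuffle(NamedTuple):
    shuffle_id: int
    partitions: int
    kind: str      # e.g. "SortBy" or "GroupBy", as printed in the plan
    cached: bool   # True when the plan prints "Cache ON"


class PlanLine(NamedTuple):
    target: str          # node on the left of the first "<--", e.g. "Reduce 9"
    shuffle: Shuffle     # shuffle feeding the target
    upstream: List[str]  # nodes listed after the shuffle, in printed order


# Patterns are derived only from the sample output in this comment (assumption).
_SHUFFLE = re.compile(
    r"Shuffle (\d+) \( Partitions (\d+), (\w+), Cache (ON|OFF) \)"
)
_NODE = re.compile(r"(Reduce|MapTran|MapInput) (\d+)")


def parse_plan_line(line: str) -> PlanLine:
    """Parse one 'Reduce N <-- ( Shuffle ... )' line of the printed plan."""
    target, _, rest = line.strip().partition("<--")
    match = _SHUFFLE.search(rest)
    if match is None:
        raise ValueError("no shuffle description found in: " + line)
    shuffle = Shuffle(
        shuffle_id=int(match.group(1)),
        partitions=int(match.group(2)),
        kind=match.group(3),
        cached=(match.group(4) == "ON"),
    )
    # Everything after the shuffle description names the upstream nodes.
    upstream = [
        kind + " " + num for kind, num in _NODE.findall(rest[match.end():])
    ]
    return PlanLine(target.strip(), shuffle, upstream)
```

Applied to the `Reduce 2` and `Reduce 5` lines of the second plan, this reports the same `Shuffle 8` with caching on for both, which is the kind of detail the explain plan does not show.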

> Visualize generated Spark plan [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-8858
>                 URL: https://issues.apache.org/jira/browse/HIVE-8858
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-8858-spark.patch, HIVE-8858.1-spark.patch
>
>
> The Spark plan generated by SparkPlanGenerator contains information that isn't available in Hive's explain plan, such as RDD caching. Also, the graph is slightly different from the original SparkWork. Thus, it would be nice to visualize the plan, as is done for SparkWork.
> Preferably, the visualization can happen as part of Hive explain extended. If that is not feasible, we can at least log the plan at info level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)