Posted to issues@hive.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2018/02/06 06:09:00 UTC

[jira] [Commented] (HIVE-18368) Improve Spark Debug RDD Graph

    [ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353439#comment-16353439 ] 

Sahil Takiar commented on HIVE-18368:
-------------------------------------

[~lirui] Sorry for the delay. I've updated the RB and addressed the comments.

[~xuefuz] True, but the UI also displays the edge type, so it should be pretty easy to figure out which tran object is being used.

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, HIVE-18368.3.patch, HIVE-18368.4.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between different {{SparkTran}} objects, which shuffle types are used, and which trans are cached. However, there is room for improvement.
> When debug logging is enabled, the RDD graph is logged, but there isn't much information printed about each RDD.
> We should combine the two graphs and improve them. We could even make the Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, RDDs, and BaseWorks. Edges should include information about the number of partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)