Posted to issues@spark.apache.org by "Yuanjian Li (JIRA)" <ji...@apache.org> on 2018/12/17 16:04:00 UTC

[jira] [Commented] (SPARK-26223) Scan: track metastore operation time

    [ https://issues.apache.org/jira/browse/SPARK-26223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723111#comment-16723111 ] 

Yuanjian Li commented on SPARK-26223:
-------------------------------------

The usage of externalCatalog in `SessionCatalog` and the `ExternalCatalog` interface itself are clear clues for this issue. Most `ExternalCatalog` interfaces are only used for DDL; all the metastore operations relevant to Scan are listed below:
 # getTable: called by the analysis rule ResolveRelations through lookupRelation.
 # listPartitions:
1. Called during execution by HiveTableScanExec when getting the raw partitions.
2. Called by the optimizer rule OptimizeMetadataOnlyQuery in replaceTableScanWithPartitionMetadata.
3. Called by HiveMetastoreCatalog.convertToLogicalRelation when lazy partition pruning is disabled; the entry point for this scenario is the Hive analyzer's RelationConversions rule.
 # listPartitionsByFilter:
1. Same as 2.1.
2. Same as 2.2.
3. Called by CatalogFileIndex. Currently this metastore operation time is counted as part of file listing time ([discussion link|https://github.com/apache/spark/pull/23327#discussion_r242076144]); this PR will split it out.
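
Each scenario above comes down to a single ExternalCatalog call, so the measurement itself can be a thin wrapper that records start and end timestamps around that call. The following is a minimal Python sketch for illustration only, not Spark code; `Phase` and `timed_phase` are hypothetical names, and the lambda in the usage below stands in for a real catalog call such as listPartitions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class Phase:
    """Hypothetical record of one metastore operation with wall-clock bounds in ms."""
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def timed_phase(name: str, phases: List[Phase], op: Callable[[], T]) -> T:
    """Run op(), append a Phase with its start/end time to phases, return op()'s result.

    Wall-clock time is used (not time.monotonic) because the start/end values are
    meant for a timeline. The phase is appended in `finally`, so it is recorded
    even when op() raises.
    """
    start = int(time.time() * 1000)
    try:
        return op()
    finally:
        phases.append(Phase(name, start, int(time.time() * 1000)))


# Usage: the lambda is a stand-in for a metastore call.
phases: List[Phase] = []
parts = timed_phase("listPartitions", phases, lambda: ["p=1", "p=2"])
```

Recording in a `finally` block means a failed metastore call still appears in the collected phases.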

We can address all these scenarios by appending each phase to a newly added array buffer in the `CatalogTable` parameter list, and then dumping the recorded phases into metrics in the scan node.
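
The proposal can be illustrated with a minimal Python sketch (again not Spark code). `CatalogTableSketch` is a hypothetical stand-in for `CatalogTable` carrying the proposed phase buffer, `scan_metrics` stands in for what the scan node would report, and the metric names are assumptions. Phases are plain `(name, start_ms, end_ms)` tuples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical phase representation: (operation name, start ms, end ms).
PhaseRecord = Tuple[str, int, int]


@dataclass
class CatalogTableSketch:
    """Stand-in for CatalogTable with the proposed buffer of metastore phases."""
    identifier: str
    metastore_phases: List[PhaseRecord] = field(default_factory=list)

    def record_phase(self, name: str, start_ms: int, end_ms: int) -> None:
        # Analysis (getTable), optimization and execution (listPartitions*)
        # would each append the phase they measured here.
        self.metastore_phases.append((name, start_ms, end_ms))


def scan_metrics(table: CatalogTableSketch) -> Dict[str, int]:
    """Hypothetical metrics a scan node could dump from the table's phases."""
    phases = table.metastore_phases
    if not phases:
        # No metastore calls were recorded, so there are no timeline bounds.
        return {"metastoreTimeMs": 0}
    return {
        "metastoreTimeMs": sum(end - start for _, start, end in phases),
        "metastoreStartMs": min(start for _, start, _ in phases),
        "metastoreEndMs": max(end for _, _, end in phases),
    }
```

Summing durations gives the time actually spent inside metastore calls, while the earliest start and latest end give the bounds needed to place those operations on a timeline, as the issue description asks.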

> Scan: track metastore operation time
> ------------------------------------
>
>                 Key: SPARK-26223
>                 URL: https://issues.apache.org/jira/browse/SPARK-26223
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Reynold Xin
>            Priority: Major
>
> The Scan node should report how much time it spent in metastore operations. Similar to file listing, would be great to also report start and end time for constructing a timeline.
>  


