Posted to issues@kylin.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/04/19 02:30:00 UTC

[jira] [Commented] (KYLIN-5121) Make JobMetricsUtils.collectMetrics be working again

    [ https://issues.apache.org/jira/browse/KYLIN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524008#comment-17524008 ] 

ASF subversion and git services commented on KYLIN-5121:
--------------------------------------------------------

Commit dc3fd9ab25ecd03ec53f8855f5366db8de542ce5 in kylin's branch refs/heads/kylin-soft-affinity-local-cache from hujiahua
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=dc3fd9ab25 ]

[KYLIN-5121] Make JobMetricsUtils.collectMetrics be working again


> Make JobMetricsUtils.collectMetrics be working again
> ----------------------------------------------------
>
>                 Key: KYLIN-5121
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5121
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: hujiahua
>            Priority: Major
>
> At present, the row count needs to be evaluated after every cube build, and Spark's `QueryExecution` exposes a `numOutputRows` metric for exactly this purpose. However, after KYLIN-4662 (migrate from third-party Spark to official Apache Spark), the utility function `JobMetricsUtils.collectMetrics` no longer works, so each row count requires an extra `Dataset.count()` call, which wastes resources and increases cube build time.
> Here is my solution: obtain the `QueryExecution` object through a custom `QueryExecutionListener`, and match it to the corresponding build by comparing the output path. (BTW, the output path of a cube id is always unique.)
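The proposed listener-based approach could be sketched roughly as below. This is an illustrative model, not Kylin's actual implementation: in real code, `onQueryFinished` would be driven by an implementation of Spark's `org.apache.spark.sql.util.QueryExecutionListener#onSuccess`, reading `numOutputRows` from the finished `QueryExecution`'s executed plan; here the callback is invoked directly so the sketch stays self-contained. All class and path names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry for the proposed fix: row counts captured by a
// query-execution listener are stored keyed by output path, and the build
// job looks them up afterwards instead of running an extra Dataset.count().
public class QueryMetricsRegistry {
    private static final ConcurrentHashMap<String, Long> ROW_COUNTS = new ConcurrentHashMap<>();

    // In real code, called from a custom QueryExecutionListener's onSuccess,
    // which would extract numOutputRows from the QueryExecution and derive
    // the output path from the write command in the executed plan.
    public static void onQueryFinished(String outputPath, long numOutputRows) {
        ROW_COUNTS.put(outputPath, numOutputRows);
    }

    // Called by the build job after the write completes. Because the output
    // path of a cube segment is unique, it unambiguously matches the metric
    // to the right build even when several builds run concurrently.
    public static Optional<Long> collectRowCount(String outputPath) {
        return Optional.ofNullable(ROW_COUNTS.remove(outputPath));
    }

    public static void main(String[] args) {
        // Simulate the listener firing for two independent segment builds.
        onQueryFinished("/kylin/cube_a/seg_1", 1000L);
        onQueryFinished("/kylin/cube_b/seg_7", 42L);

        System.out.println(collectRowCount("/kylin/cube_a/seg_1")); // Optional[1000]
        System.out.println(collectRowCount("/kylin/cube_a/seg_1")); // Optional.empty (already consumed)
    }
}
```

The remove-on-read in `collectRowCount` keeps the map from growing across builds; whether to consume or retain the entry is a design choice, not something the issue specifies.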



--
This message was sent by Atlassian Jira
(v8.20.1#820001)