Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/06/14 17:48:01 UTC

[jira] [Resolved] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

     [ https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-21882.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0
                   2.4.4
                   2.3.4

Resolved by https://github.com/apache/spark/pull/24863

> OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-21882
>                 URL: https://issues.apache.org/jira/browse/SPARK-21882
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.2.0
>            Reporter: linxiaojun
>            Assignee: linxiaojun
>            Priority: Minor
>              Labels: bulk-closed
>             Fix For: 2.3.4, 2.4.4, 3.0.0
>
>         Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 0). The reason is that the callback function used to find the bytes written is not initialized in the right way. The statisticsTable, which records statistics in a FileSystem, must be initialized first (this is triggered when the SparkHadoopWriter is opened). The solution for this issue is to adjust the order in which the callback function is initialized, so that it happens after the SparkHadoopWriter is opened.
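
The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not Spark or Hadoop code: the names `Stats`, `statistics_table`, `make_bytes_written_callback`, `Writer` and `run_task` are hypothetical, and the model assumes that the callback captures whatever statistics objects exist at the moment it is created, which is one plausible reading of the report.

```python
# Toy model only: every name here is hypothetical, not a Spark or Hadoop API.

class Stats:
    """Per-filesystem byte counter, standing in for one statisticsTable entry."""
    def __init__(self):
        self.bytes_written = 0

# Registry of counters; it stays empty until a writer is opened.
statistics_table = []

def make_bytes_written_callback():
    # Assumption: the callback snapshots the counters that exist right now,
    # so counters registered later are never consulted.
    snapshot = list(statistics_table)
    return lambda: sum(s.bytes_written for s in snapshot)

class Writer:
    def open(self):
        # Opening the writer is what registers the counter.
        self.stats = Stats()
        statistics_table.append(self.stats)

    def write(self, data: bytes):
        self.stats.bytes_written += len(data)

def run_task(callback_first: bool) -> int:
    statistics_table.clear()
    writer = Writer()
    if callback_first:
        # Order described in the report: callback before the writer is opened.
        callback = make_bytes_written_callback()
        writer.open()
    else:
        # Adjusted order: open the writer, then create the callback.
        writer.open()
        callback = make_bytes_written_callback()
    writer.write(b"hello")
    writer.write(b"world!")
    return callback()

buggy_order = run_task(callback_first=True)
fixed_order = run_task(callback_first=False)
print(buggy_order, fixed_order)
```

In this model the callback created before `open()` sees an empty registry and reports 0 bytes, while the one created afterwards reports the 11 bytes actually written. The real change is in the pull request linked above.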



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
