Posted to issues@spark.apache.org by "yucai (JIRA)" <ji...@apache.org> on 2018/07/17 10:40:00 UTC

[jira] [Commented] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

    [ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546371#comment-16546371 ] 

yucai commented on SPARK-24832:
-------------------------------

Currently, ColumnarBatch's bytesRead is only updated once every 4096 * 1000 rows, which leaves the metric out of date.
Can we update it once per batch instead? The current logic:

{code:scala}
// Records are counted per element: a ColumnarBatch contributes all of its
// rows at once, while a plain row contributes one.
if (nextElement.isInstanceOf[ColumnarBatch]) {
  inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
} else {
  inputMetrics.incRecordsRead(1)
}
// bytesRead is only refreshed when recordsRead lands exactly on a multiple
// of the interval, which batch-sized increments can skip over entirely.
if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}
{code}
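
For illustration, a per-batch variant could look like the sketch below. It assumes the same surrounding iterator context as the snippet above (nextElement, inputMetrics, updateBytesRead()); the placement is illustrative, not a final patch:

{code:scala}
// Sketch only: refresh bytesRead as soon as a ColumnarBatch is consumed,
// since one batch can advance recordsRead by thousands of rows at once.
nextElement match {
  case batch: ColumnarBatch =>
    inputMetrics.incRecordsRead(batch.numRows())
    updateBytesRead()  // per-batch update, no modulo check needed
  case _ =>
    inputMetrics.incRecordsRead(1)
    // Row-at-a-time reads keep the existing interval-based behavior.
    if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
      updateBytesRead()
    }
}
{code}

Updating once per batch would keep the metric at most one batch behind, and since the extra call happens at batch granularity (typically thousands of rows) rather than per row, the overhead should stay small.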

> Improve inputMetrics's bytesRead update for ColumnarBatch
> ---------------------------------------------------------
>
>                 Key: SPARK-24832
>                 URL: https://issues.apache.org/jira/browse/SPARK-24832
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.1
>            Reporter: yucai
>            Priority: Major
>
> Improve inputMetrics's bytesRead update for ColumnarBatch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org