Posted to issues@spark.apache.org by "yucai (JIRA)" <ji...@apache.org> on 2018/07/17 10:40:00 UTC
[jira] [Commented] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546371#comment-16546371 ]
yucai commented on SPARK-24832:
-------------------------------
Currently, each ColumnarBatch counts as a single record against the 1000-record update interval, so when reading columnar batches bytesRead is only refreshed once every 4096 * 1000 rows (with the default batch size of 4096), which leaves the metric out of date.
Can we count the rows in each batch instead, like this?
{code:scala}
// Count every row in a ColumnarBatch instead of treating the whole
// batch as one record, so the update interval is reached per ~1000 rows.
if (nextElement.isInstanceOf[ColumnarBatch]) {
  inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
} else {
  inputMetrics.incRecordsRead(1)
}
if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}
{code}
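To illustrate the cadence difference, here is a minimal standalone sketch (hypothetical class and method names, not Spark code) that simulates the update check with the interval of 1000 records and 4096-row batches:

```java
// Hypothetical simulation of the bytesRead update cadence.
// Counting one record per batch fires the update only once per 1000
// batches (4,096,000 rows); counting every row fires it whenever the
// cumulative row count is divisible by 1000 (every 125 batches here,
// i.e. every 512,000 rows) -- far more often, though still gated by
// the modulo check as written.
public class MetricsIntervalDemo {
    static final int INTERVAL = 1000;   // UPDATE_INPUT_METRICS_INTERVAL_RECORDS
    static final int BATCH_ROWS = 4096; // default columnar batch size

    // Returns how many times updateBytesRead() would fire over `batches`
    // batches, depending on whether we count rows or whole batches.
    static long updatesAfter(int batches, boolean countRows) {
        long recordsRead = 0, updates = 0;
        for (int i = 0; i < batches; i++) {
            recordsRead += countRows ? BATCH_ROWS : 1;
            if (recordsRead % INTERVAL == 0) updates++;
        }
        return updates;
    }

    public static void main(String[] args) {
        int batches = 1000; // 4,096,000 rows total
        System.out.println(updatesAfter(batches, false)); // batch counting: 1
        System.out.println(updatesAfter(batches, true));  // row counting: 8
    }
}
```

The sketch shows why the metric goes stale: with batch counting, a query must read over four million rows before bytesRead moves at all.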
> Improve inputMetrics's bytesRead update for ColumnarBatch
> ---------------------------------------------------------
>
> Key: SPARK-24832
> URL: https://issues.apache.org/jira/browse/SPARK-24832
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.3.1
> Reporter: yucai
> Priority: Major
>
> Improve inputMetrics's bytesRead update for ColumnarBatch
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org