Posted to issues@spark.apache.org by "du (JIRA)" <ji...@apache.org> on 2018/08/25 06:55:00 UTC
[jira] [Created] (SPARK-25237) FileScanRDD's inputMetrics is wrong when selecting from a datasource table with limit
du created SPARK-25237:
--------------------------
Summary: FileScanRDD's inputMetrics is wrong when selecting from a datasource table with limit
Key: SPARK-25237
URL: https://issues.apache.org/jira/browse/SPARK-25237
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1, 2.2.2
Reporter: du
In FileScanRDD, inputMetrics's bytesRead is updated via updateBytesRead every 1000 rows and again when the iterator is closed.
However, closing the iterator also invokes updateBytesReadWithFileSize, which increases bytesRead by the whole file's length.
As a result, bytesRead is over-reported for queries that stop reading early, such as "select * from table limit 1".
Because Hadoop 2.5 and earlier are no longer supported, bytesRead can always be taken from the Hadoop FileSystem statistics rather than the file's length.
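To illustrate the double-counting described above, here is a minimal standalone sketch (in Java, not Spark's actual Scala code; the class and method names are hypothetical stand-ins for updateBytesRead and updateBytesReadWithFileSize). It simulates a limit query that reads only 100 bytes of a 1000-byte file before closing the iterator:

```java
// Hypothetical stand-in for Spark's input metrics; not the real API.
class SketchInputMetrics {
    long bytesRead = 0L;

    // Mirrors updateBytesRead: overwrite with the value reported by
    // Hadoop FileSystem statistics (the bytes actually read).
    void setBytesRead(long v) { bytesRead = v; }

    // Mirrors updateBytesReadWithFileSize: add the whole file's length.
    void incBytesRead(long v) { bytesRead += v; }
}

public class LimitMetricsSketch {
    // Simulate "select * from table limit 1": only bytesActuallyRead of
    // fileLength bytes are consumed before the iterator is closed.
    static long simulate(long fileLength, long bytesActuallyRead,
                         boolean addFileLengthOnClose) {
        SketchInputMetrics m = new SketchInputMetrics();
        m.setBytesRead(bytesActuallyRead);   // periodic/final updateBytesRead
        if (addFileLengthOnClose) {
            m.incBytesRead(fileLength);      // buggy extra update on close
        }
        return m.bytesRead;
    }

    public static void main(String[] args) {
        // With the extra file-length update, 100 bytes read is reported as 1100.
        System.out.println(simulate(1000L, 100L, true));
        // Relying only on FileSystem statistics reports the correct 100.
        System.out.println(simulate(1000L, 100L, false));
    }
}
```

Under these assumed numbers, the buggy path reports 1100 bytes for a scan that read only 100, which is the inflated bytesRead the issue describes.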
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org