Posted to issues@spark.apache.org by "du (JIRA)" <ji...@apache.org> on 2018/08/25 06:55:00 UTC

[jira] [Created] (SPARK-25237) FileScanRDD's inputMetrics is wrong when selecting from a datasource table with a limit

du created SPARK-25237:
--------------------------

             Summary: FileScanRDD's inputMetrics is wrong when selecting from a datasource table with a limit
                 Key: SPARK-25237
                 URL: https://issues.apache.org/jira/browse/SPARK-25237
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1, 2.2.2
            Reporter: du


In FileScanRDD, we update inputMetrics' bytesRead via updateBytesRead every 1,000 rows and again when the iterator is closed.

However, when the iterator is closed, we also invoke updateBytesReadWithFileSize, which increases bytesRead by the whole file's length.

As a result, bytesRead is wrong for queries with a limit, such as `select * from table limit 1`: the scan stops after only a few rows, yet the full file size is still added on top of the already-recorded bytes.

Since Hadoop 2.5 and earlier are no longer supported, we can always obtain bytesRead from the Hadoop FileSystem statistics rather than from the file's length, so the file-size fallback is unnecessary.
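The double counting described above can be illustrated with a minimal, self-contained sketch. The names updateBytesRead and updateBytesReadWithFileSize mirror the FileScanRDD methods mentioned in this report, but the classes below are hypothetical stand-ins, not the actual Spark code:

```scala
// Hypothetical sketch of the bytesRead accounting described in this issue.
object BytesReadSketch {
  // Stand-in for Spark's InputMetrics (assumed shape, not the real class).
  final class InputMetrics {
    var bytesRead: Long = 0L
    def setBytesRead(v: Long): Unit = bytesRead = v
    def incBytesRead(v: Long): Unit = bytesRead += v
  }

  // Simulates closing the iterator after a `limit` query stopped the scan early.
  def simulateClose(fileLength: Long, bytesActuallyRead: Long): Long = {
    val metrics = new InputMetrics
    // updateBytesRead: record the accurate value from FileSystem statistics.
    metrics.setBytesRead(bytesActuallyRead)
    // updateBytesReadWithFileSize: on close, the whole file length is added
    // on top, so bytesRead ends up far larger than what was actually read.
    metrics.incBytesRead(fileLength)
    metrics.bytesRead
  }

  def main(args: Array[String]): Unit = {
    // A 1 GiB file where `limit 1` only reads ~4 KiB:
    val reported = simulateClose(1024L * 1024 * 1024, 4096L)
    println(s"reported bytesRead = $reported, actual = 4096")
  }
}
```

With the fix proposed here (relying solely on FileSystem statistics), the incBytesRead call on close would be dropped and the reported value would match the bytes actually read.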

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org