Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/09/04 02:05:21 UTC

[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/22324

    [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in FileScanRDD

    ## What changes were proposed in this pull request?
    This PR removes the method `updateBytesReadWithFileSize` from `FileScanRDD` because it computes input metrics from the file size, an approach intended for Hadoop 2.5 and earlier. Current Spark does not support those versions, so the method produces wrong input metric numbers.
    
    This is a rework of #22232.
    
    Closes #22232
    
    ## How was this patch tested?
    Added `FileSourceSuite` to test this case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark pr22232-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22324
    
----
commit 0f75257b50a611e069d406da8d72225bb4e73b51
Author: dujunling <du...@...>
Date:   2018-08-25T06:20:35Z

    remove updateBytesReadWithFileSize because we use Hadoop FileSystem statistics to update the inputMetrics

commit 53dd42c1facebf97044afb22b1f0894ec209f3bb
Author: dujunling <du...@...>
Date:   2018-08-27T03:26:30Z

    add ut

commit 1c326466fbd24c432184be6e53afec93369970c1
Author: dujunling <du...@...>
Date:   2018-08-27T03:33:46Z

    ut

commit 510d729b0ed6f83b05a3b0f06c2631163d62ef1a
Author: Takeshi Yamamuro <ya...@...>
Date:   2018-09-04T01:47:59Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95650/testReport)** for PR 22324 at commit [`bc05a35`](https://github.com/apache/spark/commit/bc05a354e375dfb1df6a70a46f28b792f8567fc5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r214776872
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceSuite.scala ---
    @@ -0,0 +1,48 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +
    +class FileSourceSuite extends SharedSQLContext {
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    --- End diff --
    
    Shall we move this suite into `FileBasedDataSourceSuite`?


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95655/
    Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95645/testReport)** for PR 22324 at commit [`510d729`](https://github.com/apache/spark/commit/510d729b0ed6f83b05a3b0f06c2631163d62ef1a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FileSourceSuite extends SharedSQLContext `


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    we can credit to multiple people now though :-)


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    retest this please


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r215461695
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    +    withTempPath { p =>
    +      val path = p.getAbsolutePath
    +      spark.range(1000).repartition(1).write.csv(path)
    +      val bytesReads = new mutable.ArrayBuffer[Long]()
    +      val bytesReadListener = new SparkListener() {
    +        override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +          bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead
    +        }
    +      }
    +      sparkContext.addSparkListener(bytesReadListener)
    +      try {
    +        spark.read.csv(path).limit(1).collect()
    +        sparkContext.listenerBus.waitUntilEmpty(1000L)
    +        assert(bytesReads.sum === 7860)
    --- End diff --
    
    yea, actually the file size is `3890`, but the Hadoop API (`FileSystem.getAllStatistics`) reports that number (`3930`). I didn't look into the Hadoop code yet, so I don't get why. I'll dig into it later.


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r215111327
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    +    withTempPath { p =>
    +      val path = p.getAbsolutePath
    +      spark.range(1000).repartition(1).write.csv(path)
    +      val bytesReads = new mutable.ArrayBuffer[Long]()
    +      val bytesReadListener = new SparkListener() {
    +        override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +          bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead
    +        }
    +      }
    +      sparkContext.addSparkListener(bytesReadListener)
    +      try {
    +        spark.read.csv(path).limit(1).collect()
    +        sparkContext.listenerBus.waitUntilEmpty(1000L)
    +        assert(bytesReads.sum === 7860)
    --- End diff --
    
    So the sum *should* be 10*2 + 90*3 + 900*4 = 3890. That's also the size of the CSV file that's written, when I try it locally. When I run this code without the change here, I get 7820+7820 = 15640. So this is better! But I wonder why it ends up thinking it reads about twice the bytes?
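
The expected size quoted above can be reproduced without Spark. A minimal sketch, assuming the CSV part file holds one decimal value per line terminated by a single `\n` and has no header row; `ExpectedCsvSize` and its methods are hypothetical helpers, not part of the PR:

```scala
// Hypothetical helper, not part of the PR: estimates the size of the CSV
// written by spark.range(1000).repartition(1).write.csv(path), assuming
// one value per line, a one-byte '\n' terminator, and no header row.
object ExpectedCsvSize {
  // Bytes for a single row: its decimal digits plus the newline.
  def rowBytes(value: Long): Long = value.toString.length.toLong + 1L

  // Total bytes for the rows 0 until n.
  def totalBytes(n: Long): Long = (0L until n).map(rowBytes).sum
}
```

`ExpectedCsvSize.totalBytes(1000L)` sums 10 two-byte rows, 90 three-byte rows, and 900 four-byte rows, which is the 10*2 + 90*3 + 900*4 = 3890 breakdown given in the comment.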


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95655/testReport)** for PR 22324 at commit [`bc05a35`](https://github.com/apache/spark/commit/bc05a354e375dfb1df6a70a46f28b792f8567fc5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2812/
    Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95650/
    Test FAILed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Merged to master/2.4


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    @srowen I reworked this because the original author is inactive; can you check? (btw, it's ok for the credit for this commit to go to the original author.)


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95645/testReport)** for PR 22324 at commit [`510d729`](https://github.com/apache/spark/commit/510d729b0ed6f83b05a3b0f06c2631163d62ef1a).


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95655/testReport)** for PR 22324 at commit [`bc05a35`](https://github.com/apache/spark/commit/bc05a354e375dfb1df6a70a46f28b792f8567fc5).


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2824/
    Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95645/
    Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    oh, I see.


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22324


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r214783002
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceSuite.scala ---
    @@ -0,0 +1,48 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +
    +class FileSourceSuite extends SharedSQLContext {
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    --- End diff --
    
    ok


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r215315904
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    +    withTempPath { p =>
    +      val path = p.getAbsolutePath
    +      spark.range(1000).repartition(1).write.csv(path)
    +      val bytesReads = new mutable.ArrayBuffer[Long]()
    +      val bytesReadListener = new SparkListener() {
    +        override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +          bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead
    +        }
    +      }
    +      sparkContext.addSparkListener(bytesReadListener)
    +      try {
    +        spark.read.csv(path).limit(1).collect()
    +        sparkContext.listenerBus.waitUntilEmpty(1000L)
    +        assert(bytesReads.sum === 7860)
    --- End diff --
    
    In this test, Spark runs with `local[2]` and each scan thread points to the same CSV file. Since each thread gets the file size through Hadoop APIs, the total `bytesRead` becomes 2 * the file size, IIUC.


---



[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22324#discussion_r215318249
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test("SPARK-25237 compute correct input metrics in FileScanRDD") {
    +    withTempPath { p =>
    +      val path = p.getAbsolutePath
    +      spark.range(1000).repartition(1).write.csv(path)
    +      val bytesReads = new mutable.ArrayBuffer[Long]()
    +      val bytesReadListener = new SparkListener() {
    +        override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    +          bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead
    +        }
    +      }
    +      sparkContext.addSparkListener(bytesReadListener)
    +      try {
    +        spark.read.csv(path).limit(1).collect()
    +        sparkContext.listenerBus.waitUntilEmpty(1000L)
    +        assert(bytesReads.sum === 7860)
    --- End diff --
    
    7860/2=3930, 40 bytes more than expected, but I'm willing to believe there's a good reason for that somewhere in how it gets read. Clearly it's much better than the answer of 15640, so willing to believe this is fixing something.
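
The numbers in this thread can be lined up with a little arithmetic. A sketch, assuming the two-task split described for `local[2]` and that both tasks report the same count; `MetricBreakdown` is a hypothetical name, and reading the old per-task figure as "Hadoop-reported bytes plus file size" is only an observation about the numbers, not something confirmed in the thread:

```scala
// Hypothetical breakdown of the figures quoted in this review thread.
object MetricBreakdown {
  val fileSize: Long = 3890L        // CSV size reported in the thread
  val tasks: Int = 2                // assumed: one task per local[2] thread
  val sumWithFix: Long = 7860L      // bytesReads.sum asserted in the test
  val sumWithoutFix: Long = 15640L  // sum observed without the change

  // Per-task counts under the equal-split assumption.
  val perTaskWithFix: Long = sumWithFix / tasks
  val perTaskWithoutFix: Long = sumWithoutFix / tasks

  // Bytes each task reports beyond the on-disk file size.
  val excessPerTask: Long = perTaskWithFix - fileSize

  // Gap between the old and new per-task counts.
  val oldMinusNew: Long = perTaskWithoutFix - perTaskWithFix
}
```

Here `excessPerTask` is the 40 bytes noted above, and `oldMinusNew` comes out equal to `fileSize`, which would be consistent with the removed method adding the file size on top of the Hadoop statistics.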


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2816/
    Test PASSed.


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    ping @srowen @HyukjinKwon 


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    **[Test build #95650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95650/testReport)** for PR 22324 at commit [`bc05a35`](https://github.com/apache/spark/commit/bc05a354e375dfb1df6a70a46f28b792f8567fc5).


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    If I find the reason why the numbers are different, I'll open a PR under a new JIRA.
    https://github.com/apache/spark/pull/22324#discussion_r215461695


---



[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2821/
    Test PASSed.


---
