You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2014/10/16 15:58:04 UTC

[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/2824

    [SPARK-3948][Shuffle]Fix stream corruption bug in sort-based shuffle

    Kernel 2.6.32 bug will lead to unexpected behavior of transferTo in copyStream, and this will corrupt the shuffle output file in sort-based shuffle, which will somehow introduce PARSING_ERROR(2), deserialization error or offset out of range. Here fix this by adding append flag, also add some position checking code. Details can be seen in [SPARK-3948](https://issues.apache.org/jira/browse/SPARK-3948).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-3948

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2824
    
----
commit b47cc9f8f32e01868288ec82eedc1f421a717a8b
Author: jerryshao <sa...@intel.com>
Date:   2014-10-16T08:45:51Z

    Fix kernel 2.6.32 bug led unexpected behavior of transferTo

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59688035
  
    Hi @JoshRosen , I just set `transferToEnabled` to false as default value, unless users explicitly set it to true, `transferTo` will not be enabled. 
    
    Currently, only `ExternalSorter` use this API as file to file copying and this is controlled by configuration `spark.file.transferTo`, other uses of `copyStream` in Spark code are all not file to file copying, so this parameter will not take effect.
    
    If future uses of `copyStream`, user have to get `transferToEnabled` from configuration, I add some usage notes here.  Still user can bypass `spark.file.transferTo` and directly set this parameter to true, but they have to be responsible for the correctness of usage.
    
    The reason I didn't take `SparkConf` as a parameter to control the behavior is that it should modify lots of the current codes to get `SparkConf` in which it calls `copyStream`.
    
    So what is your opinion? Thanks a lot.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r18957257
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -280,12 +280,29 @@ private[spark] object Utils extends Logging {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // scalastyle:off
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
    --- End diff --
    
    I'm not sure, I found some code in KafkaUtils also use this `scalastyle:off` to turn off scalacheck.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r18957913
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -280,12 +280,29 @@ private[spark] object Utils extends Logging {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // scalastyle:off
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
    --- End diff --
    
    I see, I didn't notice that line, I think that shall be good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59805036
  
    I've merged this into `master` and `branch-1.1`.
    
    > Thanks a lot :).
    
    Thank YOU (and @mridulm) for helping to diagnose this really subtle bug!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59475982
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21854/consoleFull) for   PR 2824 at commit [`a82b184`](https://github.com/apache/spark/commit/a82b18423f57c5f584d93d2702d710d1cde843c2).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19066582
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
    --- End diff --
    
    Thanks Josh, I think I start to know your concern. My previous modification may not well control the behavior of `transferTo` in all uses of it, but this brings less impact to the current code path. If we'd like to control it in all the places, we have to pass and verify the configuration in all the places where use `copyStream`, which is not so elegant in some codes which have to to get SparkConf. I will think a better way to address this. Thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r18956465
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -280,12 +280,29 @@ private[spark] object Utils extends Logging {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // scalastyle:off
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
    --- End diff --
    
    en...I guess this line will trigger scalastyle checker error


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59468593
  
    Maybe this is being overly-conservative, but could we add an undocumented configuration option that allows users to bypass the `transferTo` here?  If we ship 1.1.1 or 1.2 and this bug resurfaces, it would be nice if there was a way for users to revert back to the older version of this code without having to recompile.  A lot of users can't simply upgrade their kernel, so I think we need to offer an easy workaround for them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59368192
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21807/consoleFull) for   PR 2824 at commit [`e17ada2`](https://github.com/apache/spark/commit/e17ada2629b7ac0cc6e888cd1a0da6827159ea3b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2824


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59380188
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21807/consoleFull) for   PR 2824 at commit [`e17ada2`](https://github.com/apache/spark/commit/e17ada2629b7ac0cc6e888cd1a0da6827159ea3b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59468226
  
    @adrian-wang, Just so I understand - were you seeing the issue before applying this patch, and then the patch made it go away?
    
    @jerryshao Could we also have a branch-1.1 version of this that simply logs an error instead of throwing an exception (via `assert`)? I just want to be more conservative with the existing branch and not cause programs to terminate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19066318
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
    --- End diff --
    
    To be clear: I agree that `transferTo` should be enabled.  My concern here was simply whether setting `spark.file.transfterTo` will _always_ control the use of `transferTo`.  I'm fine having transferTo as the default, but I'd like it if there was an option to disable all uses of it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59691389
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull) for   PR 2824 at commit [`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59475885
  
    Hi @JoshRosen , I just add a configuration that can bypass the NIO way of copying stream. Would you mind taking a look at it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59367830
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21805/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59688531
  
    Thanks a lot :).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59691398
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21906/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19065507
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
    --- End diff --
    
    IMHO, I think NIO way will gain much better performance, especially in shuffle. Though we met some issues with specific kernel version, mostly this exception will not be met, I think we should balance the trade-off between seldom met exception and performance gain. 
    
    If we hopes people to never use `transferTo`, why not directly remove this code path and configuration, this will be more straightforward.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59483499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21854/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59461464
  
    The patch works fine for me, on my 2.6.32 cluster. Thanks! 
    @dbtsai  You should also try this, for your SPARK-3630 issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59469437
  
    @pwendell yes, I was suffering from running some certain queries over sort-base shuffle, just like the discussion in SPARK-3630. And with this patch the issue is gone. Thanks! -- My cluster is 4-node redhat 6.2, with kernel 2.6.32.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59686863
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21906/consoleFull) for   PR 2824 at commit [`be0533a`](https://github.com/apache/spark/commit/be0533a88f6b624629ac66cfeb9989337c002cfd).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19034247
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
    --- End diff --
    
    Should we change this default to `false`?  This would help us to guard against cases in which new code uses `copyStream` on File[Input|Output]Streams but the programmer forgot to read the configuration option.  I guess my concern is that I want to ensure that if `spark.file.transferTo` is false, then we are guaranteed to _never_ use the `transferTo` path in this function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59688343
  
    HI @jerryshao,
    
    Changing the default is exactly what I had in mind.  This looks good to me!  (Going to bed now; I'll merge this tomorrow and backport to `branch-1.1`)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59380201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21807/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19034004
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
       {
         var count = 0L
         try {
    -      if (in.isInstanceOf[FileInputStream] && out.isInstanceOf[FileOutputStream]) {
    +      if (in.isInstanceOf[FileInputStream] && out.isInstanceOf[FileOutputStream]
    +        && transferToEnabled) {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359
    +        // This will lead to stream corruption issue when using sort-based shuffle (SPARK-3948).
    +        val finalPos = outChannel.position()
    +        assert(finalPos == initialPos + size,
    +          s"""
    +             |Current position $finalPos do not equal to expected position ${initialPos + count}
    --- End diff --
    
    This `assert` checks whether `finalPos` is `initialPos + size`, but this error message uses `initialPos + count`; could this lead to confusion?
    
    I suppose that `count >= size` here, so it's probably fine, but it might be confusing if `count` was ever greater than `size`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59483494
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21854/consoleFull) for   PR 2824 at commit [`a82b184`](https://github.com/apache/spark/commit/a82b18423f57c5f584d93d2702d710d1cde843c2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/2824#issuecomment-59469537
  
    I think if bug is occurred when running job, even if we do not throw an exception here, we will still meet other exceptions in reduce side, so I use `assert` here. I think an undocumented config is a good idea, but since we need SparkConf object to pass it, some other code path which use `copyStream` also need to change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r18999923
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -280,12 +280,29 @@ private[spark] object Utils extends Logging {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // scalastyle:off
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
    --- End diff --
    
    You can remove url part after the '?' here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3948][Shuffle]Fix stream corruption bug...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2824#discussion_r19065298
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -272,20 +272,38 @@ private[spark] object Utils extends Logging {
       /** Copy all data from an InputStream to an OutputStream */
       def copyStream(in: InputStream,
                      out: OutputStream,
    -                 closeStreams: Boolean = false): Long =
    +                 closeStreams: Boolean = false,
    +                 transferToEnabled: Boolean = true): Long =
       {
         var count = 0L
         try {
    -      if (in.isInstanceOf[FileInputStream] && out.isInstanceOf[FileOutputStream]) {
    +      if (in.isInstanceOf[FileInputStream] && out.isInstanceOf[FileOutputStream]
    +        && transferToEnabled) {
             // When both streams are File stream, use transferTo to improve copy performance.
             val inChannel = in.asInstanceOf[FileInputStream].getChannel()
             val outChannel = out.asInstanceOf[FileOutputStream].getChannel()
    +        val initialPos = outChannel.position()
             val size = inChannel.size()
     
             // In case transferTo method transferred less data than we have required.
             while (count < size) {
               count += inChannel.transferTo(count, size - count, outChannel)
             }
    +
    +        // Check the position after transferTo loop to see if it is in the right position and
    +        // give user information if not.
    +        // Position will not be increased to the expected length after calling transferTo in
    +        // kernel version 2.6.32, this issue can be seen in
    +        // https://bugs.openjdk.java.net/browse/JDK-7052359
    +        // This will lead to stream corruption issue when using sort-based shuffle (SPARK-3948).
    +        val finalPos = outChannel.position()
    +        assert(finalPos == initialPos + size,
    +          s"""
    +             |Current position $finalPos do not equal to expected position ${initialPos + count}
    --- End diff --
    
    I think normally `size` would be equal to `count`. I will change to `size` to keep consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org