Posted to reviews@spark.apache.org by squito <gi...@git.apache.org> on 2018/10/29 20:20:13 UTC

[GitHub] spark pull request #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for st...

GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/22882

    [SPARK-25871][STREAMING][WIP] Don't use EC for streaming WAL

    ## What changes were proposed in this pull request?
    
    The write-ahead log (WAL) expects to be able to call `hflush()`, but that call is a no-op when writing to a file that uses HDFS erasure coding (EC).  So ensure that the WAL file is always written with replication instead, regardless of the filesystem defaults.
    
    Note this is a WIP on top of changes from https://github.com/apache/spark/pull/22881.  The only new change here is https://github.com/apache/spark/commit/98204e6bcb840f1a47e1a3bd73da5fd7c9b22bcd
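
A minimal sketch of the idea described above: ask the filesystem for a replicated (non-EC) file when it offers a way to do so, and fall back to the ordinary create call otherwise. This is not the code from the pull request. `PlainFs`, `BuilderFs`, `Builder`, and `createNonEcFile` are hypothetical stand-ins that only mimic the shape of a builder-style create API, so the example runs without Hadoop on the classpath. Looking the method up by reflection is an assumption about how one might stay compatible with filesystem versions that lack the builder.

```java
import java.lang.reflect.Method;

public class NonEcCreate {
    // Hypothetical stand-in for a filesystem that only has a plain create().
    public static class PlainFs {
        public String create(String path) { return "default:" + path; }
    }

    // Hypothetical stand-in for a filesystem that also exposes a builder.
    public static class BuilderFs extends PlainFs {
        public Builder createFile(String path) { return new Builder(path); }
    }

    // Hypothetical builder; replicate() requests replication instead of EC.
    public static class Builder {
        private final String path;
        private boolean replicated = false;
        Builder(String path) { this.path = path; }
        public Builder replicate() { replicated = true; return this; }
        public String build() { return (replicated ? "replicated:" : "default:") + path; }
    }

    // Try createFile(path).replicate().build() via reflection; if any of
    // those methods is missing or fails, use the plain create() instead.
    public static String createNonEcFile(PlainFs fs, String path) {
        try {
            Method createFile = fs.getClass().getMethod("createFile", String.class);
            Object builder = createFile.invoke(fs, path);
            Object replicated = builder.getClass().getMethod("replicate").invoke(builder);
            return (String) replicated.getClass().getMethod("build").invoke(replicated);
        } catch (ReflectiveOperationException e) {
            return fs.create(path);
        }
    }

    public static void main(String[] args) {
        System.out.println(createNonEcFile(new BuilderFs(), "/wal/log-0"));
        System.out.println(createNonEcFile(new PlainFs(), "/wal/log-0"));
    }
}
```

The returned strings stand in for output streams. With a real filesystem, the fallback branch would still be subject to whatever default policy the target directory has.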
    
    ## How was this patch tested?
    
    None yet.  I'm posting this mostly to make it visible, as it was trivial on top of https://github.com/apache/spark/pull/22881


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark SPARK-25871

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22882
    
----
commit 005ee5494acd3d9f0721ad24ba3700d8905e2e26
Author: Imran Rashid <ir...@...>
Date:   2018-10-26T19:03:43Z

    [SPARK-25855][CORE][STREAMING] Don't use HDFS EC for event logs and WAL
    
    hdfs erasure coding doesn't support hflush(), hsync(), or append(),
    which doesn't work well for event logs and the WAL, so be sure we never
    use it for those files, regardless of the configuration of hdfs.

commit 04b968a0223e195f1c7e6d6684274bd7f8484069
Author: Imran Rashid <ir...@...>
Date:   2018-10-26T20:22:11Z

    fix

commit 8a9392c875b9b2aec048940a8ae7d03529bfc641
Author: Imran Rashid <ir...@...>
Date:   2018-10-29T15:56:20Z

    make it configurable

commit cd28e61fe9232927ea66b3beb4af5c5d699bb6d3
Author: Imran Rashid <ir...@...>
Date:   2018-10-29T20:09:28Z

    remove changes for WAL

commit 98204e6bcb840f1a47e1a3bd73da5fd7c9b22bcd
Author: Imran Rashid <ir...@...>
Date:   2018-10-29T20:12:14Z

    Add back changes for the WAL
    This reverts commit cd28e61fe9232927ea66b3beb4af5c5d699bb6d3.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    **[Test build #98229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98229/testReport)** for PR 22882 at commit [`98204e6`](https://github.com/apache/spark/commit/98204e6bcb840f1a47e1a3bd73da5fd7c9b22bcd).


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Merging to master. I'll remove the "WIP" during merge.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    I understand the problem; my question is more why [98204e6](https://github.com/apache/spark/commit/98204e6bcb840f1a47e1a3bd73da5fd7c9b22bcd) alone is not enough in the PR.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4796/
    Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98229/
    Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    **[Test build #98517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98517/testReport)** for PR 22882 at commit [`7f93d52`](https://github.com/apache/spark/commit/7f93d5209593584b68642257accccdb52b915b6f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for st...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22882


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    **[Test build #98517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98517/testReport)** for PR 22882 at commit [`7f93d52`](https://github.com/apache/spark/commit/7f93d5209593584b68642257accccdb52b915b6f).


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    **[Test build #98229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98229/testReport)** for PR 22882 at commit [`98204e6`](https://github.com/apache/spark/commit/98204e6bcb840f1a47e1a3bd73da5fd7c9b22bcd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98517/
    Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    The problem is that the WAL expects `hflush()` and `append()` to work, but they don't for EC files.  So if HDFS is configured to use EC by default, any attempt to use the streaming WAL will fail.  The end user might not even be the one who configured HDFS EC, so it doesn't seem reasonable to expect them to set up a non-EC directory on HDFS and then point the WAL at it.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    I know it's WIP, but I'm just wondering why the whole patch is needed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Ah right, sorry, I hadn't updated this after that was merged.  I just updated it.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4607/
    Test PASSed.


---



[GitHub] spark issue #22882: [SPARK-25871][STREAMING][WIP] Don't use EC for streaming...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22882
  
    Merged build finished. Test PASSed.


---
