Posted to reviews@spark.apache.org by brkyvz <gi...@git.apache.org> on 2018/02/26 00:06:49 UTC

[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/20673

    [SPARK-23515] Use input/output streams for large events in JsonProtocol.sparkEventToJson

    ## What changes were proposed in this pull request?
    
    `def sparkEventToJson(event: SparkListenerEvent)`
    
    has a fallback case that serializes an unrecognized event to a JSON string and then parses that string back into a JSON object. This materializes the entire string in memory just to parse the JSON record, which is unnecessary and can cause OOMs, as seen in the stack trace below:
    
    ```
    java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
    at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
    at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
    at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:103)
    ```
    
    We should use stream parsing instead to avoid such OOMs.
    
    ## How was this patch tested?
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark eventLoggingJson

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20673
    
----
commit 774188003c5b1c1a000d69f5996dce580c7a1432
Author: Burak Yavuz <br...@...>
Date:   2018-02-25T20:07:22Z

    use streams for large events

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87662/testReport)** for PR 20673 at commit [`2df0f2b`](https://github.com/apache/spark/commit/2df0f2b6f1d8df1b4fee6534bbc649f487f6ae29).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1050/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87679/testReport)** for PR 20673 at commit [`b783bcb`](https://github.com/apache/spark/commit/b783bcb090fa31250740e020baaef123067ac1ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2528/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87661/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87661/testReport)** for PR 20673 at commit [`2df0f2b`](https://github.com/apache/spark/commit/2df0f2b6f1d8df1b4fee6534bbc649f487f6ae29).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87679/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87658/testReport)** for PR 20673 at commit [`7741880`](https://github.com/apache/spark/commit/774188003c5b1c1a000d69f5996dce580c7a1432).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87679/testReport)** for PR 20673 at commit [`b783bcb`](https://github.com/apache/spark/commit/b783bcb090fa31250740e020baaef123067ac1ef).


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87662/testReport)** for PR 20673 at commit [`2df0f2b`](https://github.com/apache/spark/commit/2df0f2b6f1d8df1b4fee6534bbc649f487f6ae29).


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87658/testReport)** for PR 20673 at commit [`7741880`](https://github.com/apache/spark/commit/774188003c5b1c1a000d69f5996dce580c7a1432).


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1048/
    Test PASSed.


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r170646787
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -100,7 +102,18 @@ private[spark] object JsonProtocol {
             executorMetricsUpdateToJson(metricsUpdate)
           case blockUpdate: SparkListenerBlockUpdated =>
             blockUpdateToJson(blockUpdate)
    -      case _ => parse(mapper.writeValueAsString(event))
    +      case _ =>
    +        // Use piped streams to avoid extra memory consumption
    +        val outputStream = new PipedOutputStream()
    +        val inputStream = new PipedInputStream(outputStream)
    +        try {
    +          mapper.writeValue(outputStream, event)
    --- End diff --
    
    I was actually hoping for a test to fail, but none did (the test suite has a bunch of very specific stuff). This code will likely block forever if the event is larger than the pipe's buffer. I'm going to add a test.
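
The blocking concern above can be reproduced without Spark or Jackson, using only `java.io`. The following is a minimal sketch, not code from the PR: the names `PipeBlockingDemo` and `writeCompletes` are hypothetical, and the pipe size is passed explicitly (1024 bytes here) so the demo does not depend on the JDK default. It writes to a pipe that nobody reads and reports whether the write finished within a timeout.

```scala
import java.io.{PipedInputStream, PipedOutputStream}
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}

// Hypothetical demo object; not part of JsonProtocol or the PR.
object PipeBlockingDemo {

  /** Returns true if writing `numBytes` into an undrained pipe finishes within `timeoutMs`. */
  def writeCompletes(numBytes: Int, pipeSize: Int = 1024, timeoutMs: Long = 500): Boolean = {
    val out = new PipedOutputStream()
    val in = new PipedInputStream(out, pipeSize)
    val executor = Executors.newSingleThreadExecutor()
    try {
      // The write runs on a helper thread only so that this demo can time out
      // instead of hanging; nothing ever reads from `in`.
      val future = executor.submit(new Runnable {
        override def run(): Unit = out.write(new Array[Byte](numBytes))
      })
      try {
        future.get(timeoutMs, TimeUnit.MILLISECONDS)
        true
      } catch {
        case _: TimeoutException => false // the writer is still waiting for buffer space
      }
    } finally {
      executor.shutdownNow() // interrupts a writer that is still blocked
      in.close()
    }
  }
}
```

With these assumptions, `PipeBlockingDemo.writeCompletes(100)` should return `true`, while `PipeBlockingDemo.writeCompletes(4096)` should return `false`, because the writer cannot proceed until a reader drains the buffer. Writing and then parsing on the same thread would hit the same wait for any payload larger than the pipe.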


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1045/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r170478050
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -17,13 +17,15 @@
     
     package org.apache.spark.util
     
    +import java.io.{ByteArrayOutputStream, PipedInputStream, PipedOutputStream}
    --- End diff --
    
    `ByteArrayOutputStream` seems to be unused here.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87662/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1049/
    Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1066/
    Test PASSed.


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r170477844
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -100,7 +102,16 @@ private[spark] object JsonProtocol {
             executorMetricsUpdateToJson(metricsUpdate)
           case blockUpdate: SparkListenerBlockUpdated =>
             blockUpdateToJson(blockUpdate)
    -      case _ => parse(mapper.writeValueAsString(event))
    +      case _ =>
    +        val outputStream = new PipedOutputStream()
    --- End diff --
    
    Hi @brkyvz, how about adding a comment noting that `PipedOutputStream` is used intentionally to avoid additional memory consumption, if I understand this correctly?


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    **[Test build #87661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87661/testReport)** for PR 20673 at commit [`2df0f2b`](https://github.com/apache/spark/commit/2df0f2b6f1d8df1b4fee6534bbc649f487f6ae29).


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    After benchmarking, observed that this didn't provide much benefit :( Closing the PR


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    What's the status of this PR?


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r170537137
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -100,7 +102,18 @@ private[spark] object JsonProtocol {
             executorMetricsUpdateToJson(metricsUpdate)
           case blockUpdate: SparkListenerBlockUpdated =>
             blockUpdateToJson(blockUpdate)
    -      case _ => parse(mapper.writeValueAsString(event))
    +      case _ =>
    +        // Use piped streams to avoid extra memory consumption
    +        val outputStream = new PipedOutputStream()
    +        val inputStream = new PipedInputStream(outputStream)
    +        try {
    +          mapper.writeValue(outputStream, event)
    --- End diff --
    
    Wait, wait. Does this really work lazily? Can we add a test (or a manual test in the PR description) that reads more data (maybe more than the buffer size of that pipe)?


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r170477999
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -100,7 +102,16 @@ private[spark] object JsonProtocol {
             executorMetricsUpdateToJson(metricsUpdate)
           case blockUpdate: SparkListenerBlockUpdated =>
             blockUpdateToJson(blockUpdate)
    -      case _ => parse(mapper.writeValueAsString(event))
    +      case _ =>
    +        val outputStream = new PipedOutputStream()
    +        val inputStream = new PipedInputStream(outputStream)
    +        try {
    +          mapper.writeValue(outputStream, event)
    +          parse(inputStream)
    +        } finally {
    +          IOUtils.closeQuietly(outputStream)
    --- End diff --
    
    And another comment for `IOUtils.closeQuietly`, saying that it is used intentionally because Jackson's library might already have attempted the close, if I understood correctly?


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87658/
    Test FAILed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20673#discussion_r173156595
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -100,7 +102,24 @@ private[spark] object JsonProtocol {
             executorMetricsUpdateToJson(metricsUpdate)
           case blockUpdate: SparkListenerBlockUpdated =>
             blockUpdateToJson(blockUpdate)
    -      case _ => parse(mapper.writeValueAsString(event))
    +      case _ =>
    +        // Use piped streams to avoid extra memory consumption
    +        val outputStream = new PipedOutputStream()
    +        val inputStream = new PipedInputStream(outputStream)
    +        try {
    +          val thread = new Thread("SparkListenerEvent json writer") {
    --- End diff --
    
    Hey @brkyvz, this approach with a thread seems a little bit overkill ..
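
For context on what a writer-thread arrangement involves, here is a minimal sketch using only `java.io`. It is not the code from the PR: `PipedTransfer`, `throughPipe`, and the thread name are hypothetical, and the `write` and `read` functions stand in for the Jackson serializer and the json4s parser. It assumes the reader consumes the stream up to end-of-file.

```scala
import java.io.{InputStream, OutputStream, PipedInputStream, PipedOutputStream}
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper; not part of JsonProtocol or the PR.
object PipedTransfer {

  /**
   * Runs `write` on a separate thread and `read` on the calling thread,
   * connected by a pipe, so the full payload is never buffered at once.
   * Assumes `read` drains the stream until end-of-file.
   */
  def throughPipe[T](write: OutputStream => Unit)(read: InputStream => T): T = {
    val out = new PipedOutputStream()
    val in = new PipedInputStream(out)
    val failure = new AtomicReference[Exception](null)

    val writer = new Thread("pipe-writer") {
      override def run(): Unit = {
        try write(out)
        catch { case e: Exception => failure.set(e) }
        finally out.close() // closing the write end signals end-of-file to the reader
      }
    }
    writer.setDaemon(true)
    writer.start()

    // Closing the read end also unblocks the writer if the reader fails early.
    val result = try read(in) finally in.close()
    writer.join()
    // Surface a writer-side failure instead of silently returning partial data.
    Option(failure.get()).foreach(e => throw e)
    result
  }
}
```

A usage example under the same assumptions: `PipedTransfer.throughPipe(_.write(payload))(in => countBytes(in))`, where `payload` is a byte array and `countBytes` is any function that reads until `-1`. The extra thread, the error hand-off, and the shutdown ordering are the cost that the "overkill" remark points at.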


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    retest this please


---



[GitHub] spark issue #20673: [SPARK-23515] Use input/output streams for large events ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/20673
  
    @HyukjinKwon Added tests


---



[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz closed the pull request at:

    https://github.com/apache/spark/pull/20673


---
