Posted to reviews@spark.apache.org by linhong-intel <gi...@git.apache.org> on 2018/11/15 05:14:40 UTC

[GitHub] spark pull request #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream s...

GitHub user linhong-intel opened a pull request:

    https://github.com/apache/spark/pull/23040

    [SPARK-26068][Core]ChunkedByteBufferInputStream should handle empty chunks correctly

    ## What changes were proposed in this pull request?
    
    An empty chunk in a ChunkedByteBuffer truncates the ChunkedByteBufferInputStream.
    The detailed reason is described in 
    
    ## How was this patch tested?
    Modified the existing unit test to cover this case.
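
The truncation described above can be reproduced with a minimal sketch in plain Scala. `NaiveChunkStream` and `drain` are hypothetical names invented for this illustration and are not Spark's classes. The stand-in assumes, as the review comment later in this thread suggests, that `read` advances by at most one chunk before deciding that the stream is exhausted.

```scala
import java.io.InputStream
import java.nio.ByteBuffer

// Hypothetical stand-in, not Spark's ChunkedByteBufferInputStream.
// Assumption: read() moves forward by at most one chunk per call.
class NaiveChunkStream(buffers: Array[ByteBuffer]) extends InputStream {
  private val chunks = buffers.iterator
  private var current: ByteBuffer = if (chunks.hasNext) chunks.next() else null

  override def read(): Int = {
    // Advance once when the current chunk is used up.
    if (current != null && !current.hasRemaining && chunks.hasNext) {
      current = chunks.next()
    }
    // If the chunk we landed on is empty, this reports end of stream.
    if (current != null && current.hasRemaining) current.get() & 0xff else -1
  }
}

// Hypothetical helper: read single bytes until the stream reports -1.
def drain(in: InputStream): List[Byte] =
  Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toList

val withEmptyChunk = Array(
  ByteBuffer.wrap(Array[Byte](1, 2)),
  ByteBuffer.allocate(0), // the empty chunk in the middle
  ByteBuffer.wrap(Array[Byte](3, 4)))

// Bytes 3 and 4 are never returned by this stand-in.
val truncated = drain(new NaiveChunkStream(withEmptyChunk))
```

In this sketch the third call to `read` lands on the empty chunk and returns -1, so a caller sees a two-byte stream even though four bytes were supplied.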

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/linhong-intel/spark fix-empty-chunked-byte-buffer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23040
    
----
commit ab81c1e8eefabc9249e7643dffb8d7c0eef3c7dc
Author: Liu,Linhong <li...@...>
Date:   2018-11-15T03:37:39Z

    ChunkedByteBufferInputStream should handle empty chunks correctly

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23040


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by linhong-intel <gi...@git.apache.org>.
Github user linhong-intel commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    cc @xuanyuanking 


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    It's good to fix a potential bug. Can you add a unit test?


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    **[Test build #98989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98989/testReport)** for PR 23040 at commit [`3c6d349`](https://github.com/apache/spark/commit/3c6d349b26e54ead7c345e11ffacf14edcd072c1).


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    cc @ericl and @JoshRosen, this bug was introduced by https://github.com/apache/spark/pull/14099/files 
    
    After the empty-chunk check was loosened, ChunkedByteBufferInputStream no longer handles empty chunks correctly.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    LGTM except one comment


---



[GitHub] spark pull request #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23040#discussion_r234395227
  
    --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
    @@ -222,7 +222,7 @@ private[spark] class ChunkedByteBufferInputStream(
         dispose: Boolean)
       extends InputStream {
     
    -  private[this] var chunks = chunkedByteBuffer.getChunks().iterator
    +  private[this] var chunks = chunkedByteBuffer.getChunks().filter(_.hasRemaining).iterator
    --- End diff --
    
    Can you add a comment above, saying that we apply this filter because `read` assumes `chunks` contains no empty chunks?
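
As a rough illustration of what the `filter(_.hasRemaining)` call in the diff does, the sketch below applies the same predicate to a plain array of `java.nio.ByteBuffer` values. The sample data and the names `chunks`, `kept`, and `readable` are made up for this example. It also shows that `hasRemaining` depends on position and limit rather than capacity.

```scala
import java.nio.ByteBuffer

// Sample input (made up): two chunks with data, two with nothing left to read.
val chunks = Array(
  ByteBuffer.wrap(Array[Byte](1, 2)),
  ByteBuffer.allocate(0),                // capacity 0
  ByteBuffer.wrap(new Array[Byte](4), 0, 0), // capacity 4, but position == limit == 0
  ByteBuffer.wrap(Array[Byte](3, 4)))

// Same predicate as in the diff: keep only chunks that still have readable bytes.
val kept = chunks.filter(_.hasRemaining)

// Filtering drops no data, because the removed chunks had nothing to read.
val readable = kept.map(_.remaining).sum
```

Because the filter runs once when the iterator is created, any later code that assumes every chunk is non-empty can keep its simpler logic.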


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    **[Test build #98878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98878/testReport)** for PR 23040 at commit [`fa7af44`](https://github.com/apache/spark/commit/fa7af44abcd8ee95c956506c06badd83af067a03).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    ok to test


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98878/


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    also cc @jiangxb1987 @zsxwing 


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by LinhongLiu <gi...@git.apache.org>.
Github user LinhongLiu commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    cc @cloud-fan @srowen 
    The review comments are addressed.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    **[Test build #98878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98878/testReport)** for PR 23040 at commit [`fa7af44`](https://github.com/apache/spark/commit/fa7af44abcd8ee95c956506c06badd83af067a03).


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    thanks, merging to master!


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by LinhongLiu <gi...@git.apache.org>.
Github user LinhongLiu commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Problem:
    `ChunkedByteBuffer` has the signature `ChunkedByteBuffer(var chunks: Array[ByteBuffer])`, so a caller is allowed to pass any kind of `chunks` to it. That leads to the problem described in [JIRA](https://issues.apache.org/jira/browse/SPARK-26068), which is why I submitted this PR.
    
    On the other hand:
    This is an internal class, and Spark uses `ChunkedByteBuffer` safely in 2 ways:
    1. It uses `ChunkedByteBuffer(byteBuffer: ByteBuffer)` to pass a single buffer. Even if that buffer is empty, Spark handles the case correctly.
    2. It uses `ChunkedByteBufferOutputStream` to create a `ChunkedByteBuffer` with multiple `chunks`. In this case, an empty ByteBuffer never occurs.
    
    As a result, current Spark code will never hit the problem as long as nothing uses `ChunkedByteBuffer(var chunks: Array[ByteBuffer])` directly.
    
    So it is OK either to fix this or to leave it as is.
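
To make the distinction above concrete, here is a small sketch that feeds different chunk layouts to a simplified stream. `NaiveChunkStream`, `buf`, and `readAll` are hypothetical names for this illustration, not Spark's API, and the stand-in assumes that `read` advances by at most one chunk per call. Under that assumption, a single empty buffer is harmless, while an empty chunk followed by data cuts the stream short.

```scala
import java.io.InputStream
import java.nio.ByteBuffer

// Hypothetical stand-in, not Spark's class: read() advances at most one chunk.
class NaiveChunkStream(buffers: Array[ByteBuffer]) extends InputStream {
  private val chunks = buffers.iterator
  private var current: ByteBuffer = if (chunks.hasNext) chunks.next() else null

  override def read(): Int = {
    if (current != null && !current.hasRemaining && chunks.hasNext) {
      current = chunks.next()
    }
    if (current != null && current.hasRemaining) current.get() & 0xff else -1
  }
}

// Hypothetical helpers: build a buffer from bytes, and drain a chunk layout.
def buf(bytes: Byte*): ByteBuffer = ByteBuffer.wrap(bytes.toArray)
def readAll(buffers: ByteBuffer*): List[Byte] = {
  val in = new NaiveChunkStream(buffers.toArray)
  Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toList
}

// Case 1 from the comment: one buffer, even an empty one, loses nothing.
val singleEmpty = readAll(buf())

// An empty chunk at the end is also harmless: all data comes before it.
val trailingEmpty = readAll(buf(1, 2), buf())

// Only reachable by passing an arbitrary array: the byte after the gap is lost.
val middleEmpty = readAll(buf(1, 2), buf(), buf(3))
```

In this sketch only the last layout loses data, which matches the point that the problem needs a hand-built `chunks` array to appear.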


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    **[Test build #98989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98989/testReport)** for PR 23040 at commit [`3c6d349`](https://github.com/apache/spark/commit/3c6d349b26e54ead7c345e11ffacf14edcd072c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23040: [SPARK-26068][Core]ChunkedByteBufferInputStream should h...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23040
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98989/


---
