Posted to commits@spark.apache.org by gu...@apache.org on 2020/07/14 23:54:06 UTC

[spark] branch branch-3.0 updated: [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 245b83a  [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython
245b83a is described below

commit 245b83aeab2db9aac7ed41c16d8b229120fa3723
Author: HyukjinKwon <gu...@apache.org>
AuthorDate: Wed Jul 15 08:46:20 2020 +0900

    [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to port #29098 forward to `collectAsArrowToR`. `collectAsArrowToR` follows the non-streaming `collectAsArrowToPython` of branch-2.4 because of the limitation described in ARROW-4512: SparkR vectorization currently cannot use the Arrow streaming format.
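
    For context, both non-streaming collect paths gather Arrow batches per partition and hand them to the child process in partition order, buffering any partition that completes early. A simplified sketch of that pattern (a hypothetical `makeInOrderHandler` helper, not Spark's exact code):

    ```scala
    // Simplified sketch of the in-order write pattern (hypothetical helper,
    // not Spark's exact code): partitions may complete out of order, but
    // their payloads must reach the child process in partition order.
    object InOrderWriterSketch {
      def makeInOrderHandler(
          numPartitions: Int,
          write: Array[Byte] => Unit): (Int, Array[Byte]) => Unit = {
        // Worst case: partitions 1 to N-1 all finish before partition 0,
        // so at most numPartitions - 1 payloads ever need buffering.
        val results = new Array[Array[Byte]](Math.max(0, numPartitions - 1))
        var lastIndex = -1 // index of the last partition written

        (index: Int, batch: Array[Byte]) => {
          if (index == lastIndex + 1) {
            write(batch) // next expected partition: pass it straight through
            lastIndex = index
            // Flush buffered successors that are now in order;
            // results(i) holds the payload of partition i + 1.
            while (lastIndex < results.length && results(lastIndex) != null) {
              write(results(lastIndex))
              results(lastIndex) = null
              lastIndex += 1
            }
          } else {
            results(index - 1) = batch // partition 0 is never buffered
          }
        }
      }

      def main(args: Array[String]): Unit = {
        val handler = makeInOrderHandler(3, b => println(s"wrote partition ${b(0)}"))
        handler(2, Array[Byte](2)) // completes early: buffered
        handler(1, Array[Byte](1)) // completes early: buffered
        handler(0, Array[Byte](0)) // triggers writes of 0, then 1, then 2
      }
    }
    ```

    Even when partitions complete as 2, 1, 0, the handler writes payloads in 0, 1, 2 order.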
    
    ### Why are the changes needed?
    
    For simplicity and consistency.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The same code is already exercised by `collectAsArrowToPython` in branch-2.4.
    
    Closes #29100 from HyukjinKwon/minor-parts.
    
    Authored-by: HyukjinKwon <gu...@apache.org>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
    (cherry picked from commit 03b5707b516187aaa8012049fce8b1cd0ac0fddd)
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 2ddedf3..3b6fd2f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -3492,7 +3492,7 @@ class Dataset[T] private[sql](
         val numPartitions = arrowBatchRdd.partitions.length
 
         // Store collection results for worst case of 1 to N-1 partitions
-        val results = new Array[Array[Array[Byte]]](numPartitions - 1)
+        val results = new Array[Array[Array[Byte]]](Math.max(0, numPartitions - 1))
         var lastIndex = -1  // index of last partition written
 
         // Handler to eagerly write partitions to Python in order

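For illustration, a minimal standalone sketch (a hypothetical repro, not Spark's code) of why the clamp matters: with a zero-partition RDD, the unclamped size is -1 and the array allocation throws.

```scala
// Minimal sketch of the failure the patch guards against (hypothetical repro).
object BufferSizingSketch {
  def main(args: Array[String]): Unit = {
    val numPartitions = 0 // assume a query plan whose RDD has no partitions

    // Unclamped sizing, as before the fix, throws at runtime:
    //   new Array[Array[Array[Byte]]](numPartitions - 1)
    //   => java.lang.NegativeArraySizeException

    // Clamped sizing, as in the fix, allocates an empty buffer instead.
    val results = new Array[Array[Array[Byte]]](Math.max(0, numPartitions - 1))
    println(s"buffer slots: ${results.length}") // prints: buffer slots: 0
  }
}
```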

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org