Posted to commits@spark.apache.org by gu...@apache.org on 2020/07/14 23:54:06 UTC
[spark] branch branch-3.0 updated: [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 245b83a [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython
245b83a is described below
commit 245b83aeab2db9aac7ed41c16d8b229120fa3723
Author: HyukjinKwon <gu...@apache.org>
AuthorDate: Wed Jul 15 08:46:20 2020 +0900
[MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython
### What changes were proposed in this pull request?
This PR proposes to port #29098 forward to `collectAsArrowToR`. `collectAsArrowToR` follows `collectAsArrowToPython` in branch-2.4 because of the limitation tracked in ARROW-4512: SparkR vectorization currently cannot use the Arrow streaming format.
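As a minimal illustration of the ported guard (the zero-partition value is an assumed scenario; allocating a negative-length array throwing `NegativeArraySizeException` is standard JVM behavior):

```scala
// Sketch: with zero partitions, the unguarded size `numPartitions - 1`
// is -1, and a negative-length array allocation throws
// NegativeArraySizeException on the JVM.
val numPartitions = 0 // assumed: e.g. a Dataset whose RDD has no partitions
// new Array[Array[Array[Byte]]](numPartitions - 1)                         // would throw
val results = new Array[Array[Array[Byte]]](Math.max(0, numPartitions - 1)) // length 0, safe
```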
### Why are the changes needed?
For simplicity and consistency.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The same code is already tested via `collectAsArrowToPython` on branch-2.4.
Closes #29100 from HyukjinKwon/minor-parts.
Authored-by: HyukjinKwon <gu...@apache.org>
Signed-off-by: HyukjinKwon <gu...@apache.org>
(cherry picked from commit 03b5707b516187aaa8012049fce8b1cd0ac0fddd)
Signed-off-by: HyukjinKwon <gu...@apache.org>
---
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 2ddedf3..3b6fd2f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -3492,7 +3492,7 @@ class Dataset[T] private[sql](
val numPartitions = arrowBatchRdd.partitions.length
// Store collection results for worst case of 1 to N-1 partitions
- val results = new Array[Array[Array[Byte]]](numPartitions - 1)
+ val results = new Array[Array[Array[Byte]]](Math.max(0, numPartitions - 1))
var lastIndex = -1 // index of last partition written
// Handler to eagerly write partitions to Python in order
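For context, below is a minimal, self-contained sketch of the in-order writing pattern this hunk sits in, simplified to one byte array per partition. The object and method names and the simulated arrival order are hypothetical, not Spark's code; only the buffer sizing and the ordering logic mirror the lines above.

```scala
// Sketch of eagerly writing partitions in order as they complete.
object OrderedPartitionWriteSketch {
  def main(args: Array[String]): Unit = {
    val numPartitions = 4
    // Partition 0 is written the moment it arrives, so at most partitions
    // 1 to N-1 ever need buffering; Math.max guards the N == 0 case.
    val results = new Array[Array[Byte]](Math.max(0, numPartitions - 1))
    var lastIndex = -1 // index of the last partition written out

    def writeBatch(index: Int, batch: Array[Byte]): Unit =
      println(s"writing partition $index (${batch.length} bytes)")

    // Invoked once per finished partition, possibly out of order.
    def handlePartition(index: Int, batch: Array[Byte]): Unit = {
      if (index - 1 == lastIndex) {
        // Next partition in order: write it immediately...
        writeBatch(index, batch)
        lastIndex += 1
        // ...then drain any buffered partitions that now come next.
        while (lastIndex < results.length && results(lastIndex) != null) {
          writeBatch(lastIndex + 1, results(lastIndex))
          results(lastIndex) = null
          lastIndex += 1
        }
      } else {
        // Arrived early: park it until its predecessors are written.
        results(index - 1) = batch
      }
    }

    // Simulate partitions completing out of order.
    Seq(2, 0, 3, 1).foreach(i => handlePartition(i, Array.fill(4)(i.toByte)))
    // Prints partitions 0, 1, 2, 3 in order.
  }
}
```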