Posted to commits@spark.apache.org by we...@apache.org on 2022/09/19 12:53:35 UTC
[spark] branch master updated: [SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly
This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 3b9e5cf662c [SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly
3b9e5cf662c is described below
commit 3b9e5cf662cd90cb3d64bd3abc57e0be26367631
Author: Wenchen Fan <we...@databricks.com>
AuthorDate: Mon Sep 19 20:52:49 2022 +0800
[SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly
### What changes were proposed in this pull request?
This PR caches the result of `PartitionReader.next` in `PartitionIterator`, including a `false` result that signals the end of input, so that its `hasNext` method is cheap to call repeatedly.
### Why are the changes needed?
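Potential performance improvement. `PartitionReader.next` can be expensive in some v2 sources, and it is legal to call `Iterator.hasNext` repeatedly. Before this change, once the reader was exhausted, every further call to `hasNext` invoked `PartitionReader.next` again.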
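The effect of the change can be illustrated with a minimal, self-contained Scala sketch, written as a script with top-level statements. It is not Spark code: `SimpleReader`, `CountingReader`, `UncachedIterator`, `CachedIterator` and `drain` are hypothetical names, and `SimpleReader` is a simplified stand-in for `PartitionReader` that keeps only `next()` and `get()` (the real interface has more members, such as `close()`). The two iterators mirror the `hasNext` logic before and after this commit, and the reader counts how often its `next()` is invoked when `hasNext` keeps being polled after the input is exhausted.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-in for Spark's PartitionReader[T]:
// next() advances and reports whether a row is available, get() returns it.
trait SimpleReader[T] {
  def next(): Boolean
  def get(): T
}

// Reader over a fixed sequence that counts calls to next(),
// standing in for a v2 source where next() may be expensive.
final class CountingReader[T](values: Seq[T]) extends SimpleReader[T] {
  private var idx = -1
  var nextCalls = 0

  override def next(): Boolean = {
    nextCalls += 1
    if (idx < values.length) idx += 1
    idx < values.length
  }

  override def get(): T = values(idx)
}

// hasNext logic before the commit: a `false` from reader.next() is not
// remembered, so every later hasNext call asks the reader again.
final class UncachedIterator[T](reader: SimpleReader[T]) extends Iterator[T] {
  private var valuePrepared = false

  override def hasNext: Boolean = {
    if (!valuePrepared) valuePrepared = reader.next()
    valuePrepared
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    valuePrepared = false
    reader.get()
  }
}

// hasNext logic after the commit: hasMoreInput records end of input,
// so the reader is never called again once it has returned false.
final class CachedIterator[T](reader: SimpleReader[T]) extends Iterator[T] {
  private var valuePrepared = false
  private var hasMoreInput = true

  override def hasNext: Boolean = {
    if (!valuePrepared && hasMoreInput) {
      hasMoreInput = reader.next()
      valuePrepared = hasMoreInput
    }
    valuePrepared
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    valuePrepared = false
    reader.get()
  }
}

// Consume an iterator with an explicit hasNext/next loop.
def drain[T](it: Iterator[T]): List[T] = {
  val buf = ListBuffer.empty[T]
  while (it.hasNext) buf += it.next()
  buf.toList
}

val oldReader = new CountingReader(Seq(1, 2, 3))
val oldIt = new UncachedIterator(oldReader)
val oldOut = drain(oldIt)
(1 to 5).foreach(_ => oldIt.hasNext) // poll again after exhaustion

val newReader = new CountingReader(Seq(1, 2, 3))
val newIt = new CachedIterator(newReader)
val newOut = drain(newIt)
(1 to 5).foreach(_ => newIt.hasNext) // poll again after exhaustion

println(s"before: $oldOut, reader.next() calls = ${oldReader.nextCalls}") // before: List(1, 2, 3), reader.next() calls = 9
println(s"after: $newOut, reader.next() calls = ${newReader.nextCalls}") // after: List(1, 2, 3), reader.next() calls = 4
```

Both iterators return the same rows. With the old logic each of the five extra `hasNext` calls reaches the reader again (4 + 5 = 9 calls); with `hasMoreInput` remembered, the reader is called once per row plus once to detect the end (4 calls), however often `hasNext` is polled afterwards.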
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #37900 from cloud-fan/minor.
Authored-by: Wenchen Fan <we...@databricks.com>
Signed-off-by: Wenchen Fan <we...@databricks.com>
---
.../apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
index 09c8756ca01..67e77a97865 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
@@ -111,12 +111,14 @@ private class PartitionIterator[T](
     reader: PartitionReader[T],
     customMetrics: Map[String, SQLMetric]) extends Iterator[T] {
   private[this] var valuePrepared = false
+  private[this] var hasMoreInput = true
 
   private var numRow = 0L
 
   override def hasNext: Boolean = {
-    if (!valuePrepared) {
-      valuePrepared = reader.next()
+    if (!valuePrepared && hasMoreInput) {
+      hasMoreInput = reader.next()
+      valuePrepared = hasMoreInput
     }
     valuePrepared
   }