Posted to reviews@spark.apache.org by "sander-goos (via GitHub)" <gi...@apache.org> on 2023/09/21 15:16:25 UTC

[GitHub] [spark] sander-goos opened a new pull request, #43035: [SPARK-45256][ARROW][WIP] DurationWriter fails when writing more values than initial capacity

sander-goos opened a new pull request, #43035:
URL: https://github.com/apache/spark/pull/43035

   
   ### What changes were proposed in this pull request?
   The DurationWriter fails if more values are written than the initial capacity of the DurationVector (4032). Fix this by using the `setSafe` method instead of `set`, so the underlying vector is reallocated when the write index passes its current capacity.
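   A minimal sketch of the change, assuming the `DurationWriter` follows the same shape as the other fixed-width field writers in `ArrowWriter.scala` (the `ArrowFieldWriter` base class and the `count`/`input`/`ordinal` names are taken from that file, not from this PR's diff):
   ```scala
   private[arrow] class DurationWriter(val valueVector: DurationVector) extends ArrowFieldWriter {

     override def setNull(): Unit = {
       valueVector.setNull(count)
     }

     override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
       // setSafe reallocates the underlying buffers once `count` passes the current
       // capacity; plain set assumes the slot is already allocated and fails there.
       valueVector.setSafe(count, input.getLong(ordinal))
     }
   }
   ```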
   
   ### Why are the changes needed?
   Bug fix
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a unit test that checks all types when writing more values than the initial capacity.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity 
URL: https://github.com/apache/spark/pull/43035




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43035:
URL: https://github.com/apache/spark/pull/43035#discussion_r1333400696


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala:
##########
@@ -150,6 +150,75 @@ class ArrowWriterSuite extends SparkFunSuite {
     DataTypeTestUtils.dayTimeIntervalTypes.foreach(check(_, (-10 until 10).map(_ * 1000.toLong)))
   }
 
+  test("write multiple, over initial capacity") {
+    def createArrowWriter(
+        schema: StructType,
+        timeZoneId: String): (ArrowWriter, Int) = {
+      val arrowSchema =
+        ArrowUtils.toArrowSchema(schema, timeZoneId, errorOnDuplicatedFieldNames = true)

Review Comment:
   Hi, @sander-goos. It seems that you missed an `import` statement.
   ```
   [error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala:158:9: not found: value ArrowUtils
   [error]         ArrowUtils.toArrowSchema(schema, timeZoneId, errorOnDuplicatedFieldNames = true)
   ```
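   For reference, a one-line fix for that error would be adding the import at the top of `ArrowWriterSuite.scala`; assuming the `ArrowUtils` meant here is the catalyst helper, that would be:
   ```scala
   // Assumed package of ArrowUtils; the error above only reports the unresolved name.
   import org.apache.spark.sql.util.ArrowUtils
   ```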





[GitHub] [spark] sander-goos commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

Posted by "sander-goos (via GitHub)" <gi...@apache.org>.
sander-goos commented on PR #43035:
URL: https://github.com/apache/spark/pull/43035#issuecomment-1731360942

   > +1, this PR looks reasonable (Pending CIs). There is no perf regression for the case which fits the limit, right, @sander-goos ?
   
   There shouldn't be a perf regression; the extra call to `handleSafe` is a no-op while the index is below the current capacity. The Arrow writers for the other types also use `setSafe` where applicable.
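   
   As a standalone illustration (not code from this PR), the difference is visible directly on an Arrow `DurationVector`; the object and field names below are made up for the sketch:
   ```scala
   import org.apache.arrow.memory.RootAllocator
   import org.apache.arrow.vector.DurationVector
   import org.apache.arrow.vector.types.TimeUnit
   import org.apache.arrow.vector.types.pojo.{ArrowType, FieldType}

   object DurationSetSafeDemo {
     def main(args: Array[String]): Unit = {
       val allocator = new RootAllocator()
       val fieldType = FieldType.nullable(new ArrowType.Duration(TimeUnit.MICROSECOND))
       val vector = new DurationVector("duration", fieldType, allocator)
       try {
         vector.allocateNew()
         val initialCapacity = vector.getValueCapacity
         // Write one slot past the initial capacity: set(i, ...) fails at
         // i == initialCapacity (the bug this PR reports), while setSafe
         // reallocates first and behaves like a plain set while i < capacity.
         (0 to initialCapacity).foreach(i => vector.setSafe(i, i.toLong))
         vector.setValueCount(initialCapacity + 1)
         println(s"wrote ${vector.getValueCount} values, capacity is now ${vector.getValueCapacity}")
       } finally {
         vector.close()
         allocator.close()
       }
     }
   }
   ```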




[GitHub] [spark] dongjoon-hyun commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43035:
URL: https://github.com/apache/spark/pull/43035#issuecomment-1731330890

   Thank you for updating. Pending CIs.




[GitHub] [spark] dongjoon-hyun commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43035:
URL: https://github.com/apache/spark/pull/43035#issuecomment-1731694203

   Merged to master for Apache Spark 4.0.0.

