Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/14 17:05:56 UTC

[GitHub] [spark] bersprockets opened a new pull request #35851: [SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when build…

bersprockets opened a new pull request #35851:
URL: https://github.com/apache/spark/pull/35851


   Backport of #35837.
   
   ### What changes were proposed in this pull request?
   
   When building the project list from an aggregate sequence in `ExtractGenerator`, convert the aggregate sequence to an `IndexedSeq` before performing the flatMap operation.
   
   ### Why are the changes needed?
   
   This query fails with a `NullPointerException`:
   ```
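   import org.apache.spark.sql.functions._  // explode, array, min, max, sum
   import spark.implicits._                 // toDF and $"..." (both imports are automatic in spark-shell)
   val df = Seq(1, 2, 3).toDF("v")
   df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect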
   ```
   If you change `Stream` to `Seq`, then it succeeds.
   
   `ExtractGenerator` uses a flatMap operation over `aggList` for two purposes:
   
   - To produce a new aggregate list.
   - To update `projectExprs` (which is initialized as an array of nulls).
   
   When `aggList` is a `Stream`, the flatMap operation evaluates lazily, so all entries in `projectExprs` after the first will still be null when the rule completes.
   
   Changing `aggList` to an `IndexedSeq` forces the flatMap to evaluate eagerly.
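   
   The lazy-versus-eager difference can be reproduced without Spark. The following is a minimal standalone sketch, not the actual `ExtractGenerator` code: `LazyFlatMapDemo`, `fill`, and `slots` are hypothetical names chosen for illustration, and the side effect inside `flatMap` only mimics the way `projectExprs` is filled. (On Scala 2.13, `Stream` compiles with a deprecation warning.)
   
   ```scala
   // Standalone illustration; every name here is hypothetical, not Spark code.
   object LazyFlatMapDemo {
     // flatMap builds a new sequence and, as a side effect, fills an array
     // that starts out as all nulls (like `projectExprs` in the description).
     def fill(input: Seq[String]): (Seq[String], Array[String]) = {
       val slots = new Array[String](input.length) // initialized to nulls
       val out = input.zipWithIndex.flatMap { case (s, i) =>
         slots(i) = s.toUpperCase // side effect, runs only when the element is evaluated
         Seq(s)
       }
       (out, slots)
     }
   
     def main(args: Array[String]): Unit = {
       // Stream: only the head is evaluated, so later slots stay null.
       val (_, lazySlots) = fill(Stream("a", "b", "c"))
       println(lazySlots.mkString(", "))
   
       // IndexedSeq: flatMap runs over every element before returning.
       val (_, eagerSlots) = fill(Stream("a", "b", "c").toIndexedSeq)
       println(eagerSlots.mkString(", "))
     }
   }
   ```
   
   In this sketch the `Stream` case leaves every slot after the first as null, because nothing ever forces the rest of the result, while the `toIndexedSeq` case fills all of them.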
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   New unit test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon edited a comment on pull request #35851: [SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when building project list in `ExtractGenerator`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #35851:
URL: https://github.com/apache/spark/pull/35851#issuecomment-1079801982


   Merged to branch-3.2 and branch-3.1.



[GitHub] [spark] HyukjinKwon closed pull request #35851: [SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when building project list in `ExtractGenerator`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #35851:
URL: https://github.com/apache/spark/pull/35851


   



[GitHub] [spark] bersprockets commented on pull request #35851: [SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when building project list in `ExtractGenerator`

Posted by GitBox <gi...@apache.org>.
bersprockets commented on pull request #35851:
URL: https://github.com/apache/spark/pull/35851#issuecomment-1079995843


   Thanks. Should I close this PR?



[GitHub] [spark] HyukjinKwon commented on pull request #35851: [SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when building project list in `ExtractGenerator`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35851:
URL: https://github.com/apache/spark/pull/35851#issuecomment-1079801982


   Merged to branch-3.2.

