Posted to reviews@spark.apache.org by "jchen5 (via GitHub)" <gi...@apache.org> on 2023/12/18 15:25:25 UTC

[PR] [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug [spark]

jchen5 opened a new pull request, #44401:
URL: https://github.com/apache/spark/pull/44401

   ### What changes were proposed in this pull request?
   Subqueries with correlation under LIMIT with OFFSET have a correctness bug. It was introduced recently, when support for correlation under OFFSET was enabled without handling that case correctly, so these queries went from unsupported (the query throws an error) to silently returning wrong results. The bug is only in the master branch and has not been released yet.
   
   This PR first disables correlated OFFSET. Next PR will add support for it and re-enable it.
   
   The bug affects all types of correlated subqueries: scalar, lateral, IN, and EXISTS.
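   To illustrate what "correlated OFFSET" means for two of these shapes, the sketch runs a scalar and an IN subquery against SQLite, which is assumed here only as an independent reference engine that applies LIMIT/OFFSET separately for each outer row. The tables match the repro in this PR; the queries themselves are illustrative and not taken from the PR's tests (SQLite has no LATERAL, so that shape is omitted).

```python
import sqlite3

# Assumption: SQLite serves only as a reference engine for the expected
# per-row semantics; it is not part of the PR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table x(x1 int, x2 int);
    insert into x values (1, 1), (2, 2);
    create table y(y1 int, y2 int);
    insert into y values (1, 1), (1, 2), (2, 4);
""")

# Scalar subquery: for each x row, skip 1 matching y row and return the next y2.
# x1 = 1 matches y2 values (1, 2) -> 2; x1 = 2 matches only (4) -> NULL.
scalar_rows = conn.execute(
    "select x1, (select y2 from y where y1 = x1 "
    "order by y2 limit 1 offset 1) from x order by x1"
).fetchall()

# IN subquery: x1 = 1 has two matching y rows, so one survives OFFSET 1;
# x1 = 2 has a single matching row, so OFFSET 1 leaves an empty set.
in_rows = conn.execute(
    "select * from x where x2 in "
    "(select y1 from y where y1 = x1 limit 1 offset 1) order by x1"
).fetchall()

print(scalar_rows)
print(in_rows)
```

   In both cases the OFFSET is evaluated once per outer row, over only the inner rows that match that row's correlation value.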
   
   Example repro:
   
   ```
   create table x(x1 int, x2 int);
   insert into x values (1, 1), (2, 2);
   create table y(y1 int, y2 int);
   insert into y values (1, 1), (1, 2), (2, 4);
   
   select * from x where exists (select * from y where x1 = y1 limit 1 offset 2)
   ```
   
   Correct result: empty set
   Spark result: Array([2,2])
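   The expected answer can be cross-checked by running the same repro in another engine. The sketch uses SQLite (an assumption, chosen only because it ships with Python and evaluates correlated LIMIT/OFFSET per outer row). The second query is a hypothetical illustration, not a diagnosis from the PR: it shows that applying OFFSET 2 once over all of `y`, instead of per outer row, happens to produce the same wrong row Spark returned.

```python
import sqlite3

# Assumption: SQLite is used only as a reference engine for the expected result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table x(x1 int, x2 int);
    insert into x values (1, 1), (2, 2);
    create table y(y1 int, y2 int);
    insert into y values (1, 1), (1, 2), (2, 4);
""")

# The repro query from this PR. OFFSET 2 applies to the y rows matching each
# x row: group y1 = 1 has 2 rows and group y1 = 2 has 1 row, so skipping
# 2 rows empties both groups and EXISTS is false for every x row.
correct = conn.execute(
    "select * from x where exists "
    "(select * from y where x1 = y1 limit 1 offset 2)"
).fetchall()

# Hypothetical illustration only: applying OFFSET 2 once over all of y
# (ignoring the correlation) keeps (2, 4), which then matches x row (2, 2).
global_offset = conn.execute(
    "select * from x where x1 in "
    "(select y1 from (select * from y order by y1, y2 limit 1 offset 2))"
).fetchall()

print("per-row OFFSET:", correct)
print("global OFFSET: ", global_offset)
```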
   
   ### Why are the changes needed?
   Correctness bug
   
   ### Does this PR introduce _any_ user-facing change?
   It disables the correlated OFFSET query shape, which was not handled correctly. (This shape was enabled on the master branch but has not been released yet.)
   
   ### How was this patch tested?
   Added tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #44401: [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug
URL: https://github.com/apache/spark/pull/44401



Re: [PR] [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on PR #44401:
URL: https://github.com/apache/spark/pull/44401#issuecomment-1861423678

   @agubichev @cloud-fan 



Re: [PR] [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44401:
URL: https://github.com/apache/spark/pull/44401#issuecomment-1862138984

   The Docker and PySpark test failures are unrelated. Thanks, merging to master!



Re: [PR] [SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on PR #44401:
URL: https://github.com/apache/spark/pull/44401#issuecomment-1863008374

   I had some minor test fixes that missed this PR; here is a follow-up PR that fixes those tests: https://github.com/apache/spark/pull/44415

