You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "ahshahid (via GitHub)" <gi...@apache.org> on 2023/11/14 22:51:43 UTC

[PR] SPARK-45926 : Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

ahshahid opened a new pull request, #43808:
URL: https://github.com/apache/spark/pull/43808

   ### What changes were proposed in this pull request?
   Implementing equals and hashCode in the InMemoryBatchScan and  InMemoryV2FilterBatchScan so that the pushed runtime filters are taken into account.
   
   
   ### Why are the changes needed?
   These test classes need the change so that the issue of re-use of exchange not happening in AQE when runtime filters are pushed to scan levels,  can be reproduced. Currently either the tests are not actually validating the reuse of exchange happening when dynamic pruning expressions are pushed to scan levels in partitioned key based joins, or they are false passing because the equality of the test scans are not taking into account runtime filters.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   This is test class change., It is possibe that the existing tests may fail till the dependent fixes are made in source code.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1811684533

   cc @ulysses-you and @peter-toth FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "ahshahid (via GitHub)" <gi...@apache.org>.
ahshahid commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1811946724

   @dongjoon-hyun I think the reason for not catching the issue of reuse of exchange is a mix of multiple things
   1) Spark is not testing with any concrete DataSourceV2 implementation. ( like iceberg)
   2) The simulation of DataSourceV2 impl using InMemoryTableScan is buggy because of equals / hashcode not taking into account pushed runtime filters, as a result any reuse of exchange bug would not be caught ( i.e mismatch of cached exchange plans would not be detected, giving a false assurance of re-use0
   3) If I am not wrong, the tpcds tests are run using Hive  as DataSource and not sure if it supports push down of runtime filters.
   4) The bug in AQE only shows in TPCDS if table are partitioned and equi join involves partitioning column.  I am not sure if right now various tpcds tests use partitioned table or not.
   
   Yes I have been able to reproduce the issue using InMemoryTableScans as DataSourceV2 impl for tpcds tests. I will checkin a prototype test for reproducing the bug using q14b and if needed all tests can be run.
   
   Though I ought to point out that while running my test I also hit the issue of computeStats being called twice ( which throws error only in testing). I have not debugged that... yet. And not sure if the assertion of computeStats occuring only once is maintainable..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #43808: [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans
URL: https://github.com/apache/spark/pull/43808


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1811953814

   Thank you so much for the details and a upcoming prototype test case.
   
   For that issue, you can file a new JIRA issue.
   > Though I ought to point out that while running my test I also hit the issue of computeStats being called twice ( which throws error only in testing). I have not debugged that... yet. And not sure if the assertion of computeStats occuring only once is maintainable..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1812006973

   It seems this PR is unrelated to runtime filter. I guess you mean is DS V2 filter pushdown


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "ahshahid (via GitHub)" <gi...@apache.org>.
ahshahid commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1811808992

   @HyukjinKwon  thanks for correcting the titles of the PRs. will take care next time..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "ahshahid (via GitHub)" <gi...@apache.org>.
ahshahid commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1820247612

   @dongjoon-hyun added bug test in PR [SPARK-45866](https://github.com/apache/spark/pull/43824)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1972199669

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org