Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/11 15:36:13 UTC

[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727


   @wangyum - Just curious, to clarify my understanding of this change.
   In the PR example I can see a scalar subquery without joins:
   `select max(d) from t2`
   
   Are we also pushing down a scalar subquery that contains one or more joins and then returns a single row?
   For example:
   `SELECT * FROM t1 WHERE b = (select t2.id from t2, t3, t4 where t2.id = t3.id1 and t2.date = t4.date1)`
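   
   As a hedged sketch (not something stated in the PR itself), one way to check whether such a subquery actually gets pushed into the scan of t1 is to look at the physical plan from spark-shell, assuming tables t1..t4 with the columns used above already exist:
   
   ```scala
   // Hypothetical check in spark-shell; t1..t4 and their columns are assumptions, not taken from the PR.
   // If I read the PR title correctly, after the change the scalar-subquery filter should appear
   // in t1's FileSourceScan node in the formatted plan; before the change it should not.
   spark.sql("""
     SELECT * FROM t1
     WHERE b = (SELECT t2.id FROM t2, t3, t4
                WHERE t2.id = t3.id1 AND t2.date = t4.date1)
   """).explain("formatted")
   ```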
   
   Now suppose the subquery `select t2.id from t2, t3, t4 where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
   
   If we push down such a complex subquery, then we hold the scan of table t1 until the subquery completes and the pushdown on t1 is done; only after that does the processing of t1 start.
   
   Won't that hurt performance compared to the existing behaviour, where both table scans start at the same time?
   
   Second point: if table t1 has only a small number of rows, do we still want to push down the scalar subquery when the subquery itself is complex?
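   
   A rough, hypothetical way to measure this for the small-t1 case (again assuming t1..t4 exist; `spark.time` just prints the wall-clock time of the enclosed action) would be to run the same query on a build with and without this change and compare the printed times:
   
   ```scala
   // Hypothetical timing sketch in spark-shell; tables and columns are assumptions.
   // Run once without this change and once with it, then compare the elapsed times.
   spark.time {
     spark.sql("""
       SELECT * FROM t1
       WHERE b = (SELECT t2.id FROM t2, t3, t4
                  WHERE t2.id = t3.id1 AND t2.date = t4.date1)
     """).collect()
   }
   ```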
   
   `If the scalar subquery completes before the scan of t1 starts (whether before or after this change), then pushing down the scalar subquery will always be faster.`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org