You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jiangzhx (via GitHub)" <gi...@apache.org> on 2023/04/11 05:04:43 UTC

[GitHub] [arrow-datafusion] jiangzhx commented on issue #5791: SQL case scalar_subquery logical_paln unexpected Aggregate: groupBy=[[col]]

jiangzhx commented on issue #5791:
URL: https://github.com/apache/arrow-datafusion/issues/5791#issuecomment-1502692953

   i find a way to create scalar subquery when use dataframe.
   
   There are 3 places here that need attention in order to write the correct code.
   
   1. `.select(vec![count(lit(COUNT_STAR_EXPANSION))])?   `  should add after aggregate, scalar_subquery_to_join look like  need this
   3.  `count(*)` rewrite to `count(lit(COUNT_STAR_EXPANSION))`    after #5686 merge ,it's not necessary.
   4. ` .into_optimized_plan()` rewrite to `.into_unoptimized_plan()`  because eliminate_projection will remove this. then got scalar_subquery_to_join not work
   
   
   Maybe it's just me, but writing the correct code using a dataframe is not easy. We should write a document for this place. 
   
   
   the right way to create scalar subquery  with dataframe api.
   ```
   
   #[tokio::test]
   async fn test_count_wildcard_on_where_scalar_subquery() -> Result<()> {
       let ctx = create_join_context()?;
       ctx.sql("select a,b from t1 where (select count(*) from t2 where t1.a = t2.a)>0;")
           .await?
           .explain(false, false)?
           .show()
           .await?;
   
       let subquery = Expr::ScalarSubquery(datafusion_expr::Subquery {
           subquery: Arc::new(
               ctx.table("t2")
                   .await?
                   .filter(col("t1.a").eq(col("t2.a")))?
                   .aggregate(vec![], vec![count(lit(COUNT_STAR_EXPANSION))])?
                   .select(vec![count(lit(COUNT_STAR_EXPANSION))])?   
                   .into_unoptimized_plan(),
           ),
           outer_ref_columns: vec![],
       });
   
       ctx.table("t1")
           .await?
           .filter(subquery.gt(lit(ScalarValue::UInt8(Some(0)))))?
           .select(vec![col("t1.a"), col("t1.b")])?
           .explain(false, false)?
           .show()
           .await?;
       Ok(())
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org