Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/22 17:15:45 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #6747: Make it easier to create WindowFunctions with the Expr API

alamb opened a new issue, #6747:
URL: https://github.com/apache/arrow-datafusion/issues/6747

   ### Is your feature request related to a problem or challenge?
   
   Follow on to #5781 
   
   There are at least three things named `WindowFunction` in DataFusion -- `Expr::WindowFunction`, `window_function::WindowFunction` and `expr::WindowFunction`
   
   https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/index.html?search=WindowFunction
   
   Constructing an `Expr::WindowFunction` to pass to [`LogicalPlanBuilder::window`](https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/logical_plan/builder/struct.LogicalPlanBuilder.html#method.window) is quite challenging.
   
   ### Describe the solution you'd like
   
   I would like to make this process easier with a builder style:
   
   for `lead(foo) OVER(PARTITION BY bar)` for example:
   
   ```rust
   let expr = lead(col("foo"))
     .with_partition_by(col("bar"))
   ```
   
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090260284

   Here is another example, from https://github.com/apache/datafusion/pull/10345 by @timsaucer, showing how difficult it is to create a window function via the Expr API:
   
   ```rust
   use datafusion::{logical_expr::{expr::WindowFunction, BuiltInWindowFunction, WindowFrame, WindowFunctionDefinition}, prelude::*};
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
   
       let ctx = SessionContext::new();
       let mut df = ctx.read_csv("/Users/tsaucer/working/testing_ballista/lead_lag/example.csv", CsvReadOptions::default()).await?;
   
       df = df.with_column("array_col", make_array(vec![col("a"), col("b"), col("c")]))?;
   
       df.clone().show().await?;
   
       let lag_expr = Expr::WindowFunction(WindowFunction::new(
           WindowFunctionDefinition::BuiltInWindowFunction(
               BuiltInWindowFunction::Lead,
           ),
           vec![col("array_col")],
           vec![],
           vec![],
           WindowFrame::new(None),
           None,
       ));
   
       df = df.select(vec![col("a"), col("b"), col("c"), col("array_col"), lag_expr.alias("lagged")])?;
   
       df.show().await?;
   
       Ok(())
   }
   ```
   
   It would be great if instead of 
   
   ```rust
       let lag_expr = Expr::WindowFunction(WindowFunction::new(
           WindowFunctionDefinition::BuiltInWindowFunction(
               BuiltInWindowFunction::Lead,
           ),
           vec![col("array_col")],
           vec![],
           vec![],
           WindowFrame::new(None),
           None,
       ));
   ```
   
   It looked more like
   
   ```rust
       let lag_expr = lead(
           vec![col("array_col")],
           vec![],
           vec![],
           WindowFrame::new(None),
           None,
       );
   ```
   
   Maybe even better like a builder style 
   
   ```rust
       let lag_expr = lead(col("array_col")).build()
   ```
   
   Which would permit adding the various `OVER` clauses like
   ```rust
       let lag_expr = lead(col("array_col"))
         .partition_by(vec![])
         .order_by(vec![])
         .build()
   ```
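   
   To make the builder idea concrete, here is a minimal, self-contained sketch. It uses toy stand-in types (`Expr`, `WindowBuilder`) defined in the snippet itself, not DataFusion's real types, and the method names `partition_by`, `order_by`, and `build` are assumptions taken from the proposal above rather than an existing API.
   
   ```rust
   // Toy stand-ins for illustration only; these are NOT DataFusion's real types.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Window {
           fun: String,
           args: Vec<Expr>,
           partition_by: Vec<Expr>,
           order_by: Vec<Expr>,
       },
   }
   
   /// Toy equivalent of `col("name")`.
   fn col(name: &str) -> Expr {
       Expr::Column(name.to_string())
   }
   
   /// Hypothetical builder that collects the parts of the `OVER` clause.
   #[derive(Debug, Clone)]
   struct WindowBuilder {
       fun: String,
       args: Vec<Expr>,
       partition_by: Vec<Expr>,
       order_by: Vec<Expr>,
   }
   
   impl WindowBuilder {
       /// Set the `PARTITION BY` expressions.
       fn partition_by(mut self, exprs: Vec<Expr>) -> Self {
           self.partition_by = exprs;
           self
       }
   
       /// Set the `ORDER BY` expressions.
       fn order_by(mut self, exprs: Vec<Expr>) -> Self {
           self.order_by = exprs;
           self
       }
   
       /// Finish the builder and produce the window expression.
       fn build(self) -> Expr {
           Expr::Window {
               fun: self.fun,
               args: self.args,
               partition_by: self.partition_by,
               order_by: self.order_by,
           }
       }
   }
   
   /// Hypothetical `lead` helper: starts a builder with empty `OVER` clauses.
   fn lead(arg: Expr) -> WindowBuilder {
       WindowBuilder {
           fun: "lead".to_string(),
           args: vec![arg],
           partition_by: vec![],
           order_by: vec![],
       }
   }
   
   /// Usage: `lead(array_col) OVER (PARTITION BY a ORDER BY b)`.
   fn example_lead() -> Expr {
       lead(col("array_col"))
           .partition_by(vec![col("a")])
           .order_by(vec![col("b")])
           .build()
   }
   ```
   
   In this sketch the defaults (empty partition and order lists) live in `lead`, so the common case stays a one-liner and each `OVER` clause is opt-in.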
   
   Maybe there are some inspirations in the polars API too: https://docs.pola.rs/user-guide/expressions/window/#group-by-aggregations-in-selection




Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090266602

   🤔  It seems that Spark's API looks like this:
   
   > count("dt").over(w).alias("count")).show()
   
   https://stackoverflow.com/questions/32769328/how-to-use-window-functions-in-pyspark-using-dataframes
   
   So maybe for DataFusion it could look like
   
   ```rust
       let w = Window::new()
           .partition_by(col("id"))
           .order_by(col("dt"));
   
       let lag_expr = lag(col("array_col"))
           .over(w)
   ```
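   
   A rough sketch of how the `over(w)` style could fit together, again with toy stand-in types defined in the snippet rather than DataFusion's real ones. `Window`, `WindowCall`, and `over` are hypothetical names used only for illustration.
   
   ```rust
   // Toy stand-ins for illustration only; these are NOT DataFusion's real types.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Window {
           fun: String,
           args: Vec<Expr>,
           partition_by: Vec<Expr>,
           order_by: Vec<Expr>,
       },
   }
   
   /// Toy equivalent of `col("name")`.
   fn col(name: &str) -> Expr {
       Expr::Column(name.to_string())
   }
   
   /// Hypothetical reusable window specification (the `OVER (...)` part).
   #[derive(Debug, Clone, Default)]
   struct Window {
       partition_by: Vec<Expr>,
       order_by: Vec<Expr>,
   }
   
   impl Window {
       fn new() -> Self {
           Self::default()
       }
   
       /// Append one `PARTITION BY` expression.
       fn partition_by(mut self, expr: Expr) -> Self {
           self.partition_by.push(expr);
           self
       }
   
       /// Append one `ORDER BY` expression.
       fn order_by(mut self, expr: Expr) -> Self {
           self.order_by.push(expr);
           self
       }
   }
   
   /// Hypothetical window function call that has not been given a window yet.
   #[derive(Debug, Clone)]
   struct WindowCall {
       fun: String,
       args: Vec<Expr>,
   }
   
   impl WindowCall {
       /// Attach the window specification and produce the final expression.
       fn over(self, w: Window) -> Expr {
           Expr::Window {
               fun: self.fun,
               args: self.args,
               partition_by: w.partition_by,
               order_by: w.order_by,
           }
       }
   }
   
   fn lag(arg: Expr) -> WindowCall {
       WindowCall { fun: "lag".to_string(), args: vec![arg] }
   }
   
   fn lead(arg: Expr) -> WindowCall {
       WindowCall { fun: "lead".to_string(), args: vec![arg] }
   }
   
   /// Usage: one window specification shared by two window functions.
   fn example_over() -> (Expr, Expr) {
       let w = Window::new()
           .partition_by(col("id"))
           .order_by(col("dt"));
   
       let lagged = lag(col("array_col")).over(w.clone());
       let led = lead(col("array_col")).over(w);
       (lagged, led)
   }
   ```
   
   Separating the window specification from the function call means the same `w` can be cloned and shared across several expressions, which is the main appeal of the Spark-style shape.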
   
   
   




Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090267577

   Note that I have some code in https://github.com/apache/datafusion/pull/6746 that implements part of this (along with an example).

