Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/22 17:15:45 UTC
[GitHub] [arrow-datafusion] alamb opened a new issue, #6747: Make it easier to create WindowFunctions with the Expr API
alamb opened a new issue, #6747:
URL: https://github.com/apache/arrow-datafusion/issues/6747
### Is your feature request related to a problem or challenge?
Follow on to #5781
There are at least three things named `WindowFunction` in DataFusion -- `Expr::WindowFunction`, `window_function::WindowFunction` and `expr::WindowFunction`
https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/index.html?search=WindowFunction
Constructing an `Expr::WindowFunction` to pass to [`LogicalPlanBuilder::window`](https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/logical_plan/builder/struct.LogicalPlanBuilder.html#method.window) is quite challenging.
### Describe the solution you'd like
I would like to make this process easier with a builder-style API.
For example, for `lead(foo) OVER(PARTITION BY bar)`:
```rust
let expr = lead(col("foo"))
    .with_partition_by(col("bar"));
```
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090260284
Here is another example from https://github.com/apache/datafusion/pull/10345 / @timsaucer showing how difficult it is to create a window function via the Expr API
```rust
use datafusion::{logical_expr::{expr::WindowFunction, BuiltInWindowFunction, WindowFrame, WindowFunctionDefinition}, prelude::*};

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    let mut df = ctx.read_csv("/Users/tsaucer/working/testing_ballista/lead_lag/example.csv", CsvReadOptions::default()).await?;
    df = df.with_column("array_col", make_array(vec![col("a"), col("b"), col("c")]))?;
    df.clone().show().await?;
    let lag_expr = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::BuiltInWindowFunction(
            BuiltInWindowFunction::Lead,
        ),
        vec![col("array_col")],
        vec![],
        vec![],
        WindowFrame::new(None),
        None,
    ));
    df = df.select(vec![col("a"), col("b"), col("c"), col("array_col"), lag_expr.alias("lagged")])?;
    df.show().await?;
    Ok(())
}
```
It would be great if instead of
```rust
let lag_expr = Expr::WindowFunction(WindowFunction::new(
    WindowFunctionDefinition::BuiltInWindowFunction(
        BuiltInWindowFunction::Lead,
    ),
    vec![col("array_col")],
    vec![],
    vec![],
    WindowFrame::new(None),
    None,
));
```
it looked more like
```rust
let lag_expr = lead(
    vec![col("array_col")],
    vec![],
    vec![],
    WindowFrame::new(None),
    None,
);
```
Maybe even better would be a builder style:
```rust
let lag_expr = lead(col("array_col")).build();
```
That would permit adding the various `OVER` clauses, like
```rust
let lag_expr = lead(col("array_col"))
    .partition_by(vec![])
    .order_by(vec![])
    .build();
```
Maybe there is some inspiration to be found in the polars API too: https://docs.pola.rs/user-guide/expressions/window/#group-by-aggregations-in-selection
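To make the builder idea concrete, here is a minimal, self-contained sketch of how such an API could be shaped. Everything in it is an assumption for illustration: `Expr`, `WindowBuilder`, `col`, and `lead` are stand-in types and functions defined in the snippet itself, not DataFusion's real ones, and the frame and null-treatment arguments are left out for brevity.

```rust
// Stand-in types for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    WindowFunction {
        fun: String,
        args: Vec<Expr>,
        partition_by: Vec<Expr>,
        order_by: Vec<Expr>,
    },
}

/// Hypothetical helper mirroring `col` from the examples above.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Accumulates the pieces of a window expression until `build` is called.
#[derive(Debug, Clone)]
pub struct WindowBuilder {
    fun: String,
    args: Vec<Expr>,
    partition_by: Vec<Expr>,
    order_by: Vec<Expr>,
}

impl WindowBuilder {
    /// Sets the PARTITION BY expressions.
    pub fn partition_by(mut self, exprs: Vec<Expr>) -> Self {
        self.partition_by = exprs;
        self
    }

    /// Sets the ORDER BY expressions.
    pub fn order_by(mut self, exprs: Vec<Expr>) -> Self {
        self.order_by = exprs;
        self
    }

    /// Consumes the builder and produces the final expression.
    pub fn build(self) -> Expr {
        Expr::WindowFunction {
            fun: self.fun,
            args: self.args,
            partition_by: self.partition_by,
            order_by: self.order_by,
        }
    }
}

/// Hypothetical `lead` entry point: starts a builder with empty OVER clauses.
pub fn lead(arg: Expr) -> WindowBuilder {
    WindowBuilder {
        fun: "lead".to_string(),
        args: vec![arg],
        partition_by: vec![],
        order_by: vec![],
    }
}

/// Usage: lead(array_col) OVER (PARTITION BY a ORDER BY b)
pub fn example() -> Expr {
    lead(col("array_col"))
        .partition_by(vec![col("a")])
        .order_by(vec![col("b")])
        .build()
}
```

With this shape, the common case stays a one-liner (`lead(col("array_col")).build()`), and each `OVER` clause is opt-in rather than a positional argument that has to be filled with `vec![]` or `None`.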
Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090266602
🤔 it seems like Spark's API is like
> count("dt").over(w).alias("count")).show()
https://stackoverflow.com/questions/32769328/how-to-use-window-functions-in-pyspark-using-dataframes
So maybe for DataFusion it could look like
```rust
let w = Window::new()
    .partition_by(col("id"))
    .order_by(col("dt"));
let lag_expr = lag(col("array_col"))
    .over(w);
```
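For comparison, a Spark-like `over` design could be sketched as follows. Again, this is only a rough illustration under stated assumptions: `Expr`, `Window`, `col`, and `lag` here are stand-ins defined in the snippet, not existing DataFusion items. The point it shows is that the window specification becomes a separate value that can be reused across several functions.

```rust
// Stand-in types for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Call { fun: String, args: Vec<Expr> },
    Over { func: Box<Expr>, window: Window },
}

/// Hypothetical reusable window specification (the contents of `OVER (...)`).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Window {
    pub partition_by: Vec<Expr>,
    pub order_by: Vec<Expr>,
}

impl Window {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends one PARTITION BY expression.
    pub fn partition_by(mut self, expr: Expr) -> Self {
        self.partition_by.push(expr);
        self
    }

    /// Appends one ORDER BY expression.
    pub fn order_by(mut self, expr: Expr) -> Self {
        self.order_by.push(expr);
        self
    }
}

impl Expr {
    /// Wraps a function call in the given window specification.
    pub fn over(self, window: Window) -> Expr {
        Expr::Over { func: Box::new(self), window }
    }
}

/// Hypothetical helper mirroring `col` from the examples above.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Hypothetical `lag` helper that produces a plain function call.
pub fn lag(arg: Expr) -> Expr {
    Expr::Call { fun: "lag".to_string(), args: vec![arg] }
}

/// Usage: one window specification shared by two window expressions.
pub fn example() -> (Expr, Expr) {
    let w = Window::new()
        .partition_by(col("id"))
        .order_by(col("dt"));
    let first = lag(col("array_col")).over(w.clone());
    let second = lag(col("a")).over(w);
    (first, second)
}
```

Compared with a per-function builder, this keeps `lag` itself simple and moves all of the `OVER` handling into `Window`, at the cost of one more type in the public API.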
Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2090267577
Note that I have some code in https://github.com/apache/datafusion/pull/6746 that implements part of this (along with an example)