Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/06 18:27:13 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #3741: Add datafusion example of expression apis

alamb opened a new pull request, #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741

   Draft as it builds on: https://github.com/apache/arrow-datafusion/pull/3719
   
   
   
   # Which issue does this PR close?
   re https://github.com/apache/arrow-datafusion/pull/3719
   re https://github.com/apache/arrow-datafusion/issues/3708
   re https://github.com/apache/arrow-datafusion/issues/3740
   
   # Rationale for this change
   Documenting the APIs will make them more discoverable and improve the user experience.
   
   # What changes are included in this PR?
   Add a new `expr_api` example
   
   # Are there any user-facing changes?
   Better docs
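The new example opens with the "fluent" construction style, where `col("a") + lit(5)` produces the same tree as spelling out a `BinaryExpr` node by hand. A minimal stdlib-only sketch of why that works: overloading `+` on the expression type builds the node. The `Expr`, `Operator`, `col`, and `lit` below are hypothetical, heavily simplified stand-ins, not DataFusion's actual types.

```rust
use std::ops::Add;

/// Hypothetical toy stand-in for DataFusion's `Expr` (not the real type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    BinaryExpr {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Plus,
}

/// Fluent helper: reference a column by name.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Fluent helper: wrap a constant value.
fn lit(value: i64) -> Expr {
    Expr::Literal(value)
}

/// Overloading `+` is what lets `col("a") + lit(5)` build a tree node.
impl Add for Expr {
    type Output = Expr;

    fn add(self, rhs: Expr) -> Expr {
        Expr::BinaryExpr {
            left: Box::new(self),
            op: Operator::Plus,
            right: Box::new(rhs),
        }
    }
}

fn main() {
    // Fluent form.
    let fluent = col("a") + lit(5);
    // Equivalent explicit form.
    let explicit = Expr::BinaryExpr {
        left: Box::new(Expr::Column("a".to_string())),
        op: Operator::Plus,
        right: Box::new(Expr::Literal(5)),
    };
    assert_eq!(fluent, explicit);
    println!("{:?}", fluent);
}
```

The design point is that the helpers return plain tree values, so fluent and explicit construction can be compared with `==`.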


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3741: Add datafusion example of expression apis

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741#discussion_r993591865


##########
datafusion/core/src/prelude.rs:
##########
@@ -34,7 +34,7 @@ pub use crate::execution::options::{
 pub use datafusion_common::Column;
 pub use datafusion_expr::{
     expr_fn::*,
-    lit,
+    lit, lit_timestamp_nano,

Review Comment:
   driveby fix



##########
datafusion-examples/examples/expr_api.rs:
##########
@@ -0,0 +1,136 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use datafusion::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
+
+use datafusion::error::Result;
+use datafusion::logical_plan::ToDFSchema;
+use datafusion::optimizer::expr_simplifier::{ExprSimplifier, SimplifyContext};
+use datafusion::physical_expr::execution_props::ExecutionProps;
+use datafusion::prelude::*;
+use datafusion::{logical_plan::Operator, scalar::ScalarValue};
+
+/// This example demonstrates the DataFusion [`Expr`] API.
+///
+/// DataFusion comes with a powerful and extensive system for
+/// representing and manipulating expressions such as `A + 5` and `X
+/// IN ('foo', 'bar', 'baz')` and many other constructs.
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    // The easiest way to create expressions is to use the
+    // "fluent"-style API, like this:
+    let expr = col("a") + lit(5);
+
+    // This creates the same expression as the following, though with
+    // much less code:
+    let expr2 = Expr::BinaryExpr {
+        left: Box::new(col("a")),
+        op: Operator::Plus,
+        right: Box::new(Expr::Literal(ScalarValue::Int32(Some(5)))),
+    };
+    assert_eq!(expr, expr2);
+
+    simplify_demo()?;
+
+    Ok(())
+}
+
+/// In addition to easy construction, DataFusion exposes APIs for
+/// working with and simplifying such expressions that call into the
+/// same powerful and extensive implementation used for the query
+/// engine.
+fn simplify_demo() -> Result<()> {
+    // For example, let's say you have created an expression such as
+    // ts = to_timestamp("2020-09-08T12:00:00+00:00")
+    let expr = col("ts").eq(call_fn(
+        "to_timestamp",
+        vec![lit("2020-09-08T12:00:00+00:00")],
+    )?);
+
+    // Naively evaluating such an expression against a large number of
+    // rows would involve re-converting "2020-09-08T12:00:00+00:00" to a
+    // timestamp for each row, which gets expensive.
+    //
+    // However, DataFusion's simplification logic can fold it into a constant for you.
+
+    // you need to tell DataFusion the type of column "ts":
+    let schema = Schema::new(vec![make_ts_field("ts")]).to_dfschema_ref()?;
+
+    // Then build a simplifier.
+    // The ExecutionProps carries information needed to simplify
+    // expressions, such as the current time (to evaluate `now()`
+    // correctly).
+    let props = ExecutionProps::new();
+    let context = SimplifyContext::new(&props).with_schema(schema);
+    let simplifier = ExprSimplifier::new(context);
+
+    // And then call the `simplify` method:
+    let expr = simplifier.simplify(expr)?;
+
+    // DataFusion has simplified the expression to a comparison with a constant
+    // ts = 1599566400000000000; Tada!
+    assert_eq!(
+        expr,
+        col("ts").eq(lit_timestamp_nano(1599566400000000000i64))
+    );
+
+    // here are some other examples of what DataFusion is capable of
+    let schema = Schema::new(vec![
+        make_field("i", DataType::Int64),
+        make_field("b", DataType::Boolean),
+    ])
+    .to_dfschema_ref()?;
+    let context = SimplifyContext::new(&props).with_schema(schema);
+    let simplifier = ExprSimplifier::new(context);
+
+    // basic arithmetic simplification
+    // i + (1 + 2) => i + 3
+    // (note this is not done if the expr is ((col("i") + lit(1)) + lit(2)))
+    assert_eq!(
+        simplifier.simplify(col("i") + (lit(1) + lit(2)))?,
+        col("i") + lit(3)
+    );
+
+    // TODO uncomment when https://github.com/apache/arrow-datafusion/issues/1160 is done

Review Comment:
   I will do a partial fix for this in a follow on PR
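The arithmetic case in the example above (`i + (1 + 2)` becomes `i + 3`, while `(i + 1) + 2` is left alone) is the behavior of bottom-up constant folding without reassociation. A minimal toy sketch of that behavior, assuming an expression type with only columns, integer literals, and `+`; the `Expr`, `col`, `lit`, `plus`, and `simplify` below are hypothetical and are not DataFusion's `ExprSimplifier`.

```rust
/// Hypothetical toy expression tree (not DataFusion's real `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Plus(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(v: i64) -> Expr {
    Expr::Literal(v)
}

fn plus(l: Expr, r: Expr) -> Expr {
    Expr::Plus(Box::new(l), Box::new(r))
}

/// Bottom-up constant folding: simplify both children first, then fold a
/// `Plus` node only when BOTH simplified children are literals. There is
/// no reassociation, so `(i + 1) + 2` stays as it is.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Plus(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            (l, r) => plus(l, r),
        },
        other => other,
    }
}

fn main() {
    // i + (1 + 2): the inner sub-tree is all constants, so it folds.
    assert_eq!(
        simplify(plus(col("i"), plus(lit(1), lit(2)))),
        plus(col("i"), lit(3))
    );

    // (i + 1) + 2: neither `Plus` node has two literal children.
    let left_assoc = plus(plus(col("i"), lit(1)), lit(2));
    assert_eq!(simplify(left_assoc.clone()), left_assoc);
    println!("ok");
}
```

Folding `(i + 1) + 2` would additionally require a reassociation rewrite, which this sketch deliberately omits.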





[GitHub] [arrow-datafusion] ursabot commented on pull request #3741: Add datafusion example of expression apis

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741#issuecomment-1276515518

   Benchmark runs are scheduled for baseline = 3af09fbf2026fc079264b1f67bc7095dc7fe7161 and contender = c27b56f09a437bbe296ad5782142e8fe3e700e4e. c27b56f09a437bbe296ad5782142e8fe3e700e4e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/5b6435c61bfd4d8280f980d9b5f3adb7...241c095036ec4a6f90cb8b2e9eb9cec2/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ec7d2457eed44d919a11834ba6a1267c...f42bf68458dc485b824b10994de351dc/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/afaaba7c22254964905bddf09663cb11...1761128cb80d48fb9175292c622a92f8/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/fe8759d5c6ea4985ba5375b5719b220c...16d89d2c3b904bbdb837ff39791277b5/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3741: Add datafusion example of expression apis

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741#discussion_r990384817


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -15,73 +15,25 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Simplify expressions optimizer rule
+//! Simplify expressions optimizer rule and implementation
 
-use crate::expr_simplifier::ExprSimplifiable;
+use crate::expr_simplifier::{ExprSimplifier, SimplifyContext};
 use crate::{expr_simplifier::SimplifyInfo, OptimizerConfig, OptimizerRule};
 use arrow::array::new_null_array;
 use arrow::datatypes::{DataType, Field, Schema};
 use arrow::error::ArrowError;
 use arrow::record_batch::RecordBatch;
-use datafusion_common::{DFSchema, DFSchemaRef, DataFusionError, Result, ScalarValue};
+use datafusion_common::{DFSchema, DataFusionError, Result, ScalarValue};
 use datafusion_expr::{
     expr_fn::{and, or},
     expr_rewriter::{ExprRewritable, ExprRewriter, RewriteRecursion},
     lit,
     logical_plan::LogicalPlan,
     utils::from_plan,
-    BuiltinScalarFunction, ColumnarValue, Expr, ExprSchemable, Operator, Volatility,
+    BuiltinScalarFunction, ColumnarValue, Expr, Operator, Volatility,
 };
 use datafusion_physical_expr::{create_physical_expr, execution_props::ExecutionProps};
 
-/// Provides simplification information based on schema and properties
-pub(crate) struct SimplifyContext<'a, 'b> {

Review Comment:
   This was moved into the public API



##########
datafusion/core/tests/simplification.rs:
##########
@@ -1,108 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-//! This program demonstrates the DataFusion expression simplification API.

Review Comment:
   These were inlined into `expr_api.rs` examples



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -37,19 +41,27 @@ pub trait SimplifyInfo {
     fn execution_props(&self) -> &ExecutionProps;
 }
 
-/// trait for types that can be simplified
-pub trait ExprSimplifiable: Sized {

Review Comment:
   Instead of adding a `simplify` method onto `Expr` via this trait, I propose an `ExprSimplifier` struct that has a `simplify` method.
   
   I found it made the examples in `expr_api.rs` less awkward to write because the schema wasn't needed.
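A minimal sketch of the struct-based shape proposed here: the context (schema facts, execution properties) is supplied once when the simplifier is built, and `simplify` then takes only the expression. All names below are hypothetical toy stand-ins that only mirror the naming in this PR; they are not DataFusion's actual `ExprSimplifier` or `SimplifyContext`. As an assumed example of a schema-dependent rewrite, `c IS NULL` folds to `false` when the context says column `c` is not nullable.

```rust
use std::collections::HashSet;

/// Hypothetical toy expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Boolean(bool),
    IsNull(Box<Expr>),
}

/// Hypothetical stand-in for a simplify context: carries the schema facts
/// (here: which columns are nullable) that some rewrites depend on.
struct SimplifyContext {
    nullable_columns: HashSet<String>,
}

impl SimplifyContext {
    fn is_nullable(&self, name: &str) -> bool {
        self.nullable_columns.contains(name)
    }
}

/// Struct-based API: the context is captured at construction time.
struct ExprSimplifier {
    context: SimplifyContext,
}

impl ExprSimplifier {
    fn new(context: SimplifyContext) -> Self {
        Self { context }
    }

    fn simplify(&self, expr: Expr) -> Expr {
        match expr {
            Expr::IsNull(inner) => match *inner {
                // `c IS NULL` is always false if `c` cannot be null.
                Expr::Column(ref name) if !self.context.is_nullable(name) => {
                    Expr::Boolean(false)
                }
                other => Expr::IsNull(Box::new(self.simplify(other))),
            },
            other => other,
        }
    }
}

fn main() {
    let context = SimplifyContext {
        nullable_columns: HashSet::from(["b".to_string()]),
    };
    // Build once, then simplify any number of expressions.
    let simplifier = ExprSimplifier::new(context);

    let a_is_null = Expr::IsNull(Box::new(Expr::Column("a".to_string())));
    let b_is_null = Expr::IsNull(Box::new(Expr::Column("b".to_string())));
    assert_eq!(simplifier.simplify(a_is_null), Expr::Boolean(false));
    assert_eq!(simplifier.simplify(b_is_null.clone()), b_is_null);
    println!("ok");
}
```

By contrast, a trait method on the expression itself would need the context passed on every call; holding it in the struct keeps call sites to `simplifier.simplify(expr)`.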



##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -950,30 +909,12 @@ macro_rules! assert_contains {
     };
 }
 
-/// Apply simplification and constant propagation to ([Expr]).
-///
-/// # Arguments
-///
-/// * `expr` - The logical expression
-/// * `schema` - The DataFusion schema for the expr, used to resolve `Column` references
-///                      to qualified or unqualified fields by name.
-/// * `props` - The Arrow schema for the input, used for determining expression data types
-///                    when performing type coercion.
-pub fn simplify_expr(

Review Comment:
   This was added in https://github.com/apache/arrow-datafusion/commit/fef45e74d6772cf1f4aa8b32338ac4509fa24ab4 by @ygf11, but when I was trying to write examples using it, it ended up being quite awkward.





[GitHub] [arrow-datafusion] alamb merged pull request #3741: Add datafusion example of expression apis

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741




[GitHub] [arrow-datafusion] alamb commented on pull request #3741: Add datafusion example of expression apis

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3741:
URL: https://github.com/apache/arrow-datafusion/pull/3741#issuecomment-1272024639

   I moved my proposed API changes into https://github.com/apache/arrow-datafusion/pull/3758

