You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by ji...@apache.org on 2022/02/09 15:36:41 UTC

[arrow-datafusion] branch physical-plan updated (6f86fb9 -> 3242cef)

This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a change to branch physical-plan
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git.


 discard 6f86fb9  remove reference to logical plan in physical plan
 discard b0ea0b9  add module level comments
 discard 7396899  move expr functions to datafusion-expr expr_fn
     add f6df759  Less verbose plans in debug logging (#1787)
     add 071f14a  Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants (#1776)
     add ecd0081  [split/13] move rest of expr to expr_fn in datafusion-expr module (#1794)
     new c6481c8  move expr functions to datafusion-expr expr_fn
     new d6af461  add module level comments
     new 3242cef  remove reference to logical plan in physical plan

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (6f86fb9)
            \
             N -- N -- N   refs/heads/physical-plan (3242cef)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../core/src/execution_plans/distributed_query.rs  |   9 +
 .../core/src/execution_plans/shuffle_reader.rs     |   9 +
 .../core/src/execution_plans/shuffle_writer.rs     |   9 +
 .../core/src/execution_plans/unresolved_shuffle.rs |   9 +
 ballista/rust/executor/src/collect.rs              |   5 +
 datafusion/src/execution/context.rs                |   8 +-
 datafusion/src/physical_optimizer/repartition.rs   | 396 ++++++++++++++++++---
 datafusion/src/physical_plan/analyze.rs            |   9 +
 datafusion/src/physical_plan/coalesce_batches.rs   |   9 +
 .../src/physical_plan/coalesce_partitions.rs       |   9 +
 datafusion/src/physical_plan/cross_join.rs         |   9 +
 datafusion/src/physical_plan/empty.rs              |   5 +
 datafusion/src/physical_plan/explain.rs            |  10 +-
 datafusion/src/physical_plan/file_format/avro.rs   |   9 +
 datafusion/src/physical_plan/file_format/csv.rs    |   9 +
 datafusion/src/physical_plan/file_format/json.rs   |   9 +
 .../src/physical_plan/file_format/parquet.rs       |   9 +
 datafusion/src/physical_plan/filter.rs             |  14 +
 datafusion/src/physical_plan/hash_aggregate.rs     |  13 +
 datafusion/src/physical_plan/hash_join.rs          |   9 +
 datafusion/src/physical_plan/limit.rs              |  40 ++-
 datafusion/src/physical_plan/memory.rs             |   9 +
 datafusion/src/physical_plan/mod.rs                |  67 +++-
 datafusion/src/physical_plan/planner.rs            |  20 +-
 datafusion/src/physical_plan/projection.rs         |  15 +-
 datafusion/src/physical_plan/repartition.rs        |   9 +
 datafusion/src/physical_plan/sorts/sort.rs         |  13 +
 .../physical_plan/sorts/sort_preserving_merge.rs   |   8 +
 datafusion/src/physical_plan/union.rs              |  11 +-
 datafusion/src/physical_plan/values.rs             |   9 +
 .../src/physical_plan/windows/window_agg_exec.rs   |  13 +
 datafusion/src/test/exec.rs                        |  24 +-
 datafusion/tests/custom_sources.rs                 |   4 +
 datafusion/tests/provider_filter_pushdown.rs       |   5 +
 datafusion/tests/statistics.rs                     |   9 +-
 datafusion/tests/user_defined_plan.rs              |   9 +
 36 files changed, 755 insertions(+), 78 deletions(-)

[arrow-datafusion] 02/03: add module level comments

Posted by ji...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch physical-plan
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit d6af46133089497738b614308c8ea6c91fbb5c77
Author: Jiayu Liu <ji...@hey.com>
AuthorDate: Wed Feb 9 13:14:43 2022 +0800

    add module level comments
---
 datafusion-expr/src/lib.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datafusion-expr/src/lib.rs b/datafusion-expr/src/lib.rs
index 1d0837f..709fa63 100644
--- a/datafusion-expr/src/lib.rs
+++ b/datafusion-expr/src/lib.rs
@@ -37,8 +37,8 @@ pub use columnar_value::{ColumnarValue, NullColumnarValue};
 pub use expr::Expr;
 pub use expr_fn::col;
 pub use function::{
-  AccumulatorFunctionImplementation, ReturnTypeFunction, ScalarFunctionImplementation,
-  StateTypeFunction,
+    AccumulatorFunctionImplementation, ReturnTypeFunction, ScalarFunctionImplementation,
+    StateTypeFunction,
 };
 pub use literal::{lit, lit_timestamp_nano, Literal, TimestampLiteral};
 pub use operator::Operator;

[arrow-datafusion] 03/03: remove reference to logical plan in physical plan

Posted by ji...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch physical-plan
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 3242cef6f8ed7cfbea7d2f58a65b5eac61539755
Author: Jiayu Liu <ji...@hey.com>
AuthorDate: Wed Feb 9 18:07:30 2022 +0800

    remove reference to logical plan in physical plan
---
 datafusion/src/physical_plan/coercion_rule/binary_rule.rs |  4 ++--
 datafusion/src/physical_plan/expressions/binary.rs        |  2 +-
 datafusion/src/physical_plan/expressions/case.rs          |  2 +-
 datafusion/src/physical_plan/file_format/parquet.rs       | 13 +++++++------
 datafusion/src/physical_plan/filter.rs                    |  3 ++-
 datafusion/src/physical_plan/planner.rs                   |  7 ++++---
 datafusion/src/physical_plan/windows/aggregate.rs         |  2 +-
 datafusion/src/physical_plan/windows/mod.rs               |  2 +-
 8 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/datafusion/src/physical_plan/coercion_rule/binary_rule.rs b/datafusion/src/physical_plan/coercion_rule/binary_rule.rs
index 426d59f..7d4dd55 100644
--- a/datafusion/src/physical_plan/coercion_rule/binary_rule.rs
+++ b/datafusion/src/physical_plan/coercion_rule/binary_rule.rs
@@ -19,8 +19,8 @@
 
 use crate::arrow::datatypes::DataType;
 use crate::error::{DataFusionError, Result};
-use crate::logical_plan::Operator;
 use crate::scalar::{MAX_PRECISION_FOR_DECIMAL128, MAX_SCALE_FOR_DECIMAL128};
+use datafusion_expr::Operator;
 
 /// Coercion rules for all binary operators. Returns the output type
 /// of applying `op` to an argument of `lhs_type` and `rhs_type`.
@@ -494,7 +494,7 @@ mod tests {
     use super::*;
     use crate::arrow::datatypes::DataType;
     use crate::error::{DataFusionError, Result};
-    use crate::logical_plan::Operator;
+    use datafusion_expr::Operator;
 
     #[test]
 
diff --git a/datafusion/src/physical_plan/expressions/binary.rs b/datafusion/src/physical_plan/expressions/binary.rs
index d1fc3bc..9f007a2 100644
--- a/datafusion/src/physical_plan/expressions/binary.rs
+++ b/datafusion/src/physical_plan/expressions/binary.rs
@@ -59,11 +59,11 @@ use arrow::error::ArrowError::DivideByZero;
 use arrow::record_batch::RecordBatch;
 
 use crate::error::{DataFusionError, Result};
-use crate::logical_plan::Operator;
 use crate::physical_plan::coercion_rule::binary_rule::coerce_types;
 use crate::physical_plan::expressions::try_cast;
 use crate::physical_plan::{ColumnarValue, PhysicalExpr};
 use crate::scalar::ScalarValue;
+use datafusion_expr::Operator;
 
 // Simple (low performance) kernels until optimized kernels are added to arrow
 // See https://github.com/apache/arrow-rs/issues/960
diff --git a/datafusion/src/physical_plan/expressions/case.rs b/datafusion/src/physical_plan/expressions/case.rs
index 2a680d3..d990d74 100644
--- a/datafusion/src/physical_plan/expressions/case.rs
+++ b/datafusion/src/physical_plan/expressions/case.rs
@@ -456,12 +456,12 @@ mod tests {
     use super::*;
     use crate::{
         error::Result,
-        logical_plan::Operator,
         physical_plan::expressions::{binary, col, lit},
         scalar::ScalarValue,
     };
     use arrow::array::StringArray;
     use arrow::datatypes::*;
+    use datafusion_expr::Operator;
 
     #[test]
     fn case_with_expr() -> Result<()> {
diff --git a/datafusion/src/physical_plan/file_format/parquet.rs b/datafusion/src/physical_plan/file_format/parquet.rs
index 539904e..168a5c0 100644
--- a/datafusion/src/physical_plan/file_format/parquet.rs
+++ b/datafusion/src/physical_plan/file_format/parquet.rs
@@ -27,7 +27,6 @@ use crate::datasource::PartitionedFile;
 use crate::physical_plan::expressions::PhysicalSortExpr;
 use crate::{
     error::{DataFusionError, Result},
-    logical_plan::{Column, Expr},
     physical_optimizer::pruning::{PruningPredicate, PruningStatistics},
     physical_plan::{
         file_format::FileScanConfig,
@@ -38,6 +37,8 @@ use crate::{
     },
     scalar::ScalarValue,
 };
+use datafusion_common::Column;
+use datafusion_expr::Expr;
 
 use arrow::{
     array::ArrayRef,
@@ -928,7 +929,7 @@ mod tests {
 
     #[test]
     fn row_group_pruning_predicate_simple_expr() -> Result<()> {
-        use crate::logical_plan::{col, lit};
+        use datafusion_expr::{col, lit};
         // int > 1 => c1_max > 1
         let expr = col("c1").gt(lit(15));
         let schema = Schema::new(vec![Field::new("c1", DataType::Int32, false)]);
@@ -961,7 +962,7 @@ mod tests {
 
     #[test]
     fn row_group_pruning_predicate_missing_stats() -> Result<()> {
-        use crate::logical_plan::{col, lit};
+        use datafusion_expr::{col, lit};
         // int > 1 => c1_max > 1
         let expr = col("c1").gt(lit(15));
         let schema = Schema::new(vec![Field::new("c1", DataType::Int32, false)]);
@@ -996,7 +997,7 @@ mod tests {
 
     #[test]
     fn row_group_pruning_predicate_partial_expr() -> Result<()> {
-        use crate::logical_plan::{col, lit};
+        use datafusion_expr::{col, lit};
         // test row group predicate with partially supported expression
         // int > 1 and int % 2 => c1_max > 1 and true
         let expr = col("c1").gt(lit(15)).and(col("c2").modulus(lit(2)));
@@ -1082,7 +1083,7 @@ mod tests {
 
     #[test]
     fn row_group_pruning_predicate_null_expr() -> Result<()> {
-        use crate::logical_plan::{col, lit};
+        use datafusion_expr::{col, lit};
         // int > 1 and IsNull(bool) => c1_max > 1 and bool_null_count > 0
         let expr = col("c1").gt(lit(15)).and(col("c2").is_null());
         let schema = Arc::new(Schema::new(vec![
@@ -1110,7 +1111,7 @@ mod tests {
 
     #[test]
     fn row_group_pruning_predicate_eq_null_expr() -> Result<()> {
-        use crate::logical_plan::{col, lit};
+        use datafusion_expr::{col, lit};
         // test row group predicate with an unknown (Null) expr
         //
         // int > 1 and bool = NULL => c1_max > 1 and null
diff --git a/datafusion/src/physical_plan/filter.rs b/datafusion/src/physical_plan/filter.rs
index 4125058..69ff6bf 100644
--- a/datafusion/src/physical_plan/filter.rs
+++ b/datafusion/src/physical_plan/filter.rs
@@ -242,13 +242,14 @@ mod tests {
 
     use super::*;
     use crate::datasource::object_store::local::LocalFileSystem;
+    use crate::physical_plan::collect;
     use crate::physical_plan::expressions::*;
     use crate::physical_plan::file_format::{CsvExec, FileScanConfig};
     use crate::physical_plan::ExecutionPlan;
     use crate::scalar::ScalarValue;
     use crate::test;
     use crate::test_util;
-    use crate::{logical_plan::Operator, physical_plan::collect};
+    use datafusion_expr::Operator;
     use std::iter::Iterator;
 
     #[tokio::test]
diff --git a/datafusion/src/physical_plan/planner.rs b/datafusion/src/physical_plan/planner.rs
index 24b8378..ce3351b 100644
--- a/datafusion/src/physical_plan/planner.rs
+++ b/datafusion/src/physical_plan/planner.rs
@@ -1434,17 +1434,18 @@ mod tests {
     use crate::execution::options::CsvReadOptions;
     use crate::execution::runtime_env::RuntimeEnv;
     use crate::logical_plan::plan::Extension;
-    use crate::logical_plan::{DFField, DFSchema, DFSchemaRef};
     use crate::physical_plan::{
         expressions, DisplayFormatType, Partitioning, Statistics,
     };
     use crate::scalar::ScalarValue;
     use crate::{
-        logical_plan::{col, lit, sum, LogicalPlanBuilder},
-        physical_plan::SendableRecordBatchStream,
+        logical_plan::LogicalPlanBuilder, physical_plan::SendableRecordBatchStream,
     };
     use arrow::datatypes::{DataType, Field, SchemaRef};
     use async_trait::async_trait;
+    use datafusion_common::{DFField, DFSchema, DFSchemaRef};
+    use datafusion_expr::sum;
+    use datafusion_expr::{col, lit};
     use fmt::Debug;
     use std::convert::TryFrom;
     use std::{any::Any, fmt};
diff --git a/datafusion/src/physical_plan/windows/aggregate.rs b/datafusion/src/physical_plan/windows/aggregate.rs
index f7c29ba..4c97e2b 100644
--- a/datafusion/src/physical_plan/windows/aggregate.rs
+++ b/datafusion/src/physical_plan/windows/aggregate.rs
@@ -18,7 +18,6 @@
 //! Physical exec for aggregate window function expressions.
 
 use crate::error::{DataFusionError, Result};
-use crate::logical_plan::window_frames::{WindowFrame, WindowFrameUnits};
 use crate::physical_plan::windows::find_ranges_in_range;
 use crate::physical_plan::{
     expressions::PhysicalSortExpr, Accumulator, AggregateExpr, PhysicalExpr, WindowExpr,
@@ -26,6 +25,7 @@ use crate::physical_plan::{
 use arrow::compute::concat;
 use arrow::record_batch::RecordBatch;
 use arrow::{array::ArrayRef, datatypes::Field};
+use datafusion_expr::{WindowFrame, WindowFrameUnits};
 use std::any::Any;
 use std::iter::IntoIterator;
 use std::ops::Range;
diff --git a/datafusion/src/physical_plan/windows/mod.rs b/datafusion/src/physical_plan/windows/mod.rs
index 243c571..b3bf9ce 100644
--- a/datafusion/src/physical_plan/windows/mod.rs
+++ b/datafusion/src/physical_plan/windows/mod.rs
@@ -18,7 +18,6 @@
 //! Physical expressions for window functions
 
 use crate::error::{DataFusionError, Result};
-use crate::logical_plan::window_frames::WindowFrame;
 use crate::physical_plan::{
     aggregates,
     expressions::{
@@ -34,6 +33,7 @@ use crate::physical_plan::{
 };
 use crate::scalar::ScalarValue;
 use arrow::datatypes::Schema;
+use datafusion_expr::WindowFrame;
 use std::convert::TryInto;
 use std::ops::Range;
 use std::sync::Arc;

[arrow-datafusion] 01/03: move expr functions to datafusion-expr expr_fn

Posted by ji...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jiayuliu pushed a commit to branch physical-plan
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit c6481c8ce645d414137562861a49b6780fb6f040
Author: Jiayu Liu <ji...@hey.com>
AuthorDate: Wed Feb 9 13:08:39 2022 +0800

    move expr functions to datafusion-expr expr_fn
---
 datafusion-expr/src/lib.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datafusion-expr/src/lib.rs b/datafusion-expr/src/lib.rs
index 709fa63..1d0837f 100644
--- a/datafusion-expr/src/lib.rs
+++ b/datafusion-expr/src/lib.rs
@@ -37,8 +37,8 @@ pub use columnar_value::{ColumnarValue, NullColumnarValue};
 pub use expr::Expr;
 pub use expr_fn::col;
 pub use function::{
-    AccumulatorFunctionImplementation, ReturnTypeFunction, ScalarFunctionImplementation,
-    StateTypeFunction,
+  AccumulatorFunctionImplementation, ReturnTypeFunction, ScalarFunctionImplementation,
+  StateTypeFunction,
 };
 pub use literal::{lit, lit_timestamp_nano, Literal, TimestampLiteral};
 pub use operator::Operator;