Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 15:31:06 UTC

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2675: WIP: Create new `datafusion-optimizer` crate for logical optimizer rules

andygrove opened a new pull request, #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675

   # Which issue does this PR close?
   
   Builds on https://github.com/apache/arrow-datafusion/pull/2667
   
   Closes https://github.com/apache/arrow-datafusion/issues/2599
   
   # Rationale for this change
   
   As a user of DataFusion for SQL query planning, I would like to be able to use the logical plan optimizer rules without depending on the full `datafusion` crate, which contains the execution engine.
   
   # What changes are included in this PR?
   
   - New `datafusion-optimizer` crate
   - Move some rules to the new crate
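
Moving items between crates normally changes their import paths. A common way to soften that is a `pub use` re-export from the old location. The following is a minimal sketch of that pattern only: the two modules stand in for crates, every name in it (`optimizer_crate`, `core_crate`, `ExampleRule`) is hypothetical, and nothing here confirms that this PR adds such re-exports.

```rust
// Hypothetical stand-in for the new crate that now owns the rule.
mod optimizer_crate {
    /// Hypothetical rule type; real optimizer rules have a richer interface.
    pub struct ExampleRule;

    impl ExampleRule {
        /// Returns a name identifying the rule.
        pub fn name(&self) -> &'static str {
            "example_rule"
        }
    }
}

// Hypothetical stand-in for the original crate the rule used to live in.
mod core_crate {
    pub mod optimizer {
        // Re-export from the old location so existing import paths keep compiling.
        pub use super::super::optimizer_crate::ExampleRule;
    }
}

fn main() {
    // Both paths name the same type, so callers of the old path are unaffected.
    let via_old_path = core_crate::optimizer::ExampleRule;
    let via_new_path = optimizer_crate::ExampleRule;
    assert_eq!(via_old_path.name(), via_new_path.name());
    println!("rule resolved through both paths: {}", via_old_path.name());
}
```

With real crates, the re-export would require the original crate to depend on the new one, which is the opposite direction from the dependency being removed for users of the new crate.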
   
   # Are there any user-facing changes?
   
   API change
   
   # Does this PR break compatibility with Ballista?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2675: Create new `datafusion-optimizer` crate for logical optimizer rules

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675#issuecomment-1144914883

   @alamb @tustvold @yjshen PTAL when you have time. This is the last piece of the crate refactoring that I had planned. There is just one optimizer rule that needs to be moved over to the new crate and I hope to get that moved in the next 1-2 weeks.


[GitHub] [arrow-datafusion] andygrove merged pull request #2675: Create new `datafusion-optimizer` crate for logical optimizer rules

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2675: Create new `datafusion-optimizer` crate for logical optimizer rules

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675#discussion_r888221676


##########
datafusion/optimizer/src/test/mod.rs:
##########
@@ -0,0 +1,56 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use datafusion_common::Result;
+use datafusion_expr::{logical_plan::table_scan, LogicalPlan, LogicalPlanBuilder};
+
+pub mod user_defined;
+
+/// some tests share a common table with different names
+pub fn test_table_scan_with_name(name: &str) -> Result<LogicalPlan> {

Review Comment:
   These functions don't appear to be used by any tests that haven't also been moved, so perhaps they could be removed from core as well?



##########
datafusion/optimizer/src/test/user_defined.rs:
##########
@@ -0,0 +1,80 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   This appears to be duplicating the file from core, instead of moving it. Is this intentional?



[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2675: Create new `datafusion-optimizer` crate for logical optimizer rules

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675#discussion_r888257065


##########
datafusion/optimizer/src/test/mod.rs:
##########
@@ -0,0 +1,56 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use datafusion_common::Result;
+use datafusion_expr::{logical_plan::table_scan, LogicalPlan, LogicalPlanBuilder};
+
+pub mod user_defined;
+
+/// some tests share a common table with different names
+pub fn test_table_scan_with_name(name: &str) -> Result<LogicalPlan> {

Review Comment:
   I have removed this from core now



[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2675: Create new `datafusion-optimizer` crate for logical optimizer rules

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2675:
URL: https://github.com/apache/arrow-datafusion/pull/2675#discussion_r888256806


##########
datafusion/optimizer/src/test/user_defined.rs:
##########
@@ -0,0 +1,80 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   @tustvold Thanks. I hadn't realized that this was only used by the optimizer tests. I have now moved it instead of copying it.


