You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jackwener (via GitHub)" <gi...@apache.org> on 2023/03/22 09:35:20 UTC

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #5683: refactor: move InlineTableScan into Analyzer.

jackwener opened a new pull request, #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683

   # Which issue does this PR close?
   
   Closes #5682
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   
    move InlineTableScan into Analyzer.
   
   # Are these changes tested?
   
   Add Unit test for Analyzer
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5683: analyzer: move InlineTableScan into Analyzer.

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683#discussion_r1146315356


##########
datafusion/optimizer/src/analyzer/inline_table_scan.rs:
##########
@@ -15,73 +15,113 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Optimizer rule to replace TableScan references
-//! such as DataFrames and Views and inlines the LogicalPlan
-//! to support further optimization
-use crate::optimizer::ApplyOrder;
-use crate::{OptimizerConfig, OptimizerRule};
+//! Analyzed rule to replace TableScan references
+//! such as DataFrames and Views and inlines the LogicalPlan.
+use std::sync::Arc;
+
+use datafusion_common::config::ConfigOptions;
 use datafusion_common::Result;
-use datafusion_expr::{logical_plan::LogicalPlan, Expr, LogicalPlanBuilder, TableScan};
+use datafusion_expr::expr_rewriter::rewrite_expr;
+use datafusion_expr::{
+    logical_plan::LogicalPlan, Expr, Filter, LogicalPlanBuilder, TableScan,
+};
+
+use crate::analyzer::AnalyzerRule;
+use crate::rewrite::TreeNodeRewritable;
 
-/// Optimization rule that inlines TableScan that provide a [LogicalPlan]
+/// Analyzed rule that inlines TableScan that provide a [LogicalPlan]
 /// (DataFrame / ViewTable)
 #[derive(Default)]
 pub struct InlineTableScan;
 
 impl InlineTableScan {
-    #[allow(missing_docs)]
     pub fn new() -> Self {
         Self {}
     }
 }
 
-impl OptimizerRule for InlineTableScan {
-    fn try_optimize(
-        &self,
-        plan: &LogicalPlan,
-        _config: &dyn OptimizerConfig,
-    ) -> Result<Option<LogicalPlan>> {
-        match plan {
-            // Match only on scans without filter / projection / fetch
-            // Views and DataFrames won't have those added
-            // during the early stage of planning
-            LogicalPlan::TableScan(TableScan {
-                source,
-                table_name,
-                filters,
-                projection,
-                ..
-            }) if filters.is_empty() => {
-                if let Some(sub_plan) = source.get_logical_plan() {
-                    let projection_exprs =
-                        generate_projection_expr(projection, sub_plan)?;
-                    let plan = LogicalPlanBuilder::from(sub_plan.clone())
-                        .project(projection_exprs)?
-                        // Since this This is creating a subquery like:
-                        //```sql
-                        // ...
-                        // FROM <view definition> as "table_name"
-                        // ```
-                        //
-                        // it doesn't make sense to have a qualified
-                        // reference (e.g. "foo"."bar") -- this convert to
-                        // string
-                        .alias(table_name.to_string())?;
-                    Ok(Some(plan.build()?))
-                } else {
-                    Ok(None)
-                }
-            }
-            _ => Ok(None),
-        }
+impl AnalyzerRule for InlineTableScan {
+    fn analyze(&self, plan: &LogicalPlan, _: &ConfigOptions) -> Result<LogicalPlan> {
+        plan.clone().transform_up(&analyze_internal)

Review Comment:
   The clone is unfortunate, but a legacy of the existing API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5683: anlayzer: move InlineTableScan into Analyzer.

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683#issuecomment-1480536552

   > However, the other rule, https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/analyzer/count_wildcard_rule.rs does not fot this criteria either 🤔
   
   Spark has a similar rule `ExtractWindowExpressions` like it.
   
   > Taking a step back, I thought the purpose of the Analyzer was to do semantic analysis (things like doing type checks) or maybe coercing the types so the the plan became valid.
   
   I think any action to make plan valid should be put into analyzer like `resolvePlan`, `typeCoercion`, .....
   I think `inlineTableScan` also belong this category.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5683: analyzer: move InlineTableScan into Analyzer.

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683#issuecomment-1481327302

   > I think any action to make plan valid should be put into analyzer like resolvePlan, typeCoercion, .....
   > I think inlineTableScan/count_wildcard_rule also belong this category.
   
   That makes sense to me. I tried to capture this in doc comments for future readers here:  https://github.com/apache/arrow-datafusion/pull/5705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener merged pull request #5683: analyzer: move InlineTableScan into Analyzer.

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener merged PR #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5683: anlayzer: move InlineTableScan into Analyzer.

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5683:
URL: https://github.com/apache/arrow-datafusion/pull/5683#issuecomment-1479213691

   cc @alamb @mingmwang @liukun4515 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org