You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/23 14:45:10 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5705: minor: Add doc comments to clarify what Analyzer is for

alamb opened a new pull request, #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705

   # Which issue does this PR close?
   
   Related to the discussion on https://github.com/apache/arrow-datafusion/pull/5683 with @jackwener  and @mingmwang  
   
   # Rationale for this change
   
   I was not clear exactly what should be an `OptimizerRule` and what should be an `AnalyzerRule`
   
   # What changes are included in this PR?
   
   Add doc strings explaining the difference per @jackwener 's comment https://github.com/apache/arrow-datafusion/pull/5683#issuecomment-1480536552
   
   # Are these changes tested?
   
   N/A
   
   # Are there any user-facing changes?
   Better doc strings


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705#issuecomment-1482177907

   Thanks @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener merged pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener merged PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "mingmwang (via GitHub)" <gi...@apache.org>.
mingmwang commented on code in PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705#discussion_r1147073711


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -27,8 +27,16 @@ use log::{debug, trace};
 use std::sync::Arc;
 use std::time::Instant;
 
-/// `AnalyzerRule` transforms the unresolved ['LogicalPlan']s and unresolved ['Expr']s into
-/// the resolved form.
+/// [`AnalyzerRule`]s transform [`LogicalPlan`]s in some way to make
+/// the plan valid prior to the rest of the DataFusion optimization process.
+///
+/// For example, it may resolve [`Expr]s into more specific forms such
+/// as a subquery reference, to do type coercion to ensure the types
+/// of operands are correct.
+///
+/// This is different than an [`OptimizerRule`](crate::OptimizerRule)
+/// which should preserve the semantics of the LogicalPlan but compute
+/// it the same result in some more optimal way.
 pub trait AnalyzerRule {
     /// Rewrite `plan`
     fn analyze(&self, plan: &LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan>;

Review Comment:
   My bad, I just copy it from Logical optimizer rule. I think for Analyzer, we should change it to `plan: LogicalPlan`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705#discussion_r1148012803


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -27,8 +27,16 @@ use log::{debug, trace};
 use std::sync::Arc;
 use std::time::Instant;
 
-/// `AnalyzerRule` transforms the unresolved ['LogicalPlan']s and unresolved ['Expr']s into
-/// the resolved form.
+/// [`AnalyzerRule`]s transform [`LogicalPlan`]s in some way to make
+/// the plan valid prior to the rest of the DataFusion optimization process.
+///
+/// For example, it may resolve [`Expr]s into more specific forms such
+/// as a subquery reference, to do type coercion to ensure the types
+/// of operands are correct.
+///
+/// This is different than an [`OptimizerRule`](crate::OptimizerRule)
+/// which should preserve the semantics of the LogicalPlan but compute
+/// it the same result in some more optimal way.
 pub trait AnalyzerRule {
     /// Rewrite `plan`
     fn analyze(&self, plan: &LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan>;

Review Comment:
   Proposed PR: https://github.com/apache/arrow-datafusion/pull/5728



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705#discussion_r1146338524


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -27,8 +27,16 @@ use log::{debug, trace};
 use std::sync::Arc;
 use std::time::Instant;
 
-/// `AnalyzerRule` transforms the unresolved ['LogicalPlan']s and unresolved ['Expr']s into
-/// the resolved form.
+/// [`AnalyzerRule`]s transform [`LogicalPlan`]s in some way to make
+/// the plan valid prior to the rest of the DataFusion optimization process.
+///
+/// For example, it may resolve [`Expr]s into more specific forms such
+/// as a subquery reference, to do type coercion to ensure the types
+/// of operands are correct.
+///
+/// This is different than an [`OptimizerRule`](crate::OptimizerRule)
+/// which should preserve the semantics of the LogicalPlan but compute
+/// it the same result in some more optimal way.
 pub trait AnalyzerRule {
     /// Rewrite `plan`
     fn analyze(&self, plan: &LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan>;

Review Comment:
   I'm also confused about it๐Ÿ˜‚.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5705: minor: Add doc comments to clarify what Analyzer is for

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5705:
URL: https://github.com/apache/arrow-datafusion/pull/5705#discussion_r1146314877


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -27,8 +27,16 @@ use log::{debug, trace};
 use std::sync::Arc;
 use std::time::Instant;
 
-/// `AnalyzerRule` transforms the unresolved ['LogicalPlan']s and unresolved ['Expr']s into
-/// the resolved form.
+/// [`AnalyzerRule`]s transform [`LogicalPlan`]s in some way to make
+/// the plan valid prior to the rest of the DataFusion optimization process.
+///
+/// For example, it may resolve [`Expr]s into more specific forms such
+/// as a subquery reference, to do type coercion to ensure the types
+/// of operands are correct.
+///
+/// This is different than an [`OptimizerRule`](crate::OptimizerRule)
+/// which should preserve the semantics of the LogicalPlan but compute
+/// it the same result in some more optimal way.
 pub trait AnalyzerRule {
     /// Rewrite `plan`
     fn analyze(&self, plan: &LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan>;

Review Comment:
   While reading this I wonder why it takes `plan: &LogicalPlan` rather than `plan: LogicalPlan` -- given that `LogicalPlan` is returned this interface requires `clone`ing the LogicalPlan on each step



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org