Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/03/07 19:44:49 UTC

[GitHub] [arrow-datafusion] Jefffrey opened a new pull request, #5509: Enforce ambiguity check whilst normalizing columns

Jefffrey opened a new pull request, #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #5211, #5248, #5251
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Rather than having separate steps for performing ambiguity checks on unqualified columns and then normalizing them, this change enforces the ambiguity check during normalization itself (when turning an unqualified column into a qualified one). This way we do not have to keep adding ambiguity checks for each code path; the linked issues list the paths that were missing these checks.
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Introduces new normalization methods for unqualified columns that enforce ambiguity checks, and migrates existing callers of the old normalization methods to the new ones.
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Added extra ambiguity tests.
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   Deprecated some methods (though I'm unsure whether they were intended to be user-facing in the first place, rather than being intended for use by other DataFusion crates).
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1134389458


##########
datafusion/expr/src/utils.rs:
##########
@@ -96,41 +96,7 @@ pub fn expr_to_columns(expr: &Expr, accum: &mut HashSet<Column>) -> Result<()> {
             Expr::ScalarVariable(_, var_names) => {
                 accum.insert(Column::from_name(var_names.join(".")));
             }
-            Expr::Alias(_, _)
-            | Expr::Literal(_)
-            | Expr::BinaryExpr { .. }
-            | Expr::Like { .. }
-            | Expr::ILike { .. }
-            | Expr::SimilarTo { .. }
-            | Expr::Not(_)
-            | Expr::IsNotNull(_)
-            | Expr::IsNull(_)
-            | Expr::IsTrue(_)
-            | Expr::IsFalse(_)
-            | Expr::IsUnknown(_)
-            | Expr::IsNotTrue(_)
-            | Expr::IsNotFalse(_)
-            | Expr::IsNotUnknown(_)
-            | Expr::Negative(_)
-            | Expr::Between { .. }
-            | Expr::Case { .. }
-            | Expr::Cast { .. }
-            | Expr::TryCast { .. }
-            | Expr::Sort { .. }
-            | Expr::ScalarFunction { .. }
-            | Expr::ScalarUDF { .. }
-            | Expr::WindowFunction { .. }
-            | Expr::AggregateFunction { .. }
-            | Expr::GroupingSet(_)
-            | Expr::AggregateUDF { .. }
-            | Expr::InList { .. }
-            | Expr::Exists { .. }
-            | Expr::InSubquery { .. }
-            | Expr::ScalarSubquery(_)
-            | Expr::Wildcard
-            | Expr::QualifiedWildcard { .. }
-            | Expr::GetIndexedField { .. }
-            | Expr::Placeholder { .. } => {}
+            _ => {}
         }

Review Comment:
   I will prepare a PR





[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1128477085


##########
datafusion/common/src/column.rs:
##########
@@ -154,6 +158,105 @@ impl Column {
                 .collect(),
         }))
     }
+
+    /// Qualify column if not done yet.
+    ///
+    /// If this column already has a [relation](Self::relation), it will be returned as is and the given parameters are
+    /// ignored. Otherwise this will search through the given schemas to find the column.
+    ///
+    /// Will check for ambiguity at each level of `schemas`.
+    ///
+    /// A schema matches if there is a single column that -- when unqualified -- matches this column. There is an
+    /// exception for `USING` statements, see below.
+    ///
+    /// # Using columns
+    /// Take the following SQL statement:
+    ///
+    /// ```sql
+    /// SELECT id FROM t1 JOIN t2 USING(id)
+    /// ```
+    ///
+    /// In this case, both `t1.id` and `t2.id` will match unqualified column `id`. To express this possibility, use
+    /// `using_columns`. Each entry in this array is a set of columns that are bound together via a `USING` clause. So
+    /// in this example this would be `[{t1.id, t2.id}]`.
+    ///
+    /// Regarding ambiguity check, `schemas` is structured to allow levels of schemas to be passed in.
+    /// For example:
+    ///
+    /// ```text
+    /// schemas = &[
+    ///    &[schema1, schema2], // first level
+    ///    &[schema3, schema4], // second level
+    /// ]
+    /// ```
+    ///
+    /// Will search for a matching field in all schemas in the first level. If a matching field according to above
+    /// mentioned conditions is not found, then will check the next level. If found more than one matching column across
+    /// all schemas in a level, that isn't a USING column, will return an error due to ambiguous column.
+    ///
+    /// If checked all levels and couldn't find field, will return field not found error.
+    pub fn normalize_with_schemas_and_ambiguity_check(
+        self,
+        schemas: &[&[&DFSchema]],

Review Comment:
   Kinda ugly syntax here. It makes the logic work, but maybe we could have something better?



##########
datafusion/common/src/column.rs:
##########
@@ -102,6 +102,10 @@ impl Column {
     /// In this case, both `t1.id` and `t2.id` will match unqualified column `id`. To express this possibility, use
     /// `using_columns`. Each entry in this array is a set of columns that are bound together via a `USING` clause. So
     /// in this example this would be `[{t1.id, t2.id}]`.
+    #[deprecated(
+        since = "20.0.0",
+        note = "use normalize_with_schemas_and_ambiguity_check instead"
+    )]

Review Comment:
   I've marked the old methods as deprecated, as I'm unsure whether they should be removed (technically a breaking API change?).



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -505,10 +505,14 @@ pub async fn from_substrait_rex(
                         "Direct reference StructField with child is not supported"
                             .to_string(),
                     )),
-                    None => Ok(Arc::new(Expr::Column(Column {
-                        relation: None,
-                        name: input_schema.field(x.field as usize).name().to_string(),
-                    }))),
+                    None => {
+                        let column =
+                            input_schema.field(x.field as usize).qualified_column();
+                        Ok(Arc::new(Expr::Column(Column {
+                            relation: column.relation,
+                            name: column.name,
+                        })))
+                    }

Review Comment:
   This is a separate Substrait fix: previously the built column wasn't being qualified even when the source field itself was qualified.



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -174,7 +174,26 @@ impl LogicalPlan {
         }
     }
 
+    /// Used for normalizing columns, as the fallback schemas to the main schema
+    /// of the plan.
+    pub fn fallback_normalize_schemas(&self) -> Vec<&DFSchema> {
+        match self {
+            LogicalPlan::Window(_)
+            | LogicalPlan::Projection(_)
+            | LogicalPlan::Aggregate(_)
+            | LogicalPlan::Unnest(_)
+            | LogicalPlan::Join(_)
+            | LogicalPlan::CrossJoin(_) => self
+                .inputs()
+                .iter()
+                .map(|input| input.schema().as_ref())
+                .collect(),
+            _ => vec![],
+        }
+    }

Review Comment:
   Since the previous column normalization depended on `all_schemas`, which I still don't fully understand, this method extracts the part of `all_schemas` that returned self plus the child schemas, but returns only the child schemas.
   
   It is meant to be used as a fallback for the new normalization method.





[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "ygf11 (via GitHub)" <gi...@apache.org>.
ygf11 commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1133074480


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -947,16 +939,16 @@ impl LogicalPlanBuilder {
                 let right_key = r.into();
 
                 let left_using_columns = left_key.to_columns()?;
-                let normalized_left_key = normalize_col_with_schemas(
+                let normalized_left_key = normalize_col_with_schemas_and_ambiguity_check(
                     left_key,
-                    &[self.plan.schema(), right.schema()],
+                    &[&[self.plan.schema(), right.schema()]],
                     &[left_using_columns],
                 )?;
 
                 let right_using_columns = right_key.to_columns()?;
-                let normalized_right_key = normalize_col_with_schemas(
+                let normalized_right_key = normalize_col_with_schemas_and_ambiguity_check(
                     right_key,
-                    &[self.plan.schema(), right.schema()],
+                    &[&[self.plan.schema(), right.schema()]],

Review Comment:
   This improvement makes sense to me.
   
   We can replace `join_detailed` with `join_with_expr_keys`, and remove `join_detailed` after this PR.
   https://github.com/apache/arrow-datafusion/blob/9587339b0fb060f8d153bbb0f8de6a740195ccea/datafusion/expr/src/logical_plan/builder.rs#L568
   https://github.com/apache/arrow-datafusion/blob/9587339b0fb060f8d153bbb0f8de6a740195ccea/datafusion/expr/src/logical_plan/builder.rs#L928
      





[GitHub] [arrow-datafusion] ursabot commented on pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#issuecomment-1465176762

   Benchmark runs are scheduled for baseline = 5423ba0785035911680fdc27f8033fb7b31a577b and contender = aeb593f719508973fa4564a32bc2adc767c3713e. aeb593f719508973fa4564a32bc2adc767c3713e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/46873b668f7b4b629993a79a07b41f21...0b5b03076fca4eed96329912e21b3101/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/857d7f8214344968a204c88fb2d7c32f...1d83c01c152046ea98ecee4d6033c0b1/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/b9d8dc1a24784ed988bd495089525edc...7e3540ebbc8d415fb7acaa4147880811/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/bd795f0e30934279a7b39ee62d774eb5...1ad622b160e44f2ebbbf539d7277c63c/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-datafusion] alamb merged pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509




[GitHub] [arrow-datafusion] alamb commented on pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#issuecomment-1460973374

   I plan to review this carefully tomorrow




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1131527905


##########
datafusion/expr/src/expr_rewriter.rs:
##########
@@ -365,6 +377,23 @@ pub fn normalize_col_with_schemas(
     })
 }
 
+pub fn normalize_col_with_schemas_and_ambiguity_check(

Review Comment:
   Maybe we can leave a reference to `Column::normalize_with_schemas_and_ambiguity_check` here:
   
   ```suggestion
   /// See [`Column::normalize_with_schemas_and_ambiguity_check`] for usage
   pub fn normalize_col_with_schemas_and_ambiguity_check(
   ```



##########
datafusion/common/src/column.rs:
##########
@@ -154,6 +158,105 @@ impl Column {
                 .collect(),
         }))
     }
+
+    /// Qualify column if not done yet.
+    ///
+    /// If this column already has a [relation](Self::relation), it will be returned as is and the given parameters are
+    /// ignored. Otherwise this will search through the given schemas to find the column.
+    ///
+    /// Will check for ambiguity at each level of `schemas`.
+    ///
+    /// A schema matches if there is a single column that -- when unqualified -- matches this column. There is an
+    /// exception for `USING` statements, see below.
+    ///
+    /// # Using columns
+    /// Take the following SQL statement:
+    ///
+    /// ```sql
+    /// SELECT id FROM t1 JOIN t2 USING(id)
+    /// ```
+    ///
+    /// In this case, both `t1.id` and `t2.id` will match unqualified column `id`. To express this possibility, use
+    /// `using_columns`. Each entry in this array is a set of columns that are bound together via a `USING` clause. So
+    /// in this example this would be `[{t1.id, t2.id}]`.
+    ///
+    /// Regarding ambiguity check, `schemas` is structured to allow levels of schemas to be passed in.
+    /// For example:
+    ///
+    /// ```text
+    /// schemas = &[
+    ///    &[schema1, schema2], // first level
+    ///    &[schema3, schema4], // second level
+    /// ]
+    /// ```
+    ///
+    /// Will search for a matching field in all schemas in the first level. If a matching field according to above
+    /// mentioned conditions is not found, then will check the next level. If found more than one matching column across
+    /// all schemas in a level, that isn't a USING column, will return an error due to ambiguous column.
+    ///
+    /// If checked all levels and couldn't find field, will return field not found error.
+    pub fn normalize_with_schemas_and_ambiguity_check(
+        self,
+        schemas: &[&[&DFSchema]],

Review Comment:
   This syntax is nasty but it makes the callsites better, in my opinion





[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "mingmwang (via GitHub)" <gi...@apache.org>.
mingmwang commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1134838590


##########
datafusion/sql/src/select.rs:
##########
@@ -241,33 +236,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         plan: LogicalPlan,
         outer_query_schema: Option<&DFSchema>,
         planner_context: &mut PlannerContext,
-        from_schema: &DFSchema,
     ) -> Result<LogicalPlan> {
         match selection {
             Some(predicate_expr) => {
                 let mut join_schema = (**plan.schema()).clone();
-                let mut all_schemas: Vec<DFSchemaRef> = vec![];
-                for schema in plan.all_schemas() {
-                    all_schemas.push(schema.clone());
-                }
-                if let Some(outer) = outer_query_schema {
-                    all_schemas.push(Arc::new(outer.clone()));
+
+                let fallback_schemas = plan.fallback_normalize_schemas();
+                let outer_query_schema = if let Some(outer) = outer_query_schema {
                     join_schema.merge(outer);
-                }
-                let x: Vec<&DFSchemaRef> = all_schemas.iter().collect();
+                    vec![outer]
+                } else {
+                    vec![]
+                };
 

Review Comment:
   The join schema logic is problematic; I will fix it in another PR.
   The `filter_expr` might contain columns referencing the `outer_query_schema`; at the end, the `Filter` will be created from the `filter_expr`, and the validation in `Filter::try_new` will fail:
   
   ```rust
   Ok(LogicalPlan::Filter(Filter::try_new(
                       filter_expr,
                       Arc::new(plan),
    )?)) 
   ```
   
   Actually, in the subquery plan, the outer reference columns (correlated columns) can appear in any place that accepts expressions (Filter exprs, Projection exprs, Window exprs, Order by, Aggregate exprs, Having, Join exprs, etc.).
   
   





[GitHub] [arrow-datafusion] alamb commented on pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#issuecomment-1465176066

   Thanks again @Jefffrey 




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1131524917


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -174,7 +174,26 @@ impl LogicalPlan {
         }
     }
 
+    /// Used for normalizing columns, as the fallback schemas to the main schema
+    /// of the plan.
+    pub fn fallback_normalize_schemas(&self) -> Vec<&DFSchema> {
+        match self {
+            LogicalPlan::Window(_)
+            | LogicalPlan::Projection(_)
+            | LogicalPlan::Aggregate(_)
+            | LogicalPlan::Unnest(_)
+            | LogicalPlan::Join(_)
+            | LogicalPlan::CrossJoin(_) => self
+                .inputs()
+                .iter()
+                .map(|input| input.schema().as_ref())
+                .collect(),
+            _ => vec![],
+        }
+    }

Review Comment:
   Maybe @houqp can provide some historical context about why `all_schemas` is needed (maybe it was for UNIONs or JOINs?)
   
   It may also be for a limitation that no longer exists.
   
   cc @waynexia  who I think may also have hit this recently





[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "mingmwang (via GitHub)" <gi...@apache.org>.
mingmwang commented on code in PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1133495404


##########
datafusion/expr/src/utils.rs:
##########
@@ -96,41 +96,7 @@ pub fn expr_to_columns(expr: &Expr, accum: &mut HashSet<Column>) -> Result<()> {
             Expr::ScalarVariable(_, var_names) => {
                 accum.insert(Column::from_name(var_names.join(".")));
             }
-            Expr::Alias(_, _)
-            | Expr::Literal(_)
-            | Expr::BinaryExpr { .. }
-            | Expr::Like { .. }
-            | Expr::ILike { .. }
-            | Expr::SimilarTo { .. }
-            | Expr::Not(_)
-            | Expr::IsNotNull(_)
-            | Expr::IsNull(_)
-            | Expr::IsTrue(_)
-            | Expr::IsFalse(_)
-            | Expr::IsUnknown(_)
-            | Expr::IsNotTrue(_)
-            | Expr::IsNotFalse(_)
-            | Expr::IsNotUnknown(_)
-            | Expr::Negative(_)
-            | Expr::Between { .. }
-            | Expr::Case { .. }
-            | Expr::Cast { .. }
-            | Expr::TryCast { .. }
-            | Expr::Sort { .. }
-            | Expr::ScalarFunction { .. }
-            | Expr::ScalarUDF { .. }
-            | Expr::WindowFunction { .. }
-            | Expr::AggregateFunction { .. }
-            | Expr::GroupingSet(_)
-            | Expr::AggregateUDF { .. }
-            | Expr::InList { .. }
-            | Expr::Exists { .. }
-            | Expr::InSubquery { .. }
-            | Expr::ScalarSubquery(_)
-            | Expr::Wildcard
-            | Expr::QualifiedWildcard { .. }
-            | Expr::GetIndexedField { .. }
-            | Expr::Placeholder { .. } => {}
+            _ => {}
         }

Review Comment:
   I would suggest keeping the explicit pattern matching instead of a catch-all arm. In the future someone might add new `Expr` variants, and explicit pattern matching is safer because it forces the committer to check all the places.





[GitHub] [arrow-datafusion] alamb commented on pull request #5509: Enforce ambiguity check whilst normalizing columns

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5509:
URL: https://github.com/apache/arrow-datafusion/pull/5509#issuecomment-1464897079

   Thank you @ygf11  for the review
   
   I plan to merge this PR tomorrow unless anyone else would like more time to review. 



