You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/19 09:15:49 UTC

[GitHub] [spark] TJX2014 opened a new pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

TJX2014 opened a new pull request #29156:
URL: https://github.com/apache/spark/pull/29156


   ### What changes were proposed in this pull request?
   Add in CTE hint resolve in `org.apache.spark.sql.catalyst.analysis.ResolveHints.ResolveJoinStrategyHints#apply`
   Add a UT in `org.apache.spark.sql.test.SQLTestUtils#test`
   
   ### Why are the changes needed?
   Branch 2.4, when resolve CTE in `org.apache.spark.sql.catalyst.analysis.Analyzer.CTESubstitution`, we have a chance `executeSameContext` to apply all rules to CTE include hint resolve.
   Branch 3.0, because `CTESubstitution` is moved to a separated class, we miss the feature as follow:
   `
   scala> sql("create temporary view t as select 1 as id")
   res0: org.apache.spark.sql.DataFrame = []
   
   scala> sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")
   org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [cte.id]; line 1 pos 59;
   'Project ['id]
   +- SubqueryAlias cte
      +- Project [id#0]
         +- SubqueryAlias t
            +- Project [1 AS id#0]
               +- OneRowRelation
   `
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662255460


   > cannot resolve 'id' given input columns: [cte.id];
   
   Do you know how this happens? The `id` column is there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615709


   cc @viirya @maropu @LantaoJin 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660614461


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r456968128



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2558,6 +2558,14 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
+    withTempView("t") {
+      sql("create temporary view t as select 1 as id")
+      sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")

Review comment:
       @maropu Sure, we can see in branch 2.4, the hint of cte could be resolved in `CTESubstitution`, we have a chance `executeSameContext` to apply all rules to CTE include hint resolve, but in branch 3.0, the `CTESubstitution` is reparated, and the `executeSameContext` is belong to `Analyzer`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662487204


   > > cannot resolve 'id' given input columns: [cte.id];
   > 
   > Do you know how this happens? The `id` column is there.
   
   Because the hint resolve ignore hint in CTE, and the `id` in cte is remained unresolved. While apply `CTESubstitution`, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
   but in Branch 3.0, because CTESubstitution is moved to a separated class, we miss the feature 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660614618


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615298


   [FOLLOW](https://github.com/apache/spark/pull/29062)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r456966108



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2558,6 +2558,14 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
+    withTempView("t") {
+      sql("create temporary view t as select 1 as id")
+      sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")

Review comment:
       Just in case, could you check that the hist is correctly applied?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660614461


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 closed pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 closed pull request #29156:
URL: https://github.com/apache/spark/pull/29156


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 removed a comment on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 removed a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660725499


   > Hi, @TJX2014 . Please run `dev/scalastyle` and fix the Scala style violation please. As you see on GitHub Action result, everything is failing.
   
   @dongjoon-hyun  Thanks for your suggestion, I have corrected it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662487204


   > > cannot resolve 'id' given input columns: [cte.id];
   > 
   > Do you know how this happens? The `id` column is there.
   
   Because the hint resolve ignore hint in CTE, and the `id` in cte is remained unresolved. While apply `CTESubstitution`, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
   and in Branch 3.0, because CTESubstitution is moved to a separated class, we miss the feature 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660693782


   Hi, @TJX2014 . Please run `dev/scalastyle` and fix the Scala style violation please. As you see on GitHub Action result, everything is failing. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662487204


   > > cannot resolve 'id' given input columns: [cte.id];
   > 
   > Do you know how this happens? The `id` column is there.
   
   Because the hint resolve ignore hint in CTE, and the `id` in cte is remained unresolved. While apply `CTESubstitution`, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
   but in Branch 3.0, because `CTESubstitution` is moved to a separated class, sowe miss the feature.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660725499


   > Hi, @TJX2014 . Please run `dev/scalastyle` and fix the Scala style violation please. As you see on GitHub Action result, everything is failing.
   
   @dongjoon-hyun  Thanks for your suggestion, I have corrected it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615298


   [FOLLOW https://github.com/apache/spark/pull/29062](https://github.com/apache/spark/pull/29062)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r458941369



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
##########
@@ -165,6 +165,24 @@ object ResolveHints {
           hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
           applied
         }
+      case With(child, relations) => resolveCTEHint(child,
+        relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
+        case (resolved, (name, relation)) =>
+          resolved :+ name -> apply(resolveCTEHint(relation, resolved))
+      })
+    }
+
+    def resolveCTEHint(plan: LogicalPlan, cteRelations: Seq[(String, LogicalPlan)]): LogicalPlan = {
+      plan resolveOperatorsDown {
+        case u: UnresolvedRelation =>
+          cteRelations.find(x => resolver(x._1, u.tableName)).map(_._2).getOrElse(u)

Review comment:
       Have you looked at https://github.com/apache/spark/pull/29062 ? Seems easier to just run CTE substitution in the very beginning.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r459126932



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
##########
@@ -165,6 +165,24 @@ object ResolveHints {
           hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
           applied
         }
+      case With(child, relations) => resolveCTEHint(child,
+        relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
+        case (resolved, (name, relation)) =>
+          resolved :+ name -> apply(resolveCTEHint(relation, resolved))
+      })
+    }
+
+    def resolveCTEHint(plan: LogicalPlan, cteRelations: Seq[(String, LogicalPlan)]): LogicalPlan = {
+      plan resolveOperatorsDown {
+        case u: UnresolvedRelation =>
+          cteRelations.find(x => resolver(x._1, u.tableName)).map(_._2).getOrElse(u)

Review comment:
       Yes,  `put the CTE substitution in the very beginning` is an easier way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615298


   [Same issue but different patch way:https://github.com/apache/spark/pull/29062](https://github.com/apache/spark/pull/29062)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r456968128



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2558,6 +2558,14 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
+    withTempView("t") {
+      sql("create temporary view t as select 1 as id")
+      sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")

Review comment:
       @maropu Sure, we can see in branch 2.4, the hint of cte could be resolved in `CTESubstitution`, we have a chance `executeSameContext` to apply all rules to CTE include hint resolve, but in branch 3.0, the `CTESubstitution` is reparated, and the `executeSameContext` is belong to `Analyzer`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615298


   Same issue but different patch way: [ https://github.com/apache/spark/pull/29062](https://github.com/apache/spark/pull/29062)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662487204


   > > cannot resolve 'id' given input columns: [cte.id];
   > 
   > Do you know how this happens? The `id` column is there.
   
   Because the hint resolve ignore hint in CTE, and the `id` in cte is remained unresolved. While apply `CTESubstitution`, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
   but in Branch 3.0, because `CTESubstitution` is moved to a separated class, we miss the feature 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-660615298


   Same issue but different patch way:[https://github.com/apache/spark/pull/29062](https://github.com/apache/spark/pull/29062)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LantaoJin commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
LantaoJin commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r457002916



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
##########
@@ -165,6 +165,24 @@ object ResolveHints {
           hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
           applied
         }
+      case With(child, relations) => resolveCTEHint(child,

Review comment:
       This patch looks like it is specifically designed to fix `With` comparing to #29062. I am open to more comments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r458842763



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
##########
@@ -165,6 +165,24 @@ object ResolveHints {
           hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
           applied
         }
+      case With(child, relations) => resolveCTEHint(child,
+        relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
+        case (resolved, (name, relation)) =>
+          resolved :+ name -> apply(resolveCTEHint(relation, resolved))
+      })
+    }
+
+    def resolveCTEHint(plan: LogicalPlan, cteRelations: Seq[(String, LogicalPlan)]): LogicalPlan = {
+      plan resolveOperatorsDown {
+        case u: UnresolvedRelation =>
+          cteRelations.find(x => resolver(x._1, u.tableName)).map(_._2).getOrElse(u)

Review comment:
       This branch will occur `stackoverflow` when cte table name is same as table in cte as follows:
   sql("create temporary view t as select 1 as id")      
   sql("with **t** as (select /*+ BROADCAST(id) */ id from **t**) select id from t")
   @cloud-fan  Could you please help me find a way to pass this ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 commented on a change in pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 commented on a change in pull request #29156:
URL: https://github.com/apache/spark/pull/29156#discussion_r456969527



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2558,6 +2558,14 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
+    withTempView("t") {
+      sql("create temporary view t as select 1 as id")
+      sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")

Review comment:
       Thank you, I need to check the hist, seems something wrong with the patch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] TJX2014 edited a comment on pull request #29156: [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule

Posted by GitBox <gi...@apache.org>.
TJX2014 edited a comment on pull request #29156:
URL: https://github.com/apache/spark/pull/29156#issuecomment-662487204


   > > cannot resolve 'id' given input columns: [cte.id];
   > 
   > Do you know how this happens? The `id` column is there.
   
   Because the hint resolve ignore hint in CTE, and the `id` in cte is remained unresolved. While apply `CTESubstitution`, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
   But in Branch 3.0, because `CTESubstitution` is moved to a separated class, so we miss the feature.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org