Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/26 18:47:52 UTC

[GitHub] [spark] karenfeng opened a new pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

karenfeng opened a new pull request #31666:
URL: https://github.com/apache/spark/pull/31666


   ### What changes were proposed in this pull request?
   
   Adds the duplicated common columns as hidden columns to the Projection used to rewrite NATURAL/USING JOINs.
   Built off https://github.com/apache/spark/pull/31654.
   
   ### Why are the changes needed?
   
   Allows users to resolve the common key columns from either side of a NATURAL/USING JOIN.
   Previously, users could resolve only the following key columns:
   
   | Join type | Left key columns resolvable | Right key columns resolvable |
   | --- | --- | --- |
   | Inner | Yes | No |
   | Left outer | Yes | No |
   | Right outer | No | Yes |
   | Full outer | No | No |
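
   As a rough illustration of the table above, the following Scala sketch models which side's copy of a common key `k` a qualified reference (such as `t1.k` or `t2.k`) can resolve to, before and after this change. It is a hypothetical model: `JoinKind`, `KeyVisibility`, and `keyVisibility` are invented for illustration and are not Spark APIs. The "after" behavior assumes that every key copy missing from the regular output is kept as a hidden column, as this PR describes.

   ```scala
   // Hypothetical model of common-key visibility in a USING/NATURAL join.
   // None of these names are Spark APIs.
   sealed trait JoinKind
   case object InnerJoin extends JoinKind
   case object LeftOuterJoin extends JoinKind
   case object RightOuterJoin extends JoinKind
   case object FullOuterJoin extends JoinKind

   // `visible`: sides whose key copy is part of the join's regular output.
   // `hidden`: sides whose key copy is reachable only as a hidden column.
   final case class KeyVisibility(visible: Set[String], hidden: Set[String]) {
     def resolvable: Set[String] = visible ++ hidden
   }

   def keyVisibility(kind: JoinKind, withHiddenColumns: Boolean): KeyVisibility = {
     val (visible, hidden) = kind match {
       case InnerJoin | LeftOuterJoin => (Set("left"), Set("right"))
       case RightOuterJoin => (Set("right"), Set("left"))
       // FULL OUTER outputs coalesce(left.k, right.k), which belongs to neither side.
       case FullOuterJoin => (Set.empty[String], Set("left", "right"))
     }
     KeyVisibility(visible, if (withHiddenColumns) hidden else Set.empty[String])
   }
   ```

   With `withHiddenColumns = false`, the resolvable sides reproduce the table row by row. With `true`, every join type resolves both sides.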
   
   This fix applies in SQL but does not apply in Scala; this seems to be related to the metadata column framework in the DSv2 API.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. The user can now symmetrically resolve the common columns from a NATURAL/USING JOIN.
   
   ### How was this patch tested?
   
   SQL-side tests. The behavior matches PostgreSQL and MySQL.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814468676


   **[Test build #136967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136967/testReport)** for PR 31666 at commit [`66ad572`](https://github.com/apache/spark/commit/66ad572bee29e606d2c9e85db28c5a35f1d2d022).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814569507


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41556/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793237058


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40470/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814521635








[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815273505


   **[Test build #137034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137034/testReport)** for PR 31666 at commit [`0fe04a2`](https://github.com/apache/spark/commit/0fe04a2e014d90da13d5f2a8d4a456836b1ba6ad).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815453184


   **[Test build #137058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137058/testReport)** for PR 31666 at commit [`b1bf28d`](https://github.com/apache/spark/commit/b1bf28d9ca413cf2dee2957e5d6d8eeb4c9a4f6e).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809817643


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41255/
   




[GitHub] [spark] karenfeng commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790718286


   > > This fix applies in SQL but does not apply in Scala; this seems to be related to the metadata column framework in the DSv2 API.
   > 
   > Is this still true? I think your previous bug-fix PR solved it. We can add some tests to verify it (even if they fail, we need to show people the behavior of the Scala API).
   
   Whoops, I forgot to change the PR description. This no longer holds. Thanks for the catch @cloud-fan!




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810652054


   **[Test build #136735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136735/testReport)** for PR 31666 at commit [`f5cc3ae`](https://github.com/apache/spark/commit/f5cc3ae0db9538083c5efb27f3641c4dddec5faa).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810652768


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136735/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587962728



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out
##########
@@ -1790,19 +1790,19 @@ SELECT udf(udf('')) AS `xxx`, udf(i), udf(j), udf(t), udf(k)
 -- !query schema
 struct<xxx:string,udf(i):int,udf(j):int,udf(t):string,udf(k):int>
 -- !query output
+	8	8	eight	NULL

Review comment:
       I'm not sure why these changed...






[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-817004458


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137149/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587661561



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -979,7 +979,7 @@ class Analyzer(override val catalogManager: CatalogManager)
    *
    * References to metadata columns are resolved using columns from [[LogicalPlan.metadataOutput]],
    * but the relation's output does not include the metadata columns until the relation is replaced
-   * using [[DataSourceV2Relation.withMetadataColumns()]]. Unless this rule adds metadata to the
+   * with a copy adding them to the output. Unless this rule adds metadata to the relation's output,
    * relation's output, the analyzer will detect that nothing produces the columns.

Review comment:
       nit: `relation's output,` appears twice.






[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818075808


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41815/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608876724



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       To fix this, we could say that hidden columns are only propagated through `Project` with a `Star` in the `projectList`. Ideally, we'd only propagate the hidden columns that match the Star's qualifier, but I'm not sure if that's possible at that stage.
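
   A minimal sketch of the propagation rule proposed in this comment, under a simplified model. `Col`, `ProjectItem`, `ColumnRef`, `Star`, and `propagatedHiddenCols` are hypothetical names, not Spark classes. In this model, an unqualified `*` forwards every hidden column from the child, a qualified `t.*` forwards only the hidden columns whose qualifier is `t` (the "ideal" variant), and a `Project` without a star forwards none.

   ```scala
   // Hypothetical model; not Spark's Project, Star, or Attribute.
   final case class Col(name: String, qualifier: Option[String], hidden: Boolean)

   sealed trait ProjectItem
   final case class ColumnRef(col: Col) extends ProjectItem
   // Star(None) models `*`; Star(Some("t1")) models `t1.*`.
   final case class Star(qualifier: Option[String]) extends ProjectItem

   // Hidden columns from the child survive only if some star in the project
   // list could have selected them.
   def propagatedHiddenCols(projectList: Seq[ProjectItem], childMetadataOutput: Seq[Col]): Seq[Col] = {
     val stars = projectList.collect { case s: Star => s }
     childMetadataOutput.filter { c =>
       c.hidden && stars.exists(s => s.qualifier.isEmpty || s.qualifier == c.qualifier)
     }
   }
   ```

   Matching on the qualifier keeps `SELECT t2.*` from surfacing hidden keys that belong to `t1`. Whether the qualifier is available at that stage of analysis is the open question raised above.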






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-816834288


   **[Test build #137149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137149/testReport)** for PR 31666 at commit [`8c5144e`](https://github.com/apache/spark/commit/8c5144eccd4014ce51e4b2624784448349d4f081).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809811391


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41255/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810489764


   **[Test build #136735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136735/testReport)** for PR 31666 at commit [`f5cc3ae`](https://github.com/apache/spark/commit/f5cc3ae0db9538083c5efb27f3641c4dddec5faa).




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604265109



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1968,7 +1986,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     resolveExpression(
       expr,
       resolveColumnByName = nameParts => {
-        plan.resolve(nameParts, resolver)
+        plan.resolve(nameParts, resolver, withMetadata = false)

Review comment:
       I was thinking about this too: I can add a symmetric parameter to `resolveExpressionByPlanChildren`. This change is also likely broader in scope than needed; primarily, we don't want `ResolveMissingReferences` to touch metadata columns, as they should be resolved in `AddMetadataCol`.
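
   The following is a rough sketch of how a `withMetadata` flag on name resolution could behave, using a simplified hypothetical model. `Attr`, `PlanNode`, and this `resolve` signature are assumptions. Spark's `LogicalPlan.resolve` also takes a `Resolver` and reports ambiguous matches, whereas this model simply returns the first match. Regular output always wins, and metadata output is consulted only when the flag is set. This lets a caller such as `ResolveMissingReferences` opt out of metadata columns.

   ```scala
   // Hypothetical, simplified model; not Spark's LogicalPlan.resolve.
   final case class Attr(qualifier: String, name: String)

   final case class PlanNode(output: Seq[Attr], metadataOutput: Seq[Attr]) {
     private def matches(nameParts: Seq[String], a: Attr): Boolean = nameParts match {
       case Seq(n) => a.name.equalsIgnoreCase(n)
       case Seq(q, n) => a.qualifier.equalsIgnoreCase(q) && a.name.equalsIgnoreCase(n)
       case _ => false
     }

     // Regular output always takes precedence; metadata output is a fallback
     // that callers can disable with withMetadata = false.
     def resolve(nameParts: Seq[String], withMetadata: Boolean = true): Option[Attr] =
       output.find(matches(nameParts, _)).orElse {
         if (withMetadata) metadataOutput.find(matches(nameParts, _)) else None
       }
   }
   ```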






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-817004458


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137149/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814584695


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41558/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r612501598



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##########
@@ -201,4 +201,30 @@ package object util extends Logging {
   def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String = {
     truncatedString(seq, "", sep, "", maxFields)
   }
+
+  val METADATA_COL_ATTR_KEY = "__metadata_col"
+
+  /**
+   * Hidden columns are a type of metadata column that are candidates during qualified star
+   * star expansions. They are propagated through Projects that have hidden children output,
+   * so that nested hidden output is not lost.
+   */
+  val HIDDEN_COL_ATTR_KEY = "__hidden_col"

Review comment:
       The semantics are clear now, so let's refine the naming.
   
   We only have metadata columns, and a metadata column can be included in a qualified star expansion if required. We can just add a new property to metadata columns to indicate this.
   
   The property name can be `__support_qualified_star`, and the helper class can be
   ```
   implicit class MetadataColumnHelper(attr: Attribute) {
     def isMetadataCol: Boolean ...
     def supportQualifiedStar: Boolean ...
     def markAsSupportQualifiedStar: Attribute ...
   }
   ```
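
   For illustration, the suggested helper might be filled in as follows. This is a hedged sketch. `MetaAttr` is a hypothetical stand-in for Spark's `Attribute` that models its metadata as a plain `Map[String, Boolean]`. `SUPPORT_QUALIFIED_STAR` holds the property name proposed above. Marking an attribute also flags it as a metadata column, on the assumption that only metadata columns take part in qualified-star expansion.

   ```scala
   // Hypothetical stand-in for Spark's Attribute: metadata is a plain map here.
   final case class MetaAttr(name: String, metadata: Map[String, Boolean] = Map.empty)

   object MetadataKeys {
     val METADATA_COL_ATTR_KEY = "__metadata_col"
     // Property name proposed in the review comment.
     val SUPPORT_QUALIFIED_STAR = "__support_qualified_star"
   }

   object MetadataColumnSyntax {
     import MetadataKeys._

     implicit class MetadataColumnHelper(attr: MetaAttr) {
       def isMetadataCol: Boolean = attr.metadata.getOrElse(METADATA_COL_ATTR_KEY, false)

       // Only metadata columns can be picked up by a qualified star.
       def supportQualifiedStar: Boolean =
         isMetadataCol && attr.metadata.getOrElse(SUPPORT_QUALIFIED_STAR, false)

       // Returns a copy flagged as a metadata column that a qualified star may select.
       def markAsSupportQualifiedStar: MetaAttr = attr.copy(
         metadata = attr.metadata + (METADATA_COL_ATTR_KEY -> true) + (SUPPORT_QUALIFIED_STAR -> true))
     }
   }
   ```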






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791063793


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40359/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818001686


   **[Test build #137232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137232/testReport)** for PR 31666 at commit [`333a815`](https://github.com/apache/spark/commit/333a81550ea6e20b1243fb642529e1fceb9853e6).




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786862129


   **[Test build #135521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135521/testReport)** for PR 31666 at commit [`2c261bb`](https://github.com/apache/spark/commit/2c261bb8979cf6187eec3d9d02872bc75fdccab5).




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604165763



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -999,41 +999,31 @@ class Analyzer(override val catalogManager: CatalogManager)
    * Adds metadata columns to output for child relations when nodes are missing resolved attributes.
    *
    * References to metadata columns are resolved using columns from [[LogicalPlan.metadataOutput]],
-   * but the relation's output does not include the metadata columns until the relation is replaced
-   * using [[DataSourceV2Relation.withMetadataColumns()]]. Unless this rule adds metadata to the
-   * relation's output, the analyzer will detect that nothing produces the columns.
+   * but the relation's output does not include the metadata columns until the relation is replaced.
+   * Unless this rule adds metadata to the relation's output, the analyzer will detect that nothing
+   * produces the columns.
    *
    * This rule only adds metadata columns when a node is resolved but is missing input from its
    * children. This ensures that metadata columns are not added to the plan unless they are used. By
    * checking only resolved nodes, this ensures that * expansion is already done so that metadata
-   * columns are not accidentally selected by *.
+   * columns are not accidentally selected by *. This rule resolves operators downwards to avoid
+   * projecting away metadata columns prematurely.
    */
   object AddMetadataColumns extends Rule[LogicalPlan] {
-    import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
 
-    private def hasMetadataCol(plan: LogicalPlan): Boolean = {
-      plan.expressions.exists(_.find {
-        case a: Attribute => a.isMetadataCol
-        case _ => false
-      }.isDefined)
-    }
+    import org.apache.spark.sql.catalyst.util._
 
-    private def addMetadataCol(plan: LogicalPlan): LogicalPlan = plan match {
-      case r: DataSourceV2Relation => r.withMetadataColumns()
-      case _ => plan.withNewChildren(plan.children.map(addMetadataCol))
-    }
-
-    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-      case node if node.children.nonEmpty && node.resolved && hasMetadataCol(node) =>
+    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown {
+      // Add metadata output to all node types
+      case node if node.children.nonEmpty && node.resolved && node.missingInput.nonEmpty &&

Review comment:
       let's not call `missingInput` here, as it has performance issues; see https://github.com/apache/spark/pull/31440
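
   The following is a hypothetical, heavily simplified model of a cheaper trigger for the rule. A `missingInput`-style check compares each node's references against its children's output. This model instead asks only whether the node itself references an attribute flagged as a metadata column. It rewrites top-down, so the outermost such node adds the columns to every scan beneath it. `Ref`, `Node`, `Scan`, `Select`, `Where`, and the function names are invented for illustration and are not Spark classes.

   ```scala
   // Hypothetical, heavily simplified plan model; these are not Spark classes.
   final case class Ref(name: String, isMetadataCol: Boolean = false)

   sealed trait Node {
     def children: Seq[Node]
     def references: Seq[Ref]
   }
   final case class Scan(output: Seq[Ref], metadataOutput: Seq[Ref]) extends Node {
     def children: Seq[Node] = Nil
     def references: Seq[Ref] = Nil
   }
   final case class Select(references: Seq[Ref], child: Node) extends Node {
     def children: Seq[Node] = Seq(child)
   }
   final case class Where(references: Seq[Ref], child: Node) extends Node {
     def children: Seq[Node] = Seq(child)
   }

   // Cheap trigger: inspects only this node's own references and never builds
   // its children's output sets, unlike a missingInput-style check.
   def hasMetadataCol(node: Node): Boolean = node.references.exists(_.isMetadataCol)

   // Exposes metadata columns in the output of every scan below `node`.
   def addMetadataCol(node: Node): Node = node match {
     case s: Scan => s.copy(output = s.output ++ s.metadataOutput.filterNot(s.output.contains))
     case Select(refs, child) => Select(refs, addMetadataCol(child))
     case Where(refs, child) => Where(refs, addMetadataCol(child))
   }

   // Top-down traversal: the outermost node that needs a metadata column
   // triggers the rewrite of its whole subtree.
   def addMetadataColumns(plan: Node): Node = plan match {
     case n if n.children.nonEmpty && hasMetadataCol(n) => addMetadataCol(n)
     case Select(refs, child) => Select(refs, addMetadataColumns(child))
     case Where(refs, child) => Where(refs, addMetadataColumns(child))
     case s: Scan => s
   }
   ```

   Plans that never mention a metadata column are returned unchanged, so the scan output only grows when a query actually uses the columns.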






[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587669432



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -76,6 +77,13 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  val hiddenOutputTag: TreeNodeTag[Seq[Attribute]] = TreeNodeTag[Seq[Attribute]]("hiddenOutput")
+
+  override def metadataOutput: Seq[Attribute] = {
+    child.metadataOutput ++
+      getTagValue(hiddenOutputTag).getOrElse(Seq.empty[Attribute])

Review comment:
       It's unfortunate that we need to use `TreeNodeTag` to store the extra information in `Project`, but I don't have a better idea without changing the `Project` constructor.
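What the tag buys here: the extra hidden attributes travel in a side map on the node rather than in the `Project` constructor, so existing `Project(projectList, child)` call sites stay unchanged. A toy model, assuming simplified stand-ins (`Tag` and `Node` are hypothetical; Catalyst's `TreeNodeTag` and `TreeNode` carry more machinery):

```scala
object TagSideChannel {
  import scala.collection.mutable

  // Toy typed tag key (hypothetical stand-in for TreeNodeTag); case-class equality is by name.
  final case class Tag[T](name: String)

  // Defined once in a shared object, so callers can refer to it without a node instance.
  val HiddenOutputTag: Tag[Seq[String]] = Tag[Seq[String]]("hiddenOutput")

  // Toy node: extra data lives in a tag map instead of in the constructor.
  final class Node(val childMetadataOutput: Seq[String]) {
    private val tags = mutable.Map.empty[Tag[_], Any]

    def setTagValue[T](tag: Tag[T], value: T): Unit = tags.update(tag, value)

    // The cast is safe as long as values are only written through setTagValue with the same key.
    def getTagValue[T](tag: Tag[T]): Option[T] = tags.get(tag).map(_.asInstanceOf[T])

    // Same shape as the overridden metadataOutput in the hunk above.
    def metadataOutput: Seq[String] =
      childMetadataOutput ++ getTagValue(HiddenOutputTag).getOrElse(Seq.empty[String])
  }
}
```

In this toy model, a freshly constructed `Node` starts with an empty tag map, so any rule that rebuilds the node has to carry the tags over explicitly. Defining the key in a shared object also matches the suggestion elsewhere in this thread to move `hiddenOutputTag` out of the class.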






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818210168


   **[Test build #137232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137232/testReport)** for PR 31666 at commit [`333a815`](https://github.com/apache/spark/commit/333a81550ea6e20b1243fb642529e1fceb9853e6).
    * This patch passes all tests.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789325615


   **[Test build #135674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135674/testReport)** for PR 31666 at commit [`6fa70ba`](https://github.com/apache/spark/commit/6fa70ba286a40d065c9581e29b738f28f12593b5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `public class JavaModelSelectionViaRandomHyperparametersExample `
     * `class GangliaSink(`
     * `case class Limits[T: Numeric](x: T, y: T)`
     * `abstract class Generator[T: Numeric] `
     * `class ParamRandomBuilder extends ParamGridBuilder `
     * `class ParamRandomBuilder(ParamGridBuilder):`
     * `case class Product(child: Expression)`
     * `case class AnalyzeTables(`
     * `case class AnalyzeTablesCommand(`




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608857259



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       This was used in `natural-join.sql`:
   ```
   SELECT nt1.*, nt2.*, nt3.* FROM nt1 natural join nt2 natural join nt3;
   ```
   Without it, `nt2.*` would fail to expand to include the key column.
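A toy model of why hidden output matters for qualified star expansion, simplified to two tables. The schemas `nt1(k, v1)` and `nt2(k, v2)` are hypothetical; the real test joins three tables. After an inner NATURAL JOIN only one copy of `k` is visible, so expanding `nt2.*` against the visible output alone drops nt2's key, while also consulting the hidden output recovers it. Catalyst's actual star expansion is more involved; this only sketches the idea.

```scala
object QualifiedStar {
  // Toy attribute: column name, the relation it came from, and a unique id.
  final case class Col(name: String, qualifier: String, exprId: Int)

  // Expand `q.*` against a node's visible output plus its hidden output,
  // keeping the column order of the qualifier's source relation.
  def expand(q: String, source: Seq[Col], output: Seq[Col], hidden: Seq[Col]): Seq[Col] = {
    val available = (output ++ hidden).filter(_.qualifier == q).map(_.exprId).toSet
    source.filter(c => available.contains(c.exprId))
  }

  val nt1 = Seq(Col("k", "nt1", 1), Col("v1", "nt1", 2))
  val nt2 = Seq(Col("k", "nt2", 3), Col("v2", "nt2", 4))

  // Inner NATURAL JOIN on k: only nt1's k is visible; nt2's k is kept as hidden output.
  val joinOutput = Seq(nt1(0), nt1(1), nt2(1))
  val joinHidden = Seq(nt2(0))
}
```

With the hidden column, `nt2.*` yields `k, v2`; against the visible output only, it yields just `v2`.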






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815299424


   **[Test build #137035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)** for PR 31666 at commit [`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815627510


   **[Test build #137058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137058/testReport)** for PR 31666 at commit [`b1bf28d`](https://github.com/apache/spark/commit/b1bf28d9ca413cf2dee2957e5d6d8eeb4c9a4f6e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792494995


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40436/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811535839


   **[Test build #136785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136785/testReport)** for PR 31666 at commit [`07f9ad5`](https://github.com/apache/spark/commit/07f9ad583127c19662384e3f1c31ae29cb444583).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793271716


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40470/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789201558


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40256/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787030190


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40112/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815318289








[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809812284


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41255/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810536502


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41317/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789337070


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135674/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r586649729



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala
##########
@@ -94,6 +94,8 @@ trait AnalysisHelper extends QueryPlan[LogicalPlan] { self: LogicalPlan =>
             rule.applyOrElse(afterRuleOnChildren, identity[LogicalPlan])
           }
         }
+        newNode.copyTagsFrom(this)

Review comment:
       This exists in `transformUp` but not in `resolveOperatorsUp`; was the difference intentional? Without the tags, the metadata cannot be resolved properly (`isMetadataCol` is always false).
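A toy model of the difference being asked about: a bottom-up transform that rebuilds each node either drops or preserves the node's tags. `Node`, its `tags` map, and `transformUp` here are hypothetical simplifications, not Catalyst's implementations.

```scala
object TagPreservingTransform {
  import scala.collection.mutable

  // Toy tree node with a mutable tag map that is not part of case-class equality.
  final case class Node(name: String, children: Seq[Node]) {
    val tags: mutable.Map[String, Any] = mutable.Map.empty
    def copyTagsFrom(other: Node): Unit = tags ++= other.tags
  }

  // Bottom-up transform: rewrite children first, then apply the rule to the rebuilt node.
  // copy() always yields a node with empty tags; when keepTags is true, the result inherits
  // the original node's tags, analogous to the copyTagsFrom call added in the hunk above.
  def transformUp(n: Node, keepTags: Boolean)(rule: PartialFunction[Node, Node]): Node = {
    val withNewChildren = n.copy(children = n.children.map(c => transformUp(c, keepTags)(rule)))
    val result = rule.applyOrElse(withNewChildren, identity[Node])
    if (keepTags) result.copyTagsFrom(n)
    result
  }
}
```

Even an identity rule loses the tags when `keepTags` is false, because rebuilding the node with new children produces a fresh instance.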






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818001686


   **[Test build #137232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137232/testReport)** for PR 31666 at commit [`333a815`](https://github.com/apache/spark/commit/333a81550ea6e20b1243fb642529e1fceb9853e6).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814470949


   **[Test build #136967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136967/testReport)** for PR 31666 at commit [`66ad572`](https://github.com/apache/spark/commit/66ad572bee29e606d2c9e85db28c5a35f1d2d022).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814584223








[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814468676


   **[Test build #136967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136967/testReport)** for PR 31666 at commit [`66ad572`](https://github.com/apache/spark/commit/66ad572bee29e606d2c9e85db28c5a35f1d2d022).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811579710


   **[Test build #136785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136785/testReport)** for PR 31666 at commit [`07f9ad5`](https://github.com/apache/spark/commit/07f9ad583127c19662384e3f1c31ae29cb444583).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814584695


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41558/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818309618


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137235/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815298259


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41612/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815238075


   **[Test build #137032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137032/testReport)** for PR 31666 at commit [`c84f396`](https://github.com/apache/spark/commit/c84f396e3427ffceb2891e2b1dc4b30ddc2239d0).




[GitHub] [spark] cloud-fan commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790454584


   > This fix applies in SQL but does not apply in Scala; this seems to be related to the metadata column framework in the DSv2 API.
   
   Is this still true? I think your previous bug-fix PR solved it. We can add some tests to verify it (even if they fail, we need to show people the behavior of the Scala API).




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587665859



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -76,6 +77,13 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  val hiddenOutputTag: TreeNodeTag[Seq[Attribute]] = TreeNodeTag[Seq[Attribute]]("hiddenOutput")

Review comment:
       We shouldn't define this in a class. How about putting it in the `package object util`?






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814569493


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41556/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818035674


   **[Test build #137235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137235/testReport)** for PR 31666 at commit [`49de5c5`](https://github.com/apache/spark/commit/49de5c582d32216513cb8ecfb5a7a095f8aaa2a1).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810536528


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41317/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r609044074



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -958,6 +947,34 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      plan.expressions.flatMap(_.collect {
+        case a: Attribute if a.isMetadataCol => a
+        case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>
+          childMetadataOutput.find(_.exprId == a.exprId).get
+      })
+    }
+
+    private def hasMetadataCol(plan: LogicalPlan): Boolean = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      val hasMetaCol = plan.expressions.exists(_.find {
+        case a: Attribute =>
+          a.isMetadataCol || childMetadataOutput.exists(_.exprId == a.exprId)

Review comment:
       I added a comment for this to clarify as well.
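   The two helpers in the diff above decide whether an attribute refers to a metadata (hidden) column: either it is flagged as one, or its `exprId` matches an attribute in the children's `metadataOutput`. Below is a minimal plain-Scala model of that lookup. It assumes hypothetical `Attr`/`Plan` case classes in place of Catalyst's `Attribute`/`LogicalPlan`, and it flattens each expression to a single attribute, whereas the real code walks expression trees with `collect`/`find`.

```scala
// Hypothetical stand-ins for Catalyst's Attribute and LogicalPlan (not Spark classes).
final case class Attr(name: String, exprId: Long, isMetadataCol: Boolean = false)
final case class Plan(
    expressions: Seq[Attr],
    children: Seq[Plan],
    metadataOutput: Seq[Attr] = Nil)

object MetadataLookup {
  // An attribute is a metadata reference if it is flagged itself, or if a child
  // exposes a metadata attribute with the same exprId (the child's copy is returned).
  def getMetadataAttributes(plan: Plan): Seq[Attr] = {
    lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
    plan.expressions.flatMap { a =>
      if (a.isMetadataCol) Some(a)
      else childMetadataOutput.find(_.exprId == a.exprId)
    }
  }

  // True if any expression of the plan resolves to a metadata attribute.
  def hasMetadataCol(plan: Plan): Boolean =
    getMetadataAttributes(plan).nonEmpty
}
```

   The unflagged `Attr("a", 2L)` in a parent plan resolves to the child's flagged copy, because the two share an `exprId`.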




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815394270


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137034/
   


[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r609044340



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       I changed this so that hidden output is only propagated if a Project already has hidden output. This allows us to propagate hidden output through nested NATURAL/USING JOINs.




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789182513


   **[Test build #135674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135674/testReport)** for PR 31666 at commit [`6fa70ba`](https://github.com/apache/spark/commit/6fa70ba286a40d065c9581e29b738f28f12593b5).


[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r612490514



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -957,6 +946,36 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)

Review comment:
       nit: we can avoid building a new `Seq` frequently. The check can be 
   `plan.children.exists(c => c.metadataOutput.exists(_.exprId == a.exprId))`
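   Both forms of the check give the same answer. The suggested form searches each child's `metadataOutput` in place and stops at the first match, instead of first concatenating every child's output into a fresh `Seq`. Here is a self-contained comparison, using hypothetical `MetaAttr`/`PlanNode` case classes as stand-ins for the Catalyst types:

```scala
// Hypothetical simplified types; not Catalyst's Attribute / LogicalPlan.
final case class MetaAttr(name: String, exprId: Long)
final case class PlanNode(children: Seq[PlanNode], metadataOutput: Seq[MetaAttr])

object ExprIdCheck {
  // Original shape: concatenate every child's metadataOutput into a new Seq, then search it.
  def viaFlatMap(plan: PlanNode, a: MetaAttr): Boolean = {
    val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
    childMetadataOutput.exists(_.exprId == a.exprId)
  }

  // Suggested shape: search each child's metadataOutput in place, stopping at the first match.
  def viaExists(plan: PlanNode, a: MetaAttr): Boolean =
    plan.children.exists(c => c.metadataOutput.exists(_.exprId == a.exprId))
}
```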




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792490190


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40436/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786987999


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40108/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811579868


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136785/
   


[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809759063


   **[Test build #136671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136671/testReport)** for PR 31666 at commit [`73b7c8a`](https://github.com/apache/spark/commit/73b7c8a5c7203985ebf3299ca9e600dffca76a6f).


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818994454


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41881/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815385556


   **[Test build #137034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137034/testReport)** for PR 31666 at commit [`0fe04a2`](https://github.com/apache/spark/commit/0fe04a2e014d90da13d5f2a8d4a456836b1ba6ad).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815478466


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41636/
   


[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587667257



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3370,54 +3435,6 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
   }
 
-  private def commonNaturalJoinProcessing(

Review comment:
       I can move it back - I just wasn't sure why it lived outside of this class, given that it's not shared.




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786983806


   **[Test build #135527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135527/testReport)** for PR 31666 at commit [`80beda8`](https://github.com/apache/spark/commit/80beda814ea7a0cadca441cf820501df596db3dc).


[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790973413






[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587588428



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
##########
@@ -477,4 +477,26 @@ class DataFrameJoinSuite extends QueryTest
 
     checkAnswer(df3.except(df4), Row(10, 50, 2, Row(10, 50, 2)))
   }
+
+  test("SPARK-34527: Resolve common columns from USING JOIN") {
+    val joinDf = testData2.as("testData2").join(
+      testData3.as("testData3"), usingColumns = Seq("a"), joinType = "fullouter")
+    val dfQuery = joinDf.select(
+      $"a", $"testData2.a", $"testData2.b", $"testData3.a", $"testData3.b")
+    val dfQuery2 = joinDf.select(

Review comment:
       These demonstrate that the behavior now works in Scala.
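   The test above selects `a`, `testData2.a`, and `testData3.a` from a full outer USING join. The following plain-Scala model roughly illustrates what those three columns hold. It uses hypothetical `LeftRow`/`RightRow`/`JoinedRow` types and has no Spark dependency. The merged key is `coalesce(left.a, right.a)`, and each side-specific key is `None` (SQL `NULL`) on a row where that side had no match. This models only the shape of the result, not how Spark's analyzer implements it.

```scala
// Hypothetical row types standing in for testData2 / testData3 rows.
final case class LeftRow(a: Int, b: Int)
final case class RightRow(a: Int, b: Int)

// One output row of `left FULL OUTER JOIN right USING (a)`.
// `a` is the visible key, coalesce(left.a, right.a); the side-specific keys
// stay reachable and are None on the side that had no match.
final case class JoinedRow(a: Int, left: Option[LeftRow], right: Option[RightRow]) {
  def leftA: Option[Int] = left.map(_.a)
  def rightA: Option[Int] = right.map(_.a)
}

object UsingJoinModel {
  // Output order (matched, then left-only, then right-only) is arbitrary, as in SQL.
  def fullOuterUsingA(ls: Seq[LeftRow], rs: Seq[RightRow]): Seq[JoinedRow] = {
    val matched =
      for (l <- ls; r <- rs if l.a == r.a) yield JoinedRow(l.a, Some(l), Some(r))
    val leftOnly =
      ls.filterNot(l => rs.exists(_.a == l.a)).map(l => JoinedRow(l.a, Some(l), None))
    val rightOnly =
      rs.filterNot(r => ls.exists(_.a == r.a)).map(r => JoinedRow(r.a, None, Some(r)))
    matched ++ leftOnly ++ rightOnly
  }
}
```

   In this model, joining `Seq(LeftRow(1, 10), LeftRow(2, 20))` with `Seq(RightRow(2, 200), RightRow(3, 300))` yields merged keys 2, 1, 3. The left keys are `Some(2), Some(1), None` and the right keys are `Some(2), None, Some(3)`.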




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815298644


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41612/
   


[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587667895



##########
File path: sql/core/src/test/resources/sql-tests/inputs/natural-join.sql
##########
@@ -1,15 +1,22 @@
 create temporary view nt1 as select * from values
-  ("one", 1),
-  ("two", 2),
-  ("three", 3)
-  as nt1(k, v1);
+    ("one", 1),
+    ("two", 2),
+    ("three", 3)
+    as nt1(k, v1);

Review comment:
       nit: in Spark we use 2-space indentation everywhere.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786967747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135521/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809795765


   **[Test build #136673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136673/testReport)** for PR 31666 at commit [`9fd2490`](https://github.com/apache/spark/commit/9fd24904317c38744314e54bbff00c6c94ac3477).


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787046955


   **[Test build #135531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135531/testReport)** for PR 31666 at commit [`e1719d3`](https://github.com/apache/spark/commit/e1719d30d7893da8991a78eb6e3fa098d65334ae).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819100570


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137301/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818309618


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137235/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818308840


   **[Test build #137235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137235/testReport)** for PR 31666 at commit [`49de5c5`](https://github.com/apache/spark/commit/49de5c582d32216513cb8ecfb5a7a095f8aaa2a1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class RuleId(id: Int) `
     * `abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with TreePatternBits `
     * `trait TreePatternBits `


[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r588813152



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out
##########
@@ -1790,19 +1790,19 @@ SELECT udf(udf('')) AS `xxx`, udf(i), udf(j), udf(t), udf(k)
 -- !query schema
 struct<xxx:string,udf(i):int,udf(j):int,udf(t):string,udf(k):int>
 -- !query output
+	8	8	eight	NULL

Review comment:
       No, the values didn't change, but the order did.




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787030190


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40112/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791042669


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40359/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786967129


   **[Test build #135521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135521/testReport)** for PR 31666 at commit [`2c261bb`](https://github.com/apache/spark/commit/2c261bb8979cf6187eec3d9d02872bc75fdccab5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `sealed trait PartitionSpec extends LeafExpression with Unevaluable `
     * `trait V2PartitionCommand extends Command `
     * `case class TruncateTable(table: LogicalPlan) extends Command `
     * `case class TruncatePartition(`
     * `case class TruncatePartitionExec(`


[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814568522


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136971/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818075808


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41815/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811559620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41368/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811559605


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41368/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814662881


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136979/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786989061


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40108/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786986859


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40108/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789196175


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40256/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786989061


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40108/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786983806


   **[Test build #135527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135527/testReport)** for PR 31666 at commit [`80beda8`](https://github.com/apache/spark/commit/80beda814ea7a0cadca441cf820501df596db3dc).




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r605088737



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1727,7 +1744,7 @@ class Analyzer(override val catalogManager: CatalogManager)
 
       case q: LogicalPlan =>
         logTrace(s"Attempting to resolve ${q.simpleString(conf.maxToStringFields)}")
-        q.mapExpressions(resolveExpressionByPlanChildren(_, q))
+        q.mapExpressions(resolveExpressionByPlanChildren(_, q, withMetadata = true))

Review comment:
       The output ordering inconsistency was due to the `Sort` rule in `ResolveReferences` resolving with metadata columns (which doesn't fall back to this case), when it should have waited for the `Sort` rule in `ResolveMissingReferences`. We still need to be able to resolve metadata columns somewhere.






[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792473332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135854/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793231829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135887/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815453184


   **[Test build #137058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137058/testReport)** for PR 31666 at commit [`b1bf28d`](https://github.com/apache/spark/commit/b1bf28d9ca413cf2dee2957e5d6d8eeb4c9a4f6e).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818435185


   **[Test build #137253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137253/testReport)** for PR 31666 at commit [`9e62d7d`](https://github.com/apache/spark/commit/9e62d7d3aa90906d5c8883c73dd71f81e319d97f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809794804


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41253/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r586652486



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -988,31 +988,43 @@ class Analyzer(override val catalogManager: CatalogManager)
    * columns are not accidentally selected by *.
    */
   object AddMetadataColumns extends Rule[LogicalPlan] {
-    import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
+    import org.apache.spark.sql.catalyst.util._
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      plan.expressions.flatMap(_.collect {
+        case a: Attribute if a.isMetadataCol => a
+        case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>

Review comment:
       This happens when a column is resolved below the level at which it is labeled as metadata. For a NATURAL/USING JOIN, the column is resolved against the base table; it is only labeled as hidden once it is used as a key column in the join.
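       A minimal sketch of the exprId fallback described above, using hypothetical toy classes (`Attr`, `MetadataLookup`) rather than Spark's Catalyst types. An attribute resolved against the base table carries no metadata tag, so it is matched to the join's hidden copy by expression ID. The expression-tree traversal (`collect`) of the real rule is flattened into a plain list here.

```scala
// Toy model (hypothetical; not Spark's classes) of resolving an attribute that was
// bound below the join, where it was not yet tagged as a metadata column.
final case class Attr(name: String, exprId: Long, isMetadataCol: Boolean = false)

object MetadataLookup {
  // Mirrors the two cases in getMetadataAttributes: keep attributes that are already
  // tagged; otherwise swap in the child's tagged copy that shares the same exprId.
  def metadataAttributes(referenced: Seq[Attr], childMetadataOutput: Seq[Attr]): Seq[Attr] =
    referenced.flatMap { a =>
      if (a.isMetadataCol) Some(a)
      else childMetadataOutput.find(_.exprId == a.exprId)
    }

  def main(args: Array[String]): Unit = {
    // Right-side key resolved against the base table: same exprId, no tag yet.
    val resolvedBelowJoin = Attr("k", exprId = 7L)
    // The USING join exposes that key again, now tagged as a hidden column.
    val joinMetadataOutput = Seq(Attr("k", exprId = 7L, isMetadataCol = true))
    // An ordinary column with no hidden counterpart is dropped by the lookup.
    val plain = Attr("v", exprId = 3L)

    println(metadataAttributes(Seq(resolvedBelowJoin, plain), joinMetadataOutput))
  }
}
```

       Only the untagged attributes pay for the search through the child's metadata output; attributes that already carry the tag are returned unchanged.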






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789205255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40256/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809885231


   **[Test build #136673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136673/testReport)** for PR 31666 at commit [`9fd2490`](https://github.com/apache/spark/commit/9fd24904317c38744314e54bbff00c6c94ac3477).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r603656867



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1968,7 +1986,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     resolveExpression(
       expr,
       resolveColumnByName = nameParts => {
-        plan.resolve(nameParts, resolver)
+        plan.resolve(nameParts, resolver, withMetadata = false)

Review comment:
       This change fixes the UDF ordering issue.






[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608299765



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       This is needed to propagate hidden columns through nested Projections (e.g., multiple natural joins).
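       A toy Scala model of the propagation problem, assuming hypothetical classes (`Col`, `Leaf`, `Proj`) rather than Catalyst's `Project`. Each NATURAL/USING rewrite adds a projection, so the first join's hidden key stays visible above the second projection only if every projection forwards its child's hidden columns. The `ownHidden` field is an invented stand-in for whatever the current level contributes; the part of the real definition after `++` is not shown in this excerpt and is not reproduced here.

```scala
// Toy model (hypothetical names) of hidden columns flowing through stacked projections.
final case class Col(name: String, hidden: Boolean = false)

sealed trait Plan { def metadataOutput: Seq[Col] }

// Stands in for whatever sits below the first rewrite; exposes no hidden columns.
final case class Leaf(name: String) extends Plan {
  def metadataOutput: Seq[Col] = Nil
}

// `ownHidden`: hypothetical duplicated key columns attached at this level.
// `forward`: whether hidden columns from the child are passed through.
final case class Proj(ownHidden: Seq[Col], child: Plan, forward: Boolean) extends Plan {
  def metadataOutput: Seq[Col] =
    (if (forward) child.metadataOutput.filter(_.hidden) else Nil) ++ ownHidden
}

object NestedJoins {
  // (t1 JOIN t2 USING (a)) JOIN t3 USING (b): two rewrites, two stacked projections.
  def stacked(forward: Boolean): Plan = {
    val inner = Proj(Seq(Col("t2.a", hidden = true)), Leaf("t1, t2"), forward)
    Proj(Seq(Col("t3.b", hidden = true)), inner, forward)
  }

  def main(args: Array[String]): Unit = {
    println(stacked(forward = true).metadataOutput.map(_.name))  // List(t2.a, t3.b)
    println(stacked(forward = false).metadataOutput.map(_.name)) // List(t3.b)
  }
}
```

       Without forwarding, `t2.a` is no longer reachable from the outer projection, which is the nested-join case the override is meant to cover.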






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-817003546


   **[Test build #137149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137149/testReport)** for PR 31666 at commit [`8c5144e`](https://github.com/apache/spark/commit/8c5144eccd4014ce51e4b2624784448349d4f081).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class KnownFloatingPointNormalized(child: Expression) extends TaggingExpression `
     * `case class Acos(child: Expression) extends UnaryMathExpression(math.acos, "ACOS") `
     * `case class Asin(child: Expression) extends UnaryMathExpression(math.asin, "ASIN") `
     * `case class Atan(child: Expression) extends UnaryMathExpression(math.atan, "ATAN") `
     * `case class Cbrt(child: Expression) extends UnaryMathExpression(math.cbrt, "CBRT") `
     * `case class Cos(child: Expression) extends UnaryMathExpression(math.cos, "COS") `
     * `case class Cosh(child: Expression) extends UnaryMathExpression(math.cosh, "COSH") `
     * `case class Log10(child: Expression) extends UnaryLogExpression(StrictMath.log10, "LOG10") `
     * `case class Signum(child: Expression) extends UnaryMathExpression(math.signum, "SIGNUM") `
     * `case class Sin(child: Expression) extends UnaryMathExpression(math.sin, "SIN") `
     * `case class Sinh(child: Expression) extends UnaryMathExpression(math.sinh, "SINH") `
     * `case class Sqrt(child: Expression) extends UnaryMathExpression(math.sqrt, "SQRT") `
     * `case class Tan(child: Expression) extends UnaryMathExpression(math.tan, "TAN") `
     * `case class Tanh(child: Expression) extends UnaryMathExpression(math.tanh, "TANH") `
     * `case class DeleteAction(condition: Option[Expression]) extends MergeAction `
     * `case class UpdateStarAction(condition: Option[Expression]) extends MergeAction `
     * `case class InsertStarAction(condition: Option[Expression]) extends MergeAction `
     * `case class RefreshTable(child: LogicalPlan) extends UnaryCommand `
     * `case class CommentOnNamespace(child: LogicalPlan, comment: String) extends UnaryCommand `
     * `case class CommentOnTable(child: LogicalPlan, comment: String) extends UnaryCommand `
     * `case class RefreshFunction(child: LogicalPlan) extends UnaryCommand `
     * `case class DescribeFunction(child: LogicalPlan, isExtended: Boolean) extends UnaryCommand `
     * `case class RecoverPartitions(child: LogicalPlan) extends UnaryCommand `
     * `case class SetCommand(kv: Option[(String, Option[String])])`
     * `case class ResetCommand(config: Option[String]) extends LeafRunnableCommand with IgnoreCachedData `
     * `case class AddJarCommand(path: String) extends LeafRunnableCommand `
     * `case class AddFileCommand(path: String) extends LeafRunnableCommand `
     * `case class AddArchiveCommand(path: String) extends LeafRunnableCommand `
     * `case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends LeafRunnableCommand `
     * `case class ListJarsCommand(jars: Seq[String] = Seq.empty[String]) extends LeafRunnableCommand `
     * `case class ListArchivesCommand(archives: Seq[String] = Seq.empty[String])`
     * `abstract class DescribeCommandBase extends LeafRunnableCommand `
     * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809885979


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136673/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-816834288


   **[Test build #137149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137149/testReport)** for PR 31666 at commit [`8c5144e`](https://github.com/apache/spark/commit/8c5144eccd4014ce51e4b2624784448349d4f081).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818966176


   **[Test build #137301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137301/testReport)** for PR 31666 at commit [`8f70c2d`](https://github.com/apache/spark/commit/8f70c2d416077716d0c346d6f7c5eea7ab6a81e5).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791063793


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40359/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818995707


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41881/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608861880



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       Unfortunately, this also breaks the ordering in `udf/postgreSQL/udf-join.sql`.






[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608685633



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -958,6 +947,34 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      plan.expressions.flatMap(_.collect {
+        case a: Attribute if a.isMetadataCol => a
+        case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>
+          childMetadataOutput.find(_.exprId == a.exprId).get
+      })
+    }
+
+    private def hasMetadataCol(plan: LogicalPlan): Boolean = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      val hasMetaCol = plan.expressions.exists(_.find {
+        case a: Attribute =>
+          a.isMetadataCol || childMetadataOutput.exists(_.exprId == a.exprId)

Review comment:
       This means that a metadata column may not have the special metadata key. How can this happen?







[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819000251


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41881/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815366089


   **[Test build #137032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137032/testReport)** for PR 31666 at commit [`c84f396`](https://github.com/apache/spark/commit/c84f396e3427ffceb2891e2b1dc4b30ddc2239d0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815374759


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137032/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814568522


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136971/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587667381



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##########
@@ -193,4 +193,29 @@ package object util extends Logging {
   def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String = {
     truncatedString(seq, "", sep, "", maxFields)
   }
+
+  val METADATA_COL_ATTR_KEY = "__metadata_col"
+  implicit class MetadataColumnHelper(attr: Attribute) {
+    def isMetadataCol: Boolean = attr.metadata.contains(METADATA_COL_ATTR_KEY) &&
+      attr.metadata.getBoolean(METADATA_COL_ATTR_KEY)
+  }
+
+  /**
+   * Hidden columns are a type of metadata column that are not propagated through subquery aliases,
+   * and are candidates during qualified star expansions.
+   */
+  val HIDDEN_COL_ATTR_KEY = "__hidden_col"
+  implicit class HiddenColumnHelper(attr: Attribute) {

Review comment:
       how about using only one implicit class?
   ```
   implicit class SpecialColumnHelper(attr: Attribute) {
     def isMetadataCol: Boolean = attr.metadata.contains(METADATA_COL_ATTR_KEY) &&
       attr.metadata.getBoolean(METADATA_COL_ATTR_KEY)
     def isHiddenCol: Boolean = ...
   }
   ```
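The single-class suggestion above can be sketched in a self-contained way. This is a minimal sketch, not Spark's code: `Attr` is a hypothetical stand-in for Catalyst's `Attribute`, its metadata is modeled as a plain `Map[String, Boolean]` rather than Spark's `Metadata`, and the body of `isHiddenCol` (elided in the comment) is assumed here to mirror `isMetadataCol`.

```scala
// Hypothetical stand-in for Catalyst's Attribute: only the metadata flags matter here.
final case class Attr(name: String, metadata: Map[String, Boolean] = Map.empty)

object SpecialColumns {
  val METADATA_COL_ATTR_KEY = "__metadata_col"
  val HIDDEN_COL_ATTR_KEY = "__hidden_col"

  // One implicit class exposing both predicates, as suggested in the review.
  // A missing key is treated as "false", like the contains(...) && getBoolean(...) check.
  implicit class SpecialColumnHelper(attr: Attr) {
    def isMetadataCol: Boolean = attr.metadata.getOrElse(METADATA_COL_ATTR_KEY, false)
    def isHiddenCol: Boolean = attr.metadata.getOrElse(HIDDEN_COL_ATTR_KEY, false)
  }
}
```

With a single helper, one `import SpecialColumns._` brings both `isMetadataCol` and `isHiddenCol` into scope for any attribute.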






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818087554


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41812/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-816869520


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41728/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818335266








[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818087706


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41812/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818087706


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41812/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815638115


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137058/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608844411



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -958,6 +947,34 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      plan.expressions.flatMap(_.collect {
+        case a: Attribute if a.isMetadataCol => a
+        case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>
+          childMetadataOutput.find(_.exprId == a.exprId).get
+      })
+    }
+
+    private def hasMetadataCol(plan: LogicalPlan): Boolean = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      val hasMetaCol = plan.expressions.exists(_.find {
+        case a: Attribute =>
+          a.isMetadataCol || childMetadataOutput.exists(_.exprId == a.exprId)

Review comment:
       This occurs when a column is resolved below the level at which it becomes labeled as metadata. For a NATURAL/USING JOIN, this happens when the column is resolved against the root table (which seems to happen only for DataFrames); the column is labeled as hidden only once it is used as a key column in the join.
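The exprId fallback described above can be illustrated with a simplified model. In this sketch (an assumption, not the Analyzer code), `SimpleAttr` is a hypothetical attribute carrying only a name, an `exprId` and a metadata flag, and the plan's expressions are flattened into a list of referenced attributes instead of being traversed with `collect`/`find`.

```scala
// Hypothetical, simplified attribute: identity is the exprId, plus a metadata flag.
final case class SimpleAttr(name: String, exprId: Long, isMetadataCol: Boolean = false)

object MetadataLookup {
  // A referenced attribute counts as a metadata column if it carries the flag itself,
  // or if a child's metadata output has an attribute with the same exprId
  // (i.e. it was resolved before it was labeled as metadata).
  def hasMetadataCol(referenced: Seq[SimpleAttr], childMetadataOutput: Seq[SimpleAttr]): Boolean =
    referenced.exists(a => a.isMetadataCol || childMetadataOutput.exists(_.exprId == a.exprId))

  // For attributes matched only by exprId, return the labeled version from the children.
  def getMetadataAttributes(
      referenced: Seq[SimpleAttr],
      childMetadataOutput: Seq[SimpleAttr]): Seq[SimpleAttr] =
    referenced.flatMap { a =>
      if (a.isMetadataCol) Some(a)
      else childMetadataOutput.find(_.exprId == a.exprId)
    }
}
```

Matching on `exprId` rather than on the metadata key is what lets an attribute resolved at the table level still be recognized once the join has relabeled it.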






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815266454








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815298666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41612/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791134157


   **[Test build #135777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135777/testReport)** for PR 31666 at commit [`0c116a5`](https://github.com/apache/spark/commit/0c116a59aff69ce4aca4c504099abeef600f1e55).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809788922


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41253/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r588814167



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out
##########
@@ -1790,19 +1790,19 @@ SELECT udf(udf('')) AS `xxx`, udf(i), udf(j), udf(t), udf(k)
 -- !query schema
 struct<xxx:string,udf(i):int,udf(j):int,udf(t):string,udf(k):int>
 -- !query output
+	8	8	eight	NULL

Review comment:
       This seems wrong: there is an ORDER BY clause, so 8 should definitely not come before 4. I can't quite see how my changes triggered this, though.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818449379


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137253/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587665050



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala
##########
@@ -94,6 +94,8 @@ trait AnalysisHelper extends QueryPlan[LogicalPlan] { self: LogicalPlan =>
             rule.applyOrElse(afterRuleOnChildren, identity[LogicalPlan])
           }
         }
+        newNode.copyTagsFrom(this)

Review comment:
       I think it's a mistake.






[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787010014


   **[Test build #135531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135531/testReport)** for PR 31666 at commit [`e1719d3`](https://github.com/apache/spark/commit/e1719d30d7893da8991a78eb6e3fa098d65334ae).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786905407


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40102/
   




[GitHub] [spark] cloud-fan commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819280634


   thanks, merging to master!




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814683677


   **[Test build #136981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136981/testReport)** for PR 31666 at commit [`85b81b1`](https://github.com/apache/spark/commit/85b81b1558188ba86fb3a7442da34addbb44aa9c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791134699


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135777/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814470982


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136967/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r605281233



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1727,7 +1744,7 @@ class Analyzer(override val catalogManager: CatalogManager)
 
       case q: LogicalPlan =>
         logTrace(s"Attempting to resolve ${q.simpleString(conf.maxToStringFields)}")
-        q.mapExpressions(resolveExpressionByPlanChildren(_, q))
+        q.mapExpressions(resolveExpressionByPlanChildren(_, q, withMetadata = true))

Review comment:
       I removed this by cherry-picking the changes from https://github.com/apache/spark/pull/32017, which this PR is blocked on.






[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608690135



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##########
@@ -201,4 +201,29 @@ package object util extends Logging {
   def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String = {
     truncatedString(seq, "", sep, "", maxFields)
   }
+
+  val METADATA_COL_ATTR_KEY = "__metadata_col"
+
+  /**
+   * Hidden columns are a type of metadata column that are not propagated through subquery aliases,
+   * and are candidates during qualified star expansions.

Review comment:
       Can we update the comment? "Not propagated through subquery aliases" no longer seems to be a distinguishing property; the new difference is that hidden columns can be propagated through `Project`.
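To make the distinction in this comment concrete, here is a minimal Spark-free sketch, assuming attribute metadata can be modeled as a `Map[String, Boolean]`. Only `METADATA_COL_ATTR_KEY` comes from the diff; `HIDDEN_COL_ATTR_KEY`, `Attr`, and `metadataOutputThroughProject` are hypothetical names used for illustration, not Catalyst API.

```scala
// Hypothetical model of attribute metadata flags. Only METADATA_COL_ATTR_KEY comes from the diff;
// HIDDEN_COL_ATTR_KEY and everything else here are illustrative assumptions.
object HiddenColSketch {
  val METADATA_COL_ATTR_KEY = "__metadata_col"
  val HIDDEN_COL_ATTR_KEY = "__hidden_col" // hypothetical key name

  final case class Attr(name: String, metadata: Map[String, Boolean] = Map.empty) {
    def isMetadataCol: Boolean = metadata.getOrElse(METADATA_COL_ATTR_KEY, false)
    def isHiddenCol: Boolean = metadata.getOrElse(HIDDEN_COL_ATTR_KEY, false)
    // A hidden column is modeled as a metadata column carrying one extra flag.
    def asHiddenCol(): Attr =
      copy(metadata = metadata + (METADATA_COL_ATTR_KEY -> true) + (HIDDEN_COL_ATTR_KEY -> true))
  }

  // The distinction proposed in the review: of a child's metadata columns,
  // only the hidden ones are carried through a projection.
  def metadataOutputThroughProject(childMetadataOutput: Seq[Attr]): Seq[Attr] =
    childMetadataOutput.filter(_.isHiddenCol)
}
```

In this model, a common join key marked with `asHiddenCol()` survives a projection, while an ordinary metadata column does not.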






[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814568251


   **[Test build #136981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136981/testReport)** for PR 31666 at commit [`85b81b1`](https://github.com/apache/spark/commit/85b81b1558188ba86fb3a7442da34addbb44aa9c).




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r611670279



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3158,26 +3179,29 @@ class Analyzer(override val catalogManager: CatalogManager)
     val rUniqueOutput = right.output.filterNot(att => rightKeys.contains(att))
 
     // the output list looks like: join keys, columns from left, columns from right
-    val projectList = joinType match {
+    val (projectList, hiddenList) = joinType match {
       case LeftOuter =>
-        leftKeys ++ lUniqueOutput ++ rUniqueOutput.map(_.withNullability(true))
+        (leftKeys ++ lUniqueOutput ++ rUniqueOutput.map(_.withNullability(true)), rightKeys)
       case LeftExistence(_) =>
-        leftKeys ++ lUniqueOutput
+        (leftKeys ++ lUniqueOutput, Seq.empty)
       case RightOuter =>
-        rightKeys ++ lUniqueOutput.map(_.withNullability(true)) ++ rUniqueOutput
+        (rightKeys ++ lUniqueOutput.map(_.withNullability(true)) ++ rUniqueOutput, leftKeys)
       case FullOuter =>
         // in full outer join, joinCols should be non-null if there is.
         val joinedCols = joinPairs.map { case (l, r) => Alias(Coalesce(Seq(l, r)), l.name)() }
-        joinedCols ++
+        (joinedCols ++
           lUniqueOutput.map(_.withNullability(true)) ++
-          rUniqueOutput.map(_.withNullability(true))
+          rUniqueOutput.map(_.withNullability(true)),
+          leftKeys ++ rightKeys)
       case _ : InnerLike =>
-        leftKeys ++ lUniqueOutput ++ rUniqueOutput
+        (leftKeys ++ lUniqueOutput ++ rUniqueOutput, rightKeys)
       case _ =>
         sys.error("Unsupported natural join type " + joinType)
     }
-    // use Project to trim unnecessary fields
-    Project(projectList, Join(left, right, joinType, newCondition, hint))
+    // use Project to hide duplicated common keys
+    val project = Project(projectList, Join(left, right, joinType, newCondition, hint))
+    project.setTagValue(Project.hiddenOutputTag, hiddenList.map(_.asHiddenCol()))

Review comment:
       I feel it's too hacky to change the hidden column propagation logic of `Project`. How about we do a manual propagation here?
   `hiddenList.map(_.asHiddenCol()) ++ project.child.metadataOutput.filter(is hidden column)`
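The `(projectList, hiddenList)` split in the diff, together with the manual propagation suggested above, can be modeled without Spark as a rough sketch. `JoinKind`, `Col`, and `NaturalJoinSketch` are hypothetical stand-ins for Catalyst's join types and attributes. `LeftSemiJoin` stands in for `LeftExistence(_)`, and the full outer join's `Alias(Coalesce(...))` column is reduced to a plain column tagged `"coalesced"`.

```scala
// Simplified, Spark-free model of the USING/NATURAL JOIN rewrite in the diff above.
// JoinKind, Col, and NaturalJoinSketch are hypothetical stand-ins, not Catalyst classes.
sealed trait JoinKind
case object InnerJoin extends JoinKind
case object LeftOuterJoin extends JoinKind
case object RightOuterJoin extends JoinKind
case object FullOuterJoin extends JoinKind
case object LeftSemiJoin extends JoinKind // stands in for LeftExistence(_)

final case class Col(name: String, side: String, nullable: Boolean = false, hidden: Boolean = false) {
  def withNullability(n: Boolean): Col = copy(nullable = n)
  def asHiddenCol(): Col = copy(hidden = true)
}

object NaturalJoinSketch {
  /** Returns (visible project list, hidden common keys) for each join type. */
  def naturalJoinOutput(
      leftKeys: Seq[Col],
      rightKeys: Seq[Col],
      lUnique: Seq[Col],
      rUnique: Seq[Col],
      joinType: JoinKind): (Seq[Col], Seq[Col]) = joinType match {
    case LeftOuterJoin =>
      (leftKeys ++ lUnique ++ rUnique.map(_.withNullability(true)), rightKeys)
    case LeftSemiJoin =>
      // Only the left side is output, so there is nothing to hide.
      (leftKeys ++ lUnique, Seq.empty)
    case RightOuterJoin =>
      (rightKeys ++ lUnique.map(_.withNullability(true)) ++ rUnique, leftKeys)
    case FullOuterJoin =>
      // The real rule emits Alias(Coalesce(Seq(l, r)), l.name); modeled as a "coalesced" column.
      val joined = leftKeys.zip(rightKeys).map { case (l, _) => Col(l.name, "coalesced") }
      (joined ++ lUnique.map(_.withNullability(true)) ++ rUnique.map(_.withNullability(true)),
        leftKeys ++ rightKeys)
    case InnerJoin =>
      (leftKeys ++ lUnique ++ rUnique, rightKeys)
  }

  /** The reviewer's suggestion: new hidden keys plus hidden columns the child already exposes. */
  def hiddenOutput(hiddenList: Seq[Col], childMetadataOutput: Seq[Col]): Seq[Col] =
    hiddenList.map(_.asHiddenCol()) ++ childMetadataOutput.filter(_.hidden)
}
```

Under this model, an inner join keeps the left keys visible and the right keys hidden but still resolvable. This is the asymmetry the PR sets out to remove. Doing the propagation in `hiddenOutput` leaves `Project` itself unchanged, which is the point of the suggestion.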






[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815638115


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137058/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604265109



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1968,7 +1986,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     resolveExpression(
       expr,
       resolveColumnByName = nameParts => {
-        plan.resolve(nameParts, resolver)
+        plan.resolve(nameParts, resolver, withMetadata = false)

Review comment:
       I was thinking about this too; I can add a symmetric parameter to `resolveExpressionByPlanChildren`. This change is also probably broader than needed: the main goal is to keep `ResolveMissingReferences` from touching metadata columns, since those should be resolved in `AddMetadataCol`.
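Here is a rough sketch of the `withMetadata` flag discussed in this thread, assuming name resolution can be reduced to a lookup over a plan's regular output plus an optional metadata fallback. `Attr`, `Plan`, `resolve`, and the `_file` column name are hypothetical illustrations, not the Catalyst `resolve` API.

```scala
// Hypothetical, Spark-free model of name resolution with an optional metadata fallback.
// Attr, Plan, and resolve are illustrative stand-ins, not the Catalyst API.
object ResolveSketch {
  type Resolver = (String, String) => Boolean
  val caseInsensitiveResolver: Resolver = (a, b) => a.equalsIgnoreCase(b)

  final case class Attr(name: String, isMetadata: Boolean = false)
  final case class Plan(output: Seq[Attr], metadataOutput: Seq[Attr])

  def resolve(plan: Plan, name: String, resolver: Resolver, withMetadata: Boolean): Option[Attr] = {
    // Regular output is always searched; metadata columns only when explicitly requested.
    val candidates = if (withMetadata) plan.output ++ plan.metadataOutput else plan.output
    candidates.find(a => resolver(a.name, name))
  }
}
```

With `withMetadata = false`, a rule such as `ResolveMissingReferences` never sees metadata columns. That leaves them to a dedicated rule like `AddMetadataCol`.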






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815477861








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793271716


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40470/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791054277


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40359/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786997421


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135527/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791134699


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135777/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790973413








[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587664326



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3370,54 +3435,6 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
   }
 
-  private def commonNaturalJoinProcessing(

Review comment:
       Why do we move this method? It creates a large diff and makes the PR harder to review.






[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790849084








[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819099804


   **[Test build #137301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137301/testReport)** for PR 31666 at commit [`8f70c2d`](https://github.com/apache/spark/commit/8f70c2d416077716d0c346d6f7c5eea7ab6a81e5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class WriteToDataSourceV2(`
     * `case class WriteToMicroBatchDataSource(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814662881


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136979/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818335380


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41832/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809786772


   **[Test build #136671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136671/testReport)** for PR 31666 at commit [`73b7c8a`](https://github.com/apache/spark/commit/73b7c8a5c7203985ebf3299ca9e600dffca76a6f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-790849084








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818214748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137232/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793259890


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40470/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809794804


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41253/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786997380


   **[Test build #135527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135527/testReport)** for PR 31666 at commit [`80beda8`](https://github.com/apache/spark/commit/80beda814ea7a0cadca441cf820501df596db3dc).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811579868


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136785/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786905407


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40102/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818966176


   **[Test build #137301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137301/testReport)** for PR 31666 at commit [`8f70c2d`](https://github.com/apache/spark/commit/8f70c2d416077716d0c346d6f7c5eea7ab6a81e5).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787051387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135531/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814529277


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41548/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815299424


   **[Test build #137035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)** for PR 31666 at commit [`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818335380


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41832/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814701705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136981/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-816869520


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41728/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815267181


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41610/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604870924



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1727,7 +1744,7 @@ class Analyzer(override val catalogManager: CatalogManager)
 
       case q: LogicalPlan =>
         logTrace(s"Attempting to resolve ${q.simpleString(conf.maxToStringFields)}")
-        q.mapExpressions(resolveExpressionByPlanChildren(_, q))
+        q.mapExpressions(resolveExpressionByPlanChildren(_, q, withMetadata = true))

Review comment:
       Now this is the only place that can resolve to metadata columns. How does it resolve conflicts? Is it possible that the column being resolved here could come from either the metadata columns or `ResolveMissingReferences`?
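For discussion, here is a minimal toy model of one possible precedence rule for the question above. It assumes regular output is always searched before metadata (hidden) columns, and that `withMetadata` only gates the fallback. `Attr`, `ToyPlan`, and `resolve` are hypothetical stand-ins, not Spark's `LogicalPlan.resolve`. The sketch says nothing about `ResolveMissingReferences`; it only illustrates "output first, metadata second".

```scala
// Toy model only: every name below is hypothetical, not Spark's Analyzer API.
final case class Attr(qualifier: String, name: String)

final case class ToyPlan(output: Seq[Attr], metadataOutput: Seq[Attr]) {
  // Match an unqualified `k` or a qualified `t.k`, case-insensitively.
  private def matches(parts: Seq[String])(a: Attr): Boolean = parts match {
    case Seq(n)    => a.name.equalsIgnoreCase(n)
    case Seq(q, n) => a.qualifier.equalsIgnoreCase(q) && a.name.equalsIgnoreCase(n)
    case _         => false
  }

  // Pick the single match among `candidates`, failing loudly on ambiguity.
  private def unique(parts: Seq[String], candidates: Seq[Attr]): Option[Attr] =
    candidates.filter(matches(parts)) match {
      case Seq()  => None
      case Seq(a) => Some(a)
      case many   => sys.error(s"Ambiguous reference ${parts.mkString(".")}: $many")
    }

  // Regular output always wins. Metadata columns are consulted only as a fallback,
  // and only when the caller opts in with `withMetadata = true`.
  def resolve(nameParts: Seq[String], withMetadata: Boolean): Option[Attr] =
    unique(nameParts, output).orElse {
      if (withMetadata) unique(nameParts, metadataOutput) else None
    }
}

// t1 JOIN t2 USING (k) as an inner join: t1.k is visible, t2.k is hidden.
val t1k = Attr("t1", "k")
val t2k = Attr("t2", "k")
val plan = ToyPlan(output = Seq(t1k, Attr("t1", "v"), Attr("t2", "w")), metadataOutput = Seq(t2k))

println(plan.resolve(Seq("k"), withMetadata = true))        // regular output wins
println(plan.resolve(Seq("t2", "k"), withMetadata = true))  // falls back to the hidden column
println(plan.resolve(Seq("t2", "k"), withMetadata = false)) // not resolvable
```

In this toy model a name present in both lists can never be ambiguous, because the metadata list is never searched once the regular output yields a match.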






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818035674


   **[Test build #137235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137235/testReport)** for PR 31666 at commit [`49de5c5`](https://github.com/apache/spark/commit/49de5c582d32216513cb8ecfb5a7a095f8aaa2a1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815374759


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137032/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814492189


   **[Test build #136971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136971/testReport)** for PR 31666 at commit [`fc3b16d`](https://github.com/apache/spark/commit/fc3b16df5291151cfd5c7a3f5407222868148317).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814492189


   **[Test build #136971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136971/testReport)** for PR 31666 at commit [`fc3b16d`](https://github.com/apache/spark/commit/fc3b16df5291151cfd5c7a3f5407222868148317).




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r611837110



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3158,26 +3179,29 @@ class Analyzer(override val catalogManager: CatalogManager)
     val rUniqueOutput = right.output.filterNot(att => rightKeys.contains(att))
 
     // the output list looks like: join keys, columns from left, columns from right
-    val projectList = joinType match {
+    val (projectList, hiddenList) = joinType match {
       case LeftOuter =>
-        leftKeys ++ lUniqueOutput ++ rUniqueOutput.map(_.withNullability(true))
+        (leftKeys ++ lUniqueOutput ++ rUniqueOutput.map(_.withNullability(true)), rightKeys)
       case LeftExistence(_) =>
-        leftKeys ++ lUniqueOutput
+        (leftKeys ++ lUniqueOutput, Seq.empty)
       case RightOuter =>
-        rightKeys ++ lUniqueOutput.map(_.withNullability(true)) ++ rUniqueOutput
+        (rightKeys ++ lUniqueOutput.map(_.withNullability(true)) ++ rUniqueOutput, leftKeys)
       case FullOuter =>
         // in full outer join, joinCols should be non-null if there is.
         val joinedCols = joinPairs.map { case (l, r) => Alias(Coalesce(Seq(l, r)), l.name)() }
-        joinedCols ++
+        (joinedCols ++
           lUniqueOutput.map(_.withNullability(true)) ++
-          rUniqueOutput.map(_.withNullability(true))
+          rUniqueOutput.map(_.withNullability(true)),
+          leftKeys ++ rightKeys)
       case _ : InnerLike =>
-        leftKeys ++ lUniqueOutput ++ rUniqueOutput
+        (leftKeys ++ lUniqueOutput ++ rUniqueOutput, rightKeys)
       case _ =>
         sys.error("Unsupported natural join type " + joinType)
     }
-    // use Project to trim unnecessary fields
-    Project(projectList, Join(left, right, joinType, newCondition, hint))
+    // use Project to hide duplicated common keys
+    val project = Project(projectList, Join(left, right, joinType, newCondition, hint))
+    project.setTagValue(Project.hiddenOutputTag, hiddenList.map(_.asHiddenCol()))

Review comment:
       Good call! Thanks @cloud-fan.
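The following simplified sketch is meant to make the `(projectList, hiddenList)` split easier to review. It tracks only the key columns for each join type and follows the match arms in the diff above. The `JoinKind` ADT and `keyLayout` are hypothetical simplifications: Spark itself matches on `LeftOuter`, `RightOuter`, `FullOuter`, `LeftExistence(_)`, and `InnerLike`. Keys are plain strings rather than attributes.

```scala
// Simplified stand-ins for Spark's join types (hypothetical, for illustration only).
sealed trait JoinKind
case object InnerJoin extends JoinKind
case object LeftOuterJoin extends JoinKind
case object RightOuterJoin extends JoinKind
case object FullOuterJoin extends JoinKind
case object LeftSemiOrAnti extends JoinKind // plays the role of LeftExistence(_)

/** Returns (visible key columns, hidden key columns) for a USING/NATURAL join. */
def keyLayout(kind: JoinKind, leftKeys: Seq[String], rightKeys: Seq[String]): (Seq[String], Seq[String]) =
  kind match {
    case InnerJoin | LeftOuterJoin => (leftKeys, rightKeys)
    case RightOuterJoin            => (rightKeys, leftKeys)
    // Full outer: the visible key is coalesce(l, r), and both original keys become hidden.
    case FullOuterJoin =>
      (leftKeys.zip(rightKeys).map { case (l, r) => s"coalesce($l, $r)" }, leftKeys ++ rightKeys)
    // Semi/anti joins only output the left side, so nothing is hidden.
    case LeftSemiOrAnti => (leftKeys, Seq.empty)
  }

for (kind <- Seq(InnerJoin, LeftOuterJoin, RightOuterJoin, FullOuterJoin, LeftSemiOrAnti))
  println(s"$kind -> ${keyLayout(kind, Seq("t1.k"), Seq("t2.k"))}")
```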






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791028027


   **[Test build #135777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135777/testReport)** for PR 31666 at commit [`0c116a5`](https://github.com/apache/spark/commit/0c116a59aff69ce4aca4c504099abeef600f1e55).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815267181


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41610/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810531811


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41317/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792473332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135854/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818074574








[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604190714



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1968,7 +1986,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     resolveExpression(
       expr,
       resolveColumnByName = nameParts => {
-        plan.resolve(nameParts, resolver)
+        plan.resolve(nameParts, resolver, withMetadata = false)

Review comment:
       This change is really hard to understand. Why is it only in `resolveExpressionByPlanOutput` and not in `resolveExpressionByPlanChildren`? How does it resolve the conflicts?






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-816867295


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41728/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818309800


   **[Test build #137253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137253/testReport)** for PR 31666 at commit [`9e62d7d`](https://github.com/apache/spark/commit/9e62d7d3aa90906d5c8883c73dd71f81e319d97f).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818449379


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137253/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819100570


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137301/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814701705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136981/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-819000251


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41881/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809759063


   **[Test build #136671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136671/testReport)** for PR 31666 at commit [`73b7c8a`](https://github.com/apache/spark/commit/73b7c8a5c7203985ebf3299ca9e600dffca76a6f).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810652768


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136735/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818214748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137232/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814470982


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136967/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815394270


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137034/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811535839


   **[Test build #136785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136785/testReport)** for PR 31666 at commit [`07f9ad5`](https://github.com/apache/spark/commit/07f9ad583127c19662384e3f1c31ae29cb444583).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809790646


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41253/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r604870924



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1727,7 +1744,7 @@ class Analyzer(override val catalogManager: CatalogManager)
 
       case q: LogicalPlan =>
         logTrace(s"Attempting to resolve ${q.simpleString(conf.maxToStringFields)}")
-        q.mapExpressions(resolveExpressionByPlanChildren(_, q))
+        q.mapExpressions(resolveExpressionByPlanChildren(_, q, withMetadata = true))

Review comment:
       Now this is the only place that can resolve to metadata columns. How does it resolve conflicts? Is it possible that the column being resolved here could come from both the metadata columns and `ResolveMissingReferences`?






[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r612493243



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##########
@@ -201,4 +201,30 @@ package object util extends Logging {
   def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String = {
     truncatedString(seq, "", sep, "", maxFields)
   }
+
+  val METADATA_COL_ATTR_KEY = "__metadata_col"
+
+  /**
+   * Hidden columns are a type of metadata column that are candidates during qualified star
+   * star expansions. They are propagated through Projects that have hidden children output,

Review comment:
       The comment needs to be updated again.
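As a reading aid for the doc comment above, here is a toy model of what "candidates during qualified star expansions" could mean. An unqualified `*` expands only to visible output, while `t2.*` also picks up hidden columns qualified by `t2`. `StarCol`, `render`, and `expandStar` are hypothetical. In particular, placing hidden columns after visible ones is an assumption of this sketch, not a claim about Spark's column order.

```scala
// Toy model only; not Spark's star-expansion implementation.
final case class StarCol(qualifier: String, name: String)

def render(c: StarCol): String = s"${c.qualifier}.${c.name}"

// `*` -> visible columns only; `q.*` -> visible and hidden columns whose qualifier is q.
def expandStar(visible: Seq[StarCol], hidden: Seq[StarCol], qualifier: Option[String]): Seq[String] =
  qualifier match {
    case None    => visible.map(render)
    case Some(q) => (visible ++ hidden).filter(_.qualifier.equalsIgnoreCase(q)).map(render)
  }

// t1(k, v) JOIN t2(k, w) USING (k) as an inner join: t2.k is hidden.
val visible = Seq(StarCol("t1", "k"), StarCol("t1", "v"), StarCol("t2", "w"))
val hidden  = Seq(StarCol("t2", "k"))

println(expandStar(visible, hidden, None))       // *
println(expandStar(visible, hidden, Some("t2"))) // t2.*
```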






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789337070


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135674/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787010014


   **[Test build #135531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135531/testReport)** for PR 31666 at commit [`e1719d3`](https://github.com/apache/spark/commit/e1719d30d7893da8991a78eb6e3fa098d65334ae).




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608690954



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -77,6 +77,14 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  override def metadataOutput: Seq[Attribute] =
+    child.metadataOutput.filter(_.isHiddenCol) ++

Review comment:
       Do we have an example query? It's a bit weird that metadata columns can be propagated through project.
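The following is a hypothetical illustration of why hidden columns might need to pass through `Project`. It is not the example the author later gave. A USING join is rewritten into a `Project` over the join, and further projections can sit on top of it. If an upper `Project` did not forward its child's hidden metadata, a reference such as `t2.k` resolved above it would fail. The toy classes below model this. `Attr`, `Scan`, `extraHidden`, and `resolve` are assumptions. Only the `child.metadataOutput.filter(...)` shape comes from the diff, and the rest of that expression is truncated above.

```scala
object ProjectPropagationSketch {
  final case class Attr(name: String, qualifier: String, hidden: Boolean = false)

  sealed trait Plan {
    def output: Seq[Attr]
    def metadataOutput: Seq[Attr]
  }

  // Leaf standing in for the Join's output; it carries no metadata columns here.
  final case class Scan(output: Seq[Attr]) extends Plan {
    def metadataOutput: Seq[Attr] = Seq.empty
  }

  // Toy Project: forwards only the child's *hidden* metadata, plus any hidden
  // columns attached by the USING/NATURAL rewrite (`extraHidden`, hypothetical).
  final case class Project(output: Seq[Attr], child: Plan, extraHidden: Seq[Attr] = Seq.empty) extends Plan {
    def metadataOutput: Seq[Attr] = child.metadataOutput.filter(_.hidden) ++ extraHidden
  }

  // Resolves a qualified reference against regular output first, then metadata output.
  def resolve(plan: Plan, qualifier: String, name: String): Option[Attr] =
    (plan.output ++ plan.metadataOutput).find(a => a.qualifier == qualifier && a.name == name)
}
```

In this model, `t2.k` stays resolvable above any number of stacked projections, while a plain `Project` over an ordinary relation still drops columns it does not select.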






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814569507


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41556/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r612490907



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -957,6 +946,36 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)

Review comment:
       The same applies to `hasMetadataCol`.






[GitHub] [spark] cloud-fan commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r588248661



##########
File path: sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out
##########
@@ -1790,19 +1790,19 @@ SELECT udf(udf('')) AS `xxx`, udf(i), udf(j), udf(t), udf(k)
 -- !query schema
 struct<xxx:string,udf(i):int,udf(j):int,udf(t):string,udf(k):int>
 -- !query output
+	8	8	eight	NULL

Review comment:
       Can we output `udf(udf(i)), udf(k), udf(t)` in the SELECT list and check whether their values change?






[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814567942


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41556/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810536528


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41317/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814553582


   **[Test build #136979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136979/testReport)** for PR 31666 at commit [`f665030`](https://github.com/apache/spark/commit/f665030c32af34b2d865d1c40e0da1bdc302741c).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815319038


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41613/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809795765


   **[Test build #136673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136673/testReport)** for PR 31666 at commit [`9fd2490`](https://github.com/apache/spark/commit/9fd24904317c38744314e54bbff00c6c94ac3477).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814471519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41544/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786997421


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135527/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815319038


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41613/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811559620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41368/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815478466


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41636/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809885979


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136673/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815405904


   **[Test build #137035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)** for PR 31666 at commit [`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan closed pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #31666:
URL: https://github.com/apache/spark/pull/31666


   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815238075


   **[Test build #137032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137032/testReport)** for PR 31666 at commit [`c84f396`](https://github.com/apache/spark/commit/c84f396e3427ffceb2891e2b1dc4b30ddc2239d0).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814471508


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41544/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814553582


   **[Test build #136979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136979/testReport)** for PR 31666 at commit [`f665030`](https://github.com/apache/spark/commit/f665030c32af34b2d865d1c40e0da1bdc302741c).




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-811557169


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41368/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814529277


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41548/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814568234








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-793231829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135887/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-818309800


   **[Test build #137253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137253/testReport)** for PR 31666 at commit [`9e62d7d`](https://github.com/apache/spark/commit/9e62d7d3aa90906d5c8883c73dd71f81e319d97f).




[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-810489764


   **[Test build #136735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136735/testReport)** for PR 31666 at commit [`f5cc3ae`](https://github.com/apache/spark/commit/f5cc3ae0db9538083c5efb27f3641c4dddec5faa).




[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789205255


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40256/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792494995


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40436/
   




[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r587698038



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -76,6 +77,13 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan)
 
   override lazy val validConstraints: ExpressionSet =
     getAllValidConstraints(projectList)
+
+  val hiddenOutputTag: TreeNodeTag[Seq[Attribute]] = TreeNodeTag[Seq[Attribute]]("hiddenOutput")
+
+  override def metadataOutput: Seq[Attribute] = {
+    child.metadataOutput ++
+      getTagValue(hiddenOutputTag).getOrElse(Seq.empty[Attribute])

Review comment:
       We could make this more generic by adding this `LogicalPlan`'s `metadataOutput`, but that would complicate how we can add these hidden columns in `AddMetadataColumns`.
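The tag-based variant in this revision stores the extra hidden attributes on the `Project` instance instead of in its constructor. The Scala below is a toy sketch of that mechanism under stated assumptions. `Tag`, `Node`, `Leaf`, and this `Project` are simplified stand-ins that loosely mirror `TreeNodeTag`, `setTagValue`, and `getTagValue`. They are not Spark's classes.

```scala
import scala.collection.mutable

object TagSketch {
  // Toy stand-in for TreeNodeTag: a typed key into a per-node tag map.
  final case class Tag[T](name: String)

  final case class Attr(name: String, qualifier: String)

  abstract class Node {
    // Per-instance, mutable tag storage (toy version).
    private val tags = mutable.Map.empty[Tag[_], Any]
    def setTagValue[T](tag: Tag[T], value: T): Unit = tags(tag) = value
    def getTagValue[T](tag: Tag[T]): Option[T] = tags.get(tag).map(_.asInstanceOf[T])
    def metadataOutput: Seq[Attr]
  }

  final class Leaf(meta: Seq[Attr]) extends Node {
    def metadataOutput: Seq[Attr] = meta
  }

  // Same tag name as in the diff; defined once here rather than per Project.
  val hiddenOutputTag: Tag[Seq[Attr]] = Tag[Seq[Attr]]("hiddenOutput")

  final class Project(child: Node) extends Node {
    // Mirrors the diff: child metadata plus whatever the tag holds (empty if unset).
    def metadataOutput: Seq[Attr] =
      child.metadataOutput ++ getTagValue(hiddenOutputTag).getOrElse(Seq.empty[Attr])
  }
}
```

Because the hidden attributes live in a tag, a rule such as `AddMetadataColumns` could attach them to an existing node without rebuilding it. That is one plausible reading of the trade-off described in the comment, not a statement of the author's intent.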






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-809817643


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41255/
   




[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815273505


   **[Test build #137034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137034/testReport)** for PR 31666 at commit [`0fe04a2`](https://github.com/apache/spark/commit/0fe04a2e014d90da13d5f2a8d4a456836b1ba6ad).


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786862129


   **[Test build #135521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135521/testReport)** for PR 31666 at commit [`2c261bb`](https://github.com/apache/spark/commit/2c261bb8979cf6187eec3d9d02872bc75fdccab5).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814471519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41544/
   


[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-786967747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135521/
   


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-814655701


   **[Test build #136979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136979/testReport)** for PR 31666 at commit [`f665030`](https://github.com/apache/spark/commit/f665030c32af34b2d865d1c40e0da1bdc302741c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-792494964


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40436/
   


[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815298666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41612/
   


[GitHub] [spark] karenfeng commented on a change in pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on a change in pull request #31666:
URL: https://github.com/apache/spark/pull/31666#discussion_r608850552



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -958,6 +947,34 @@ class Analyzer(override val catalogManager: CatalogManager)
           }
         }
     }
+
+    private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      plan.expressions.flatMap(_.collect {
+        case a: Attribute if a.isMetadataCol => a
+        case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>
+          childMetadataOutput.find(_.exprId == a.exprId).get
+      })
+    }
+
+    private def hasMetadataCol(plan: LogicalPlan): Boolean = {
+      lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
+      val hasMetaCol = plan.expressions.exists(_.find {
+        case a: Attribute =>
+          a.isMetadataCol || childMetadataOutput.exists(_.exprId == a.exprId)

Review comment:
       The regression test for this is in `DataFrameJoinSuite`.
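The hunk above resolves metadata attributes in two ways. It keeps attributes already flagged with `isMetadataCol`, and it maps a plain reference to the child's `metadataOutput` attribute with the same `exprId`. The sketch below mirrors that logic on hypothetical stand-in types (`Expr`, `Attr`, `Add`, `Node`, `MetadataResolution`), not Catalyst's real classes. Because the excerpt of `hasMetadataCol` is truncated, that version is only a guess at a short-circuiting counterpart.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions and plans.
sealed trait Expr {
  def children: Seq[Expr]
  // Every Attr in this expression tree, like `expr.collect { case a: Attribute => a }`.
  def collectAttrs: Seq[Attr] = this match {
    case a: Attr => Seq(a)
    case other   => other.children.flatMap(_.collectAttrs)
  }
}
final case class Attr(name: String, exprId: Long, isMetadataCol: Boolean = false) extends Expr {
  def children: Seq[Expr] = Nil
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

final case class Node(expressions: Seq[Expr], children: Seq[Node], metadataOutput: Seq[Attr] = Nil)

object MetadataResolution {
  // Mirrors getMetadataAttributes: keep attributes already flagged as metadata,
  // and swap a plain reference for the child's metadata attribute with the same exprId.
  def getMetadataAttributes(plan: Node): Seq[Attr] = {
    lazy val childMeta = plan.children.flatMap(_.metadataOutput)
    plan.expressions.flatMap(_.collectAttrs).flatMap { a =>
      if (a.isMetadataCol) Some(a) else childMeta.find(_.exprId == a.exprId)
    }
  }

  // Short-circuiting existence check, in the spirit of the truncated hasMetadataCol.
  def hasMetadataCol(plan: Node): Boolean = {
    lazy val childMeta = plan.children.flatMap(_.metadataOutput)
    plan.expressions.exists(_.collectAttrs.exists { a =>
      a.isMetadataCol || childMeta.exists(_.exprId == a.exprId)
    })
  }
}
```

Returning the child's copy instead of the reference itself presumably lets the caller recover the metadata marking that the parent's reference lacks. The excerpt does not state this explicitly.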




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-787051387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135531/
   


[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-789182513


   **[Test build #135674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135674/testReport)** for PR 31666 at commit [`6fa70ba`](https://github.com/apache/spark/commit/6fa70ba286a40d065c9581e29b738f28f12593b5).


[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-791028027


   **[Test build #135777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135777/testReport)** for PR 31666 at commit [`0c116a5`](https://github.com/apache/spark/commit/0c116a59aff69ce4aca4c504099abeef600f1e55).

