
Posted to reviews@spark.apache.org by "shrprasa (via GitHub)" <gi...@apache.org> on 2023/03/02 20:18:33 UTC

[GitHub] [spark] shrprasa opened a new pull request, #40258: [WIP][SPARK-42655]:Incorrect ambiguous column reference error

shrprasa opened a new pull request, #40258:
URL: https://github.com/apache/spark/pull/40258

   Incorrect ambiguous column reference error


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1458138622

   I'm not sure about the change, not sure I'm qualified to review it. I think at best the error message should change; I am not clear that the result is 'wrong'



[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1466139560

   That's a "no" from me, per the logic above



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL]:Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1457585690

   Gentle Ping @srowen  @dongjoon-hyun 



[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461314871

   Hm, I just don't see the logic in that. It isn't how SQL works either, as far as I understand. Here's another example: imagine a DataFrame defined by `SELECT 3 as id, 3 as ID`. Would you also say that selecting "id" is unambiguous? And does it make sense to you that if I change a 3 to a 4, this query is no longer semantically valid?



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491283077

   According to the [code in 2.3](https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L190), I think we should call `distinct` in line 345.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480217186

   Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR or direct it to someone who can review this PR.



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482756439

   I think case 1 works by accident. It's not an intentional design. I don't think it's a bug that case 2 doesn't work.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1486232865

   @cloud-fan Can you please check my last comments.



[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461279768

   Hm, how is it not ambiguous? When case insensitive, 'id' could mean one of two different columns 



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461262849

   > I don't get it, it is due to case sensitivity; that's why it becomes ambiguous and that's what you see. The issue is that the error isn't super helpful because it shows the lower-cased column right? that's what I was saying. Or: does your change still result in an error without case sensitivity? it should
   
   The issue is not with the error message. The problem is that no error should be thrown in this case; the select query should return a result. After this change, the ambiguous-reference error is no longer thrown, because the fix removes the duplicate attribute match.



[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480469070

   I second @srowen's view. cc @cloud-fan



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480574606

   > df3.select("id").show()
   
   @cloud-fan The example you shared will behave the same even after this fix: it will still give the ambiguous-reference error.
   The use case that the fix addresses is different. Can you please try these two cases:
   // Run in spark-shell, where `sc` and the toDF implicits are already in scope.
   Case 1, which works fine:
   val df1 = sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", "col5")
   val op_cols_same_case = List("id","col2","col3","col4", "col5", "id")
   val df3 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
   df3.select("id").show()
   
   Case 2, which fails and is what the fix addresses:
   val df2 = sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", "col5")
   val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
   val df4 = df2.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
   df4.select("id").show()



[GitHub] [spark] shrprasa commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on code in PR #40258:
URL: https://github.com/apache/spark/pull/40258#discussion_r1147356838


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##########
@@ -258,7 +258,7 @@ package object expressions  {
         case (Seq(), _) =>
           val name = nameParts.head
           val attributes = collectMatches(name, direct.get(name.toLowerCase(Locale.ROOT)))
-          (attributes.filterNot(_.qualifiedAccessOnly), nameParts.tail)
+          (attributes.distinct.filterNot(_.qualifiedAccessOnly), nameParts.tail)

Review Comment:
   The `unique` method is not used in this flow. It is used in many places when returning results, so changing `unique` would widen the scope of this PR.




[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1465517104

   Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR or direct it to someone who is aware of this code.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459668923

   > You first defined a case-sensitive data set, then queried in a case-insensitive way, I guess the error is expected.
   
   In the physical plan, both the id and ID columns are projected from the same column of the source DataFrame, _1#6:
   _1#6 AS id#17,  _1#6 AS ID#17
   So there is no ambiguity.
   
   Also, the matched attributes are identical: attributes: Vector(id#17, id#17)
   The reference is treated as ambiguous only because the match result contains duplicates.
   
   If the matched attributes were Vector(id#17, ID#17), the error would be valid.
   



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459535564

   Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461303721

   It is very much relevant, as this is the only case that requires the fix. If the columns do not come from the same source, the plan will reflect that, and the ambiguous-reference error will still be thrown even after this fix.



[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480574647

   @shrprasa 
   At the dataset definition phase, especially for intermediate datasets, Spark is lenient/lazy about case sensitivity. This is because the checks happen during SQL analysis, which is not required for defining a Dataset. This gives the user more freedom but also a greater cognitive burden. On the other hand, in the read phase, SQL analysis is a mandatory step and the checks are performed, so the configuration provided by Spark at this stage is sufficient to resolve all ambiguities.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1496311839

   Thanks a lot @cloud-fan for the guidance and support in getting this issue fixed.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491954910

   > If you really worry about regression, we can add a legacy config to fall back to the old code. I don't agree to make code changes that only fix the problem in one particular code path, while we know other code paths have the same problem as well.
   
   OK, I will update the PR with the suggested change.



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491886464

   If you really worry about regression, we can add a legacy config to fall back to the old code. I don't agree to make code changes that only fix the problem in one particular code path, while we know other code paths have the same problem as well.



[GitHub] [spark] cloud-fan closed pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan closed pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
URL: https://github.com/apache/spark/pull/40258



[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461291351

   That isn't relevant. You are selecting from a DataFrame with columns id and ID. Imagine, for instance, that they do not come from the same source: it's clearly ambiguous. It wouldn't make sense if it were different in this case.



[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459654900

   You first defined a case-sensitive data set, then queried in a case-insensitive way, I guess the error is expected.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480595916

   > @shrprasa do you know how the case 1 works?
   
   Yes. It works because the resolved column has just one match:
   attributes: Vector(id#17)
   
   For the second case, the match result is
   attributes: Vector(id#17, id#17)
   Since there is more than one value, resolution fails even though both values are exactly the same. This PR proposes to fix that by taking the distinct values of the match result.
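
The two match results described above can be modelled without Spark. The following is a minimal sketch, not Spark's actual code: `Attr`, `index`, and `resolve` are hypothetical names, and it assumes that the lookup index drops only exact duplicates and that matches are renamed to the requested name, which is consistent with the `Vector(id#17, id#17)` output quoted above.

```scala
import java.util.Locale

// Hypothetical stand-in for a Catalyst attribute: a column name plus an expression ID.
final case class Attr(name: String, exprId: Long)

object AmbiguitySketch {
  // Assumed index step: group attributes by lower-cased name and drop exact duplicates.
  // Attr("id", 17) and Attr("ID", 17) are NOT exact duplicates because the names differ.
  def index(output: Seq[Attr]): Map[String, Seq[Attr]] =
    output.groupBy(_.name.toLowerCase(Locale.ROOT)).map { case (k, v) => k -> v.distinct }

  // Left models the ambiguous-reference error; Right models a successful resolution.
  // dedupMatches = true corresponds to the proposed extra `distinct` on the match result.
  def resolve(name: String, output: Seq[Attr], dedupMatches: Boolean): Either[String, Attr] = {
    val key = name.toLowerCase(Locale.ROOT)
    // Assumed behaviour: every match is renamed to the requested name.
    val renamed = index(output).getOrElse(key, Nil).map(_.copy(name = name))
    val candidates = if (dedupMatches) renamed.distinct else renamed
    candidates match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"cannot resolve '$name'")
      case many        => Left(s"reference '$name' is ambiguous: ${many.mkString(", ")}")
    }
  }

  def main(args: Array[String]): Unit = {
    val case1 = Seq(Attr("id", 17L), Attr("col2", 18L), Attr("id", 17L))
    val case2 = Seq(Attr("id", 17L), Attr("col2", 18L), Attr("ID", 17L))
    val differentSources = Seq(Attr("id", 17L), Attr("ID", 22L))
    println(resolve("id", case1, dedupMatches = false))
    println(resolve("id", case2, dedupMatches = false))
    println(resolve("id", case2, dedupMatches = true))
    println(resolve("id", differentSources, dedupMatches = true))
  }
}
```

Under these assumptions, case 1 resolves with or without the extra `distinct`, case 2 resolves only with it, and two columns with different expression IDs remain ambiguous either way.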
   
   
   



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480589254

   @shrprasa do you know how the case 1 works?



[GitHub] [spark] cloud-fan commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40258:
URL: https://github.com/apache/spark/pull/40258#discussion_r1147252040


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##########
@@ -258,7 +258,7 @@ package object expressions  {
         case (Seq(), _) =>
           val name = nameParts.head
           val attributes = collectMatches(name, direct.get(name.toLowerCase(Locale.ROOT)))
-          (attributes.filterNot(_.qualifiedAccessOnly), nameParts.tail)
+          (attributes.distinct.filterNot(_.qualifiedAccessOnly), nameParts.tail)

Review Comment:
   shall we fix `def unique` in this class? It should look at expr Id.
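
One way to read this suggestion is to deduplicate attributes by expression ID rather than by full equality. The sketch below is illustrative only: `Column` and `uniqueByExprId` are hypothetical names and are not taken from Spark's `package.scala`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an attribute: a column name plus an expression ID.
final case class Column(name: String, exprId: Long)

object UniqueByExprIdSketch {
  // Keeps the first attribute seen for each exprId and preserves encounter order.
  def uniqueByExprId(attrs: Seq[Column]): Seq[Column] = {
    val seen = mutable.LinkedHashMap.empty[Long, Column]
    attrs.foreach(a => seen.getOrElseUpdate(a.exprId, a))
    seen.values.toList
  }

  def main(args: Array[String]): Unit = {
    val mixedCase = Seq(Column("id", 17L), Column("ID", 17L))
    // Full equality compares the name as well, so both entries survive `distinct`.
    println(mixedCase.distinct)
    // Comparing only the expression ID collapses them into one entry.
    println(uniqueByExprId(mixedCase))
  }
}
```

The design difference is that `distinct` treats `id` and `ID` as different values even when they share an expression ID, while deduplicating on the ID alone treats them as the same column.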




[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL]:Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1454348175

   @srowen @dongjoon-hyun Can you please review this PR?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1458159825

   > I'm not sure about the change, not sure I'm qualified to review it. I think at best the error message should change; I am not clear that the result is 'wrong'
   
   Thanks for replying. Can you please tag someone who would be the right person to review this change?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461288036

   > Hm, how is it not ambiguous? When case insensitive, 'id' could mean one of two different columns
   
   It's not ambiguous because, when we select using a list of column names, both id and ID get their values from the same column 'id' in the source DataFrame.
   val df1 = sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", "col5")
   val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
   val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
   df3.select("id").show()
   
   df3.explain()
   == Physical Plan ==
   *(1) Project [_1#6 AS id#17, _2#7 AS col2#18, _3#8 AS col3#19, _4#9 AS col4#20, _5#10 AS col5#21, _1#6 AS ID#17]



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1476233946

   Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn: could you please review this PR, or direct it to someone who can?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1495354346

   Gentle ping @cloud-fan 



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1495954231

   thanks, merging to master!



Re: [PR] [SPARK-42655][SQL] Incorrect ambiguous column reference error [spark]

Posted by "bsikander (via GitHub)" <gi...@apache.org>.

bsikander commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1826881477

   @shrprasa do you think this issue is similar to the one I just posted: https://stackoverflow.com/questions/77553257/select-behavior-different-between-pyspark-2-4-8-and-3-3-2
   
   Trying to understand the behavior.



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491280956

   > FWIW Both the use cases were working fine in Spark 2.3
   
   Sorry I missed this point. Do you know how it worked in 2.3? Did 2.3 also call `distinct` before returning the result?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482368131

   > > It works because the resolved column has just one match
   > 
   > But there are two id columns. Does Spark already do deduplication somewhere?
   
   I'm not sure whether deduplication happened earlier, but even if it did at some stage, in the second use case the column names might not yet have been converted to lowercase at that point, so the id and ID columns would still be treated as different.
   Only in the final result of the column match do we see that both matches are the same attribute, id#17.



[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459648316

   Can you try `set spark.sql.caseSensitive=true`?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1463230011

   > Hm, I just don't see the logic in that. It isn't how SQL works either, as far as I understand. Here's maybe another example, imagine a DataFrame defined by `SELECT 3 as id, 3 as ID`. Would you also say selecting "id" is unambiguous? and it makes sense to you if I change a 3 to a 4 that this query is no longer semantically valid?
   
   If it's valid as per the plan then yes. 



[GitHub] [spark] shrprasa commented on pull request #40258: [WIP][SPARK-42655]:Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1453913043

   @srowen Please ignore that change. It was a work in progress to check a few things.
   The ambiguous error in the scenario below is not correct because attribute resolution returns two values, but both values are the same attribute. Thus, it should not throw an ambiguous error.
   
   val df1 = sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", "col5")
   val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
   val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
   df3.select("id").show()
   org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could be: id, id.
   
   df3.explain()
   == Physical Plan ==
   *(1) Project [_1#6 AS id#17, _2#7 AS col2#18, _3#8 AS col3#19, _4#9 AS col4#20, _5#10 AS col5#21, _1#6 AS ID#17]
   
   Before the fix, the matched attributes were:
   attributes: Vector(id#17, id#17)
   Thus, it throws an ambiguous reference error. But if we consider only unique matches, it returns the correct result:
   unique attributes: Vector(id#17)
   
   
       /** Map to use for direct case insensitive attribute lookups. */
       @transient private lazy val direct: Map[String, Seq[Attribute]] = {
         unique(attrs.groupBy(_.name.toLowerCase(Locale.ROOT)))
       }
   
   has value Vector(id#17, col2#18, col3#19, col4#20, col5#21, **ID**#17) but it should be  Vector(id#17, col2#18, col3#19, col4#20, col5#21, **id**#17)
   
   The key used for the lookup is treated as case-insensitive, but the values themselves are case-sensitive.
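   
   A minimal sketch of the argument above, in plain Scala without Spark. `Attr`, `ResolveSketch`, `resolveStrict`, and `resolveDedup` are hypothetical names used only for illustration, and `Attr` is a simplified stand-in for a Catalyst attribute (a name plus an expression ID); this is not the actual patch. It shows that a case-insensitive lookup can return two candidates that share one expression ID, and that deduplicating on that ID leaves a single match, while columns with different IDs stay ambiguous.
   
   ```scala
   // Hypothetical, simplified model of an attribute: a name plus an expression ID.
   final case class Attr(name: String, exprId: Long)

   object ResolveSketch {
     // Case-insensitive lookup of all output attributes matching `name`.
     def candidates(output: Seq[Attr], name: String): Seq[Attr] =
       output.filter(_.name.equalsIgnoreCase(name))

     // Keep the first attribute seen for each expression ID, preserving order.
     def dedupByExprId(attrs: Seq[Attr]): Seq[Attr] =
       attrs.foldLeft(Vector.empty[Attr]) { (acc, a) =>
         if (acc.exists(_.exprId == a.exprId)) acc else acc :+ a
       }

     // Exactly one match resolves; zero or several matches are errors.
     private def pick(matches: Seq[Attr], name: String): Either[String, Attr] =
       matches match {
         case Seq(single) => Right(single)
         case Seq()       => Left(s"cannot resolve '$name'")
         case many =>
           val names = many.map(_.name).mkString(", ")
           Left(s"Reference '$name' is ambiguous, could be: $names.")
       }

     // Every candidate counts, even if two candidates are the same attribute.
     def resolveStrict(output: Seq[Attr], name: String): Either[String, Attr] =
       pick(candidates(output, name), name)

     // Candidates sharing an expression ID are collapsed before the check.
     def resolveDedup(output: Seq[Attr], name: String): Either[String, Attr] =
       pick(dedupByExprId(candidates(output, name)), name)
   }
   ```
   
   With `Seq(Attr("id", 17), Attr("col2", 18), Attr("ID", 17))` as the output, `resolveStrict(output, "id")` returns a `Left` carrying the ambiguity message, while `resolveDedup(output, "id")` returns `Right(Attr("id", 17))`. Two columns with different expression IDs, such as `Attr("id", 1)` and `Attr("ID", 2)`, remain ambiguous under both.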
   



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482326317

   > It works because the resolved column has just one match
   
   But there are two id columns. Does Spark already do deduplication somewhere?



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482819187

   FWIW Both the use cases were working fine in Spark 2.3



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482792058

   > I think case 1 works by accident. It's not an intentional design. I don't think it's a bug that case 2 doesn't work.
   
   As I said in my previous comment:
   I'm not sure whether deduplication happened earlier, but even if it did at some stage, in the second use case the column names might not yet have been converted to lowercase at that point, so the id and ID columns would still be treated as different.
   Only in the final result of the column match do we see that both matches are the same attribute, id#17.
   The speculation was right: the dedup happens in the unique method.
   
   For case 1: 
   unique before:: Map(col3 -> Vector(col3#18571), col2 -> Vector(col2#18570), id -> Vector(id#18569, id#18569), col5 -> Vector(col5#18573), col4 -> Vector(col4#18572))
   after before:: Map(col3 -> Vector(col3#18571), col2 -> Vector(col2#18570), id -> Vector(id#18569), col5 -> Vector(col5#18573), col4 -> Vector(col4#18572))
   
   For Case 2: 
   unique before:: Map(col3 -> Vector(col3#18610), col2 -> Vector(col2#18609), id -> Vector(id#18608, ID#18608), col5 -> Vector(col5#18612), col4 -> Vector(col4#18611))
   after before:: Map(col3 -> Vector(col3#18610), col2 -> Vector(col2#18609), id -> Vector(id#18608, ID#18608), col5 -> Vector(col5#18612), col4 -> Vector(col4#18611))
   
   In most places we call unique before returning the result. So what negative impact do you think it would have if we also returned unique results for the column match?
   
   One positive is that it fixes this wrong ambiguous error, which is thrown only because the match result contains two duplicate values.
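   
   The before/after maps above are consistent with a `unique` helper that deduplicates each group with `distinct`. The sketch below assumes exactly that, since the body of `unique` is not shown in this thread, and it uses a hypothetical `ColRef` case class whose equality includes the case-sensitive name. Under those assumptions, case 1 collapses to one entry and case 2 does not.
   
   ```scala
   import java.util.Locale

   // Hypothetical stand-in for an attribute; case-class equality compares
   // both the (case-sensitive) name and the expression ID.
   final case class ColRef(name: String, exprId: Long)

   object UniqueSketch {
     // Assumed behaviour of `unique`: drop exact duplicates within each group.
     def unique(m: Map[String, Seq[ColRef]]): Map[String, Seq[ColRef]] =
       m.map { case (key, refs) => key -> refs.distinct }

     // Case-insensitive lookup map, keyed by the lower-cased column name.
     def direct(attrs: Seq[ColRef]): Map[String, Seq[ColRef]] =
       unique(attrs.groupBy(_.name.toLowerCase(Locale.ROOT)))
   }
   ```
   
   For case 1, `UniqueSketch.direct(Seq(ColRef("id", 18569), ColRef("id", 18569)))("id")` has one element. For case 2, `UniqueSketch.direct(Seq(ColRef("id", 18608), ColRef("ID", 18608)))("id")` keeps both, because the two values differ in the case of their names even though they share an expression ID.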



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459652942

   > Can you try `set spark.sql.caseSensitive=true`?
   
   Yes, I have tried it. With caseSensitive set to true, it works, since id and ID are then treated as separate columns.
   The issue arises when column names are supposed to be treated as case-insensitive.



[GitHub] [spark] srowen commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1460147096

   I don't get it; it is due to case sensitivity. That's why it becomes ambiguous, and that's what you see. The issue is that the error isn't super helpful because it shows the lower-cased column, right? That's what I was saying. Or: does your change still result in an error without case sensitivity? It should.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1466394445

   > That's a "no" from me, per the logic above
   
   Thanks @srowen, but it seems I am not able to explain the change to you. So it's better to get a review from someone who is qualified to review the change and is familiar with this code.



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480540953

   I think column resolution should only look at one level, to make the behavior simple and predictable. I tried it on pgsql and it fails as well:
   ```
   create table t(i int);
   select id from (select i as id, i as ID from t) sub;
   ERROR: column reference "id" is ambiguous Position: 8
   ```



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480532873

   > I second @srowen ‘s view. cc @cloud-fan
   
   Thanks @yaooqinn for replying. Can you please explain why you think it's not the right fix?
   The fix only proposes to remove duplicates from the resolved columns, since it's incorrect to treat a single column match as ambiguous just because it occurs more than once in the resolved column list.



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491795763

   
   
   
   
   > according to the [code in 2.3](https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L190), I think we should call `distinct` in line 345
   
   @cloud-fan 
   Yes, that should also work, but making the change there would extend its impact to many more scenarios,
   whereas the place where I have added distinct keeps the scope very limited.
   



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

Posted by "shrprasa (via GitHub)" <gi...@apache.org>.

shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1492817996

   @cloud-fan I have made the change. All tests have passed. Can you please review?

