Posted to reviews@spark.apache.org by "bersprockets (via GitHub)" <gi...@apache.org> on 2023/05/28 23:37:23 UTC

[GitHub] [spark] bersprockets opened a new pull request, #41353: [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`

bersprockets opened a new pull request, #41353:
URL: https://github.com/apache/spark/pull/41353

   ### What changes were proposed in this pull request?
   
   In `StringUtils#orderSuggestedIdentifiersBySimilarity`, handle the case where the candidate attributes have a mix of empty and non-empty prefixes.
   
   ### Why are the changes needed?
   
   The following query throws a `StringIndexOutOfBoundsException`:
   ```
   with v1 as (
     select * from values (1, 2) as (c1, c2)
   ),
   v2 as (
     select * from values (2, 3) as (c1, c2)
   )
   select v1.c1, v1.c2, v2.c1, v2.c2, b
   from v1
   full outer join v2
   using (c1);
   ```
   The query should fail anyway, since `b` refers to a non-existent column. But it should fail with a helpful error message, not with a `StringIndexOutOfBoundsException`.
   
   `StringUtils#orderSuggestedIdentifiersBySimilarity` assumes that a list of suggested attributes with a mix of prefixes will never have an attribute name with an empty prefix. But in this case it does (`c1` from the `coalesce` has no prefix, since it is not associated with any relation or subquery):
   ```
   +- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
      +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
         +- Join FullOuter, (c1#5 = c1#7)
            :- SubqueryAlias v1
            :  +- CTERelationRef 0, true, [c1#5, c2#6]
            +- SubqueryAlias v2
               +- CTERelationRef 1, true, [c1#7, c2#8]
   ```
   Because of this, `orderSuggestedIdentifiersBySimilarity` returns a sorted list of suggestions like this:
   ```
   ArrayBuffer(.c1, v1.c2, v2.c2)
   ```
   `UnresolvedAttribute.parseAttributeName` then fails on `.c1`, because it cannot handle an attribute name that starts with a '.'.
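   
   As a rough illustration (not the patch itself), the leading-dot name can be reproduced with plain string handling. The object and helper names below (`QualifiedNameSketch`, `naiveJoin`, `safeJoin`) are hypothetical and are not part of Spark; the sketch only assumes that a suggestion is built by joining a prefix and a column name with a '.':
   ```scala
   // Hypothetical sketch; these helpers are NOT Spark APIs.
   object QualifiedNameSketch {
     // Always inserts a '.', so an empty prefix yields a leading dot.
     def naiveJoin(prefix: String, name: String): String =
       s"$prefix.$name"

     // Inserts the '.' only when the prefix is non-empty.
     def safeJoin(prefix: String, name: String): String =
       if (prefix.isEmpty) name else s"$prefix.$name"

     def main(args: Array[String]): Unit = {
       // (prefix, name) pairs mirroring the plan above: c1 has no prefix.
       val candidates = Seq(("", "c1"), ("v1", "c2"), ("v2", "c2"))
       println(candidates.map { case (p, n) => naiveJoin(p, n) }.mkString(", "))
       // prints: .c1, v1.c2, v2.c2
       println(candidates.map { case (p, n) => safeJoin(p, n) }.mkString(", "))
       // prints: c1, v1.c2, v2.c2
     }
   }
   ```
   Under that assumption, guarding the join on an empty prefix is one way to keep every suggestion parseable; the actual change in this PR may handle the mixed-prefix case differently.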
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #41353: [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #41353:
URL: https://github.com/apache/spark/pull/41353#issuecomment-1566657800

   +1, LGTM. Merging to master.
   Thank you, @bersprockets and @dongjoon-hyun @HyukjinKwon for review.




[GitHub] [spark] MaxGekk closed pull request #41353: [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #41353: [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`
URL: https://github.com/apache/spark/pull/41353




[GitHub] [spark] HyukjinKwon commented on pull request #41353: [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity`

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #41353:
URL: https://github.com/apache/spark/pull/41353#issuecomment-1566326055

   cc @rednaxelafx @cloud-fan FYI

