Posted to reviews@spark.apache.org by "nikolamand-db (via GitHub)" <gi...@apache.org> on 2024/02/26 11:42:33 UTC

[PR] [SPARK-47147][PYTHON] Fix Pyspark collated string conversion error [spark]

nikolamand-db opened a new pull request, #45257:
URL: https://github.com/apache/spark/pull/45257

### What changes were proposed in this pull request?

When running the PySpark shell in non-Spark Connect mode, the query `SELECT 'abc' COLLATE 'UCS_BASIC_LCASE'` produces the following error:
```
AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'string(UCS_BASIC_LCASE)'}"}
```
Fix the error by updating the `StringType` class to include collation id support. Additional changes ensure conversions work properly.
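
For illustration only, here is a minimal standalone sketch of the idea: a string type that carries a collation id, and a parser that accepts type names such as `string(UCS_BASIC_LCASE)` instead of rejecting them. It does not use or reproduce the PySpark code in this PR. `CollatedStringType`, `parse_string_type`, and the `COLLATION_IDS` table (including its numeric ids) are hypothetical names and values assumed for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical name-to-id table for this sketch; the real mapping is
# defined by Spark and may differ.
COLLATION_IDS = {"UCS_BASIC": 0, "UCS_BASIC_LCASE": 1}
DEFAULT_COLLATION = "UCS_BASIC"

# Matches "string" or "string(<COLLATION_NAME>)".
_STRING_TYPE = re.compile(r"string(?:\((\w+)\))?")


@dataclass(frozen=True)
class CollatedStringType:
    """Stand-in for a string type that remembers its collation id."""

    collation_id: int = COLLATION_IDS[DEFAULT_COLLATION]

    def simple_string(self) -> str:
        # The default collation keeps the plain "string" spelling.
        if self.collation_id == COLLATION_IDS[DEFAULT_COLLATION]:
            return "string"
        name = next(n for n, i in COLLATION_IDS.items() if i == self.collation_id)
        return f"string({name})"


def parse_string_type(type_name: str) -> CollatedStringType:
    """Parse "string" or "string(NAME)" into a CollatedStringType."""
    match = _STRING_TYPE.fullmatch(type_name)
    if match is None:
        raise ValueError(f"not a string type: {type_name!r}")
    name = match.group(1) or DEFAULT_COLLATION
    if name not in COLLATION_IDS:
        raise ValueError(f"unknown collation: {name!r}")
    return CollatedStringType(COLLATION_IDS[name])
```

In this sketch, parsing `string(UCS_BASIC_LCASE)` yields a type object whose `simple_string()` round-trips to the same name, while a plain `string` still maps to the default collation.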

### Why are the changes needed?

To fix a collation-related error in the PySpark shell.

### Does this PR introduce _any_ user-facing change?

Yes, the SQL query described above no longer throws an exception.

### How was this patch tested?

Manually tested with the PySpark shell; additional assertions were added to the type tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org