Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:44:19 UTC

[jira] [Resolved] (SPARK-24469) Support collations in Spark SQL

     [ https://issues.apache.org/jira/browse/SPARK-24469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24469.
----------------------------------
    Resolution: Incomplete

> Support collations in Spark SQL
> -------------------------------
>
>                 Key: SPARK-24469
>                 URL: https://issues.apache.org/jira/browse/SPARK-24469
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Alexander Shkapsky
>            Priority: Major
>              Labels: bulk-closed
>
> One of our use cases is to support case-insensitive comparison in operations, including aggregation and text comparison filters.  Another use case is to sort via collator.  Support for collations throughout the query processor appears to be the proper way to support these needs.
> Language-based workarounds (for the aggregation case) are insufficient:
>  # SELECT UPPER(text)....GROUP BY UPPER(text)
> introduces invalid values into the output set
>  # SELECT MIN(text)...GROUP BY UPPER(text) 
> results in poor performance in our case, in part due to use of sort-based aggregate
> Examples of collation support in RDBMS:
>  * [PostgreSQL|https://www.postgresql.org/docs/10/static/collation.html]
>  * [MySQL|https://dev.mysql.com/doc/refman/8.0/en/charset.html]
>  * [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html]
>  * [SQL Server|https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017]
>  * [DB2|https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.nls.doc/com.ibm.db2.luw.admin.nls.doc-gentopic2.html] 
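
The two aggregation workarounds quoted above can be illustrated outside Spark. The following is a minimal sketch in plain Python, not Spark code and not anything taken from the issue: the sample rows and the function names are hypothetical, and Python's min() compares by code point, which is only assumed to match the ordering MIN(text) would use for ASCII data. It shows one reading of "invalid values": grouping on UPPER(text) and selecting UPPER(text) returns keys that never occur in the data, while selecting MIN(text) returns a real input value per group.

```python
from collections import defaultdict

# Hypothetical sample column; none of these values is fully uppercase.
rows = ["Apple", "apple", "aPPle", "banana", "Banana"]


def group_by_upper(values):
    """Mimics: SELECT UPPER(text), COUNT(*) ... GROUP BY UPPER(text)."""
    counts = defaultdict(int)
    for v in values:
        counts[v.upper()] += 1
    return dict(counts)


def group_by_upper_select_min(values):
    """Mimics: SELECT MIN(text), COUNT(*) ... GROUP BY UPPER(text)."""
    groups = defaultdict(list)
    for v in values:
        groups[v.upper()].append(v)
    # min() picks the smallest member by code point as the group's label.
    return {min(members): len(members) for members in groups.values()}


w1 = group_by_upper(rows)
w2 = group_by_upper_select_min(rows)

print(w1)  # labels are uppercased strings that do not appear in rows
print(w2)  # labels are actual values taken from rows
```

The second form keeps the output valid but has to track a representative value for every group, which is the variant the reporter describes as slow in their case.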



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
