
Posted to issues@spark.apache.org by "Alexander Shkapsky (JIRA)" <ji...@apache.org> on 2018/06/05 17:03:00 UTC

[jira] [Created] (SPARK-24469) Support collations in Spark SQL

Alexander Shkapsky created SPARK-24469:
------------------------------------------

             Summary: Support collations in Spark SQL
                 Key: SPARK-24469
                 URL: https://issues.apache.org/jira/browse/SPARK-24469
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Alexander Shkapsky


One of our use cases is to support case-insensitive comparison in operations, including aggregation and text comparison filters.  Another use case is to sort via a collator.  Support for collations throughout the query processor appears to be the proper way to meet these needs.

Language-based workarounds (for the aggregation case) are insufficient:
 # SELECT UPPER(text)....GROUP BY UPPER(text)
introduces invalid values into the output set (upper-cased strings that need not occur in the input)
 # SELECT MIN(text)...GROUP BY UPPER(text) 
results in poor performance in our case, in part due to use of sort-based aggregate
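
The two workarounds above, and the kind of collation-aware query being requested, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's bundled SQLite rather than Spark, the table name "docs" and the sample rows are made up, and SQLite's built-in NOCASE collation (which folds ASCII letters only) stands in for a real collator. It says nothing about how Spark SQL would expose collations.

```python
import sqlite3

# Hypothetical sample data: two spellings of each word, differing only in case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (text TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?)",
    [("Spark",), ("spark",), ("SQL",), ("sql",)],
)

# Workaround 1: group on UPPER(text).
# The output contains 'SPARK', a value that never occurs in the input.
upper_groups = sorted(
    row[0]
    for row in conn.execute("SELECT UPPER(text) FROM docs GROUP BY UPPER(text)")
)

# Workaround 2: keep a real input value per group via MIN(text).
# The result is valid, but the grouping key is still a computed expression.
min_groups = sorted(
    row[0]
    for row in conn.execute("SELECT MIN(text) FROM docs GROUP BY UPPER(text)")
)

# Collation-aware grouping: the comparison rule is attached to the column
# reference instead of rewriting the values being compared.
collated_counts = dict(
    conn.execute(
        "SELECT MIN(text), COUNT(*) FROM docs GROUP BY text COLLATE NOCASE"
    )
)

# Collation-aware text comparison filter.
matches = conn.execute(
    "SELECT COUNT(*) FROM docs WHERE text = 'SPARK' COLLATE NOCASE"
).fetchone()[0]

print(upper_groups)     # ['SPARK', 'SQL']
print(min_groups)       # ['SQL', 'Spark']
print(collated_counts)  # {'Spark': 2, 'SQL': 2} (key order may vary)
print(matches)          # 2
```

With a collation, the grouping and the filter both operate on the original values, so no rewritten strings reach the output and the query does not need an UPPER() call on either side of the comparison.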

Examples of collation support in RDBMS:
 * [PostgreSQL|https://www.postgresql.org/docs/10/static/collation.html]
 * [MySQL|https://dev.mysql.com/doc/refman/8.0/en/charset.html]
 * [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html]
 * [SQL Server|https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017]
 * [DB2|https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.nls.doc/com.ibm.db2.luw.admin.nls.doc-gentopic2.html] 


