Posted to issues@spark.apache.org by "Alexander Shkapsky (JIRA)" <ji...@apache.org> on 2018/06/05 17:03:00 UTC
[jira] [Created] (SPARK-24469) Support collations in Spark SQL
Alexander Shkapsky created SPARK-24469:
------------------------------------------
Summary: Support collations in Spark SQL
Key: SPARK-24469
URL: https://issues.apache.org/jira/browse/SPARK-24469
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.3.0
Reporter: Alexander Shkapsky
One of our use cases is to support case-insensitive comparison in operations, including aggregation and text comparison filters. Another use case is to sort using a collator. Support for collations throughout the query processor appears to be the proper way to address these needs.
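To make these use cases concrete, here is a minimal Python sketch (not Spark code) of what a collation means for comparison filters and sorting. It assumes a collation can be modeled as a key function and uses a simple case-insensitive collation built on `str.casefold`. The names `case_insensitive`, `collated_filter`, and `collated_sort` are hypothetical. Real collators, such as those based on the Unicode Collation Algorithm, handle accents, locale rules, and tie-breaking far more carefully.

```python
from typing import Callable, Iterable, List

# Assumption for illustration: a "collation" is modeled as a key function.
# Two strings are equal under the collation when their keys are equal.
Collation = Callable[[str], str]


def case_insensitive(s: str) -> str:
    """Hypothetical case-insensitive collation key based on str.casefold."""
    return s.casefold()


def collated_equals(a: str, b: str, collation: Collation) -> bool:
    """Equality test a text comparison filter would use, e.g. WHERE text = 'apple'."""
    return collation(a) == collation(b)


def collated_filter(rows: Iterable[str], literal: str, collation: Collation) -> List[str]:
    """Keep rows equal to `literal` under the collation; matched values are returned unchanged."""
    return [r for r in rows if collated_equals(r, literal, collation)]


def collated_sort(rows: Iterable[str], collation: Collation) -> List[str]:
    """Sort by collation key; Python's stable sort keeps tied rows in input order."""
    return sorted(rows, key=collation)


if __name__ == "__main__":
    data = ["banana", "Apple", "apple", "Cherry", "APPLE"]
    print(collated_filter(data, "apple", case_insensitive))
    # ['Apple', 'apple', 'APPLE']
    print(collated_sort(data, case_insensitive))
    # ['Apple', 'apple', 'APPLE', 'banana', 'Cherry']
```

Without the collation key, Python's default code-point ordering gives `['APPLE', 'Apple', 'Cherry', 'apple', 'banana']`, placing every upper-case initial before any lower-case one. Avoiding that kind of ordering is what sorting via a collator is meant to achieve.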
Language-based workarounds (for the aggregation case) are insufficient:
# SELECT UPPER(text) ... GROUP BY UPPER(text) introduces invalid values into the output set, since the upper-cased strings it returns may not appear in the original data
# SELECT MIN(text) ... GROUP BY UPPER(text) results in poor performance in our case, in part due to the use of sort-based aggregation
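The difference between the first workaround and a collation-aware GROUP BY can be sketched in Python as follows. This illustrates the intended semantics under a case-insensitive collation (again assumed to be `str.casefold`). It does not describe how Spark plans or executes these queries, and the function names are hypothetical. `group_by_upper` mirrors the first workaround and returns upper-cased keys that never occur in the input. `group_by_collation` groups on the collation key in a single pass over a hash map and reports an original value as each group's representative, which is what the second workaround computes with `MIN(text)`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def group_by_upper(rows: List[str]) -> Dict[str, int]:
    """Mirrors SELECT UPPER(text), COUNT(*) ... GROUP BY UPPER(text).

    The output keys are upper-cased strings, which may not occur in the input.
    """
    return dict(Counter(r.upper() for r in rows))


def group_by_collation(rows: List[str]) -> Dict[str, Tuple[str, int]]:
    """Hypothetical collation-aware grouping in one pass over a dict (hash map).

    Groups by a case-insensitive key (str.casefold, an assumption) and keeps the
    minimum original value as the representative, mirroring
    SELECT MIN(text), COUNT(*) ... GROUP BY UPPER(text).
    """
    groups: Dict[str, Tuple[str, int]] = {}
    for r in rows:
        key = r.casefold()
        if key in groups:
            rep, n = groups[key]
            groups[key] = (min(rep, r), n + 1)
        else:
            groups[key] = (r, 1)
    return groups


if __name__ == "__main__":
    rows = ["Apple", "apple", "banana", "Banana", "cherry"]
    print(group_by_upper(rows))
    # {'APPLE': 2, 'BANANA': 2, 'CHERRY': 1}
    print(sorted(group_by_collation(rows).values()))
    # [('Apple', 2), ('Banana', 2), ('cherry', 1)]
```

Both functions produce the same groups and counts. Only the collation-aware version returns values that actually appear in the data.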
Examples of collation support in RDBMSs:
* [PostgreSQL|https://www.postgresql.org/docs/10/static/collation.html]
* [MySQL|https://dev.mysql.com/doc/refman/8.0/en/charset.html]
* [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html]
* [SQL Server|https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017]
* [DB2|https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.nls.doc/com.ibm.db2.luw.admin.nls.doc-gentopic2.html]