Posted to issues@spark.apache.org by "Simeon Simeonov (JIRA)" <ji...@apache.org> on 2016/03/21 23:17:25 UTC
[jira] [Created] (SPARK-14048) Aggregation operations on structs fail when the structs have fields with special characters
Simeon Simeonov created SPARK-14048:
---------------------------------------
Summary: Aggregation operations on structs fail when the structs have fields with special characters
Key: SPARK-14048
URL: https://issues.apache.org/jira/browse/SPARK-14048
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Environment: Databricks w/ 1.6.0
Reporter: Simeon Simeonov
Consider a schema where a struct has field names with special characters, e.g.,
{code}
|-- st: struct (nullable = true)
| |-- x.y: long (nullable = true)
{code}
Schemas such as these are frequently generated by JSON schema inference, which seems never to map JSON data to {{MapType}}, always preferring {{StructType}}.
In Spark SQL, referring to these fields requires backticks, e.g., {{st.`x.y`}}. There is no problem manipulating these structs until an aggregation function is involved. It seems that, under the covers, the code does not escape field names with special characters correctly.
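The quoting convention can be illustrated with a small helper, sketched in Python for brevity (hypothetical names, not Spark's actual implementation): a field name is wrapped in backticks whenever it contains anything outside plain identifier characters.

```python
import re

def quote_field(name: str) -> str:
    """Wrap a struct field name in backticks when it contains special
    characters, mirroring the quoting Spark SQL expects (e.g. st.`x.y`).
    Plain identifiers are left untouched."""
    if re.fullmatch(r"\w+", name):
        return name
    # Note: per the error message below, Spark's parser does not support
    # a backtick *inside* a field name, so no inner escaping is attempted.
    return f"`{name}`"

def field_reference(*parts: str) -> str:
    """Build a dotted reference such as st.`x.y` from path components."""
    return ".".join(quote_field(p) for p in parts)
```

For example, {{field_reference("st", "x.y")}} yields the reference {{st.`x.y`}} that works in hand-written Spark SQL.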
For example,
{code}
select first(st) as st from tbl group by something
{code}
generates
{code}
org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: struct<x.y:bigint>. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
at org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
at org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
at org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
at com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
at com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
at com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
at scala.util.Try$.apply(Try.scala:161)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
at java.lang.Thread.run(Thread.java:745)
{code}
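The trace suggests the type string handed to {{DataTypeParser}} was serialized without the very quoting the parser demands ({{struct<x.y:bigint>}} instead of {{struct<`x.y`:bigint>}}). A minimal sketch of what the serializer side would need, again in Python with hypothetical names rather than Spark's actual code:

```python
import re

def struct_type_string(fields):
    """Serialize (name, type) pairs into a struct type string,
    backtick-quoting any field name with special characters so the
    string survives a round trip through the type parser.
    A sketch only -- not Spark's real serializer."""
    parts = []
    for name, data_type in fields:
        if not re.fullmatch(r"\w+", name):
            name = f"`{name}`"
        parts.append(f"{name}:{data_type}")
    return f"struct<{','.join(parts)}>"
```

With this quoting, {{struct_type_string([("x.y", "bigint")])}} produces {{struct<`x.y`:bigint>}}, whereas the failing path above emitted the unquoted {{struct<x.y:bigint>}} that the parser rejects.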
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)