Posted to user@spark.apache.org by Nitay Joffe <ni...@actioniq.co> on 2015/03/10 21:51:03 UTC

Spark 1.3 SQL Type Parser Changes?

In Spark 1.2 I used to be able to do this:

scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
res30: org.apache.spark.sql.catalyst.types.DataType = StructType(List(StructField(int,LongType,true)))

That is, the name of a column can be a keyword like "int". This is no
longer the case in 1.3:

data-pipeline-shell> HiveTypeHelper.toDataType("struct<int:bigint>")
org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
failure: ``>'' expected but `int' found

struct<int:bigint>
       ^
        at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
        at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
        at org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)

Note HiveTypeHelper is simply an object I load in to expose
HiveMetastoreTypes since it was made private. See
https://gist.github.com/nitay/460b41ed5fd7608507f5

This is actually a pretty big problem for us as we have a bunch of legacy
tables with column names like "timestamp". They work fine in 1.2, but now
everything throws in 1.3.

Any thoughts?

Thanks,
- Nitay
Founder & CTO

Re: Spark 1.3 SQL Type Parser Changes?

Posted by Yin Huai <yh...@databricks.com>.
Hi Nitay,

Can you try using backticks to quote the column name? For example:
org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<`int`:bigint>")

Thanks,

Yin
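
A minimal sketch of applying the backtick suggestion above to existing type strings before they are passed to toDataType. The object name QuoteHiveFieldNames and its quote method are hypothetical and not part of Spark, and this thread does not confirm that the 1.3 parser accepts backtick-quoted field names. The sketch assumes field names are plain identifiers and that the type string contains no whitespace.

```scala
object QuoteHiveFieldNames {
  // Hypothetical helper, not a Spark API.
  // A struct field name sits directly after '<' or ',' and directly before ':'.
  // Type names (e.g. the "string" in map<string,int>) are never followed by ':',
  // so they are left alone. Names already wrapped in backticks do not match.
  private val FieldName = """(?<=[<,])([A-Za-z_][A-Za-z0-9_]*)(?=:)""".r

  // Wrap every struct field name in backticks.
  def quote(hiveType: String): String =
    FieldName.replaceAllIn(hiveType, m => "`" + m.group(1) + "`")

  def main(args: Array[String]): Unit = {
    println(quote("struct<int:bigint>"))
    println(quote("struct<timestamp:timestamp,m:map<string,int>>"))
  }
}
```

The result of quote(...) would then be handed to toDataType in place of the raw metastore string. Field names that start with a digit or contain other characters are not handled by this pattern.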

On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> Thanks for reporting.  This was caused by a change to our DDL parser
> that made type names reserved words.  I've filed a JIRA and will
> investigate whether this is something we can fix.
> https://issues.apache.org/jira/browse/SPARK-6250
>
> On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe <ni...@actioniq.co> wrote:
>
>> In Spark 1.2 I used to be able to do this:
>>
>> scala>
>> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
>> res30: org.apache.spark.sql.catalyst.types.DataType =
>> StructType(List(StructField(int,LongType,true)))
>>
>> That is, the name of a column can be a keyword like "int". This is no
>> longer the case in 1.3:
>>
>> data-pipeline-shell> HiveTypeHelper.toDataType("struct<int:bigint>")
>> org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
>> failure: ``>'' expected but `int' found
>>
>> struct<int:bigint>
>>        ^
>>         at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
>>         at
>> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
>>         at
>> org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)
>>
>> Note HiveTypeHelper is simply an object I load in to expose
>> HiveMetastoreTypes since it was made private. See
>> https://gist.github.com/nitay/460b41ed5fd7608507f5
>>
>> This is actually a pretty big problem for us as we have a bunch of legacy
>> tables with column names like "timestamp". They work fine in 1.2, but now
>> everything throws in 1.3.
>>
>> Any thoughts?
>>
>> Thanks,
>> - Nitay
>> Founder & CTO
>>
>>
>

Re: Spark 1.3 SQL Type Parser Changes?

Posted by Michael Armbrust <mi...@databricks.com>.
Thanks for reporting.  This was caused by a change to our DDL parser that
made type names reserved words.  I've filed a JIRA and will investigate
whether this is something we can fix.
https://issues.apache.org/jira/browse/SPARK-6250

On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe <ni...@actioniq.co> wrote:

> In Spark 1.2 I used to be able to do this:
>
> scala>
> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
> res30: org.apache.spark.sql.catalyst.types.DataType =
> StructType(List(StructField(int,LongType,true)))
>
> That is, the name of a column can be a keyword like "int". This is no
> longer the case in 1.3:
>
> data-pipeline-shell> HiveTypeHelper.toDataType("struct<int:bigint>")
> org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.8]
> failure: ``>'' expected but `int' found
>
> struct<int:bigint>
>        ^
>         at org.apache.spark.sql.sources.DDLParser.parseType(ddl.scala:52)
>         at
> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:785)
>         at
> org.apache.spark.sql.hive.HiveTypeHelper$.toDataType(HiveTypeHelper.scala:9)
>
> Note HiveTypeHelper is simply an object I load in to expose
> HiveMetastoreTypes since it was made private. See
> https://gist.github.com/nitay/460b41ed5fd7608507f5
>
> This is actually a pretty big problem for us as we have a bunch of legacy
> tables with column names like "timestamp". They work fine in 1.2, but now
> everything throws in 1.3.
>
> Any thoughts?
>
> Thanks,
> - Nitay
> Founder & CTO
>
>
