Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:21:18 UTC

[jira] [Updated] (SPARK-10153) Unable to query Avro data from Flume using SparkSQL

     [ https://issues.apache.org/jira/browse/SPARK-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-10153:
---------------------------------
    Labels: bulk-closed  (was: )

> Unable to query Avro data from Flume using SparkSQL
> ---------------------------------------------------
>
>                 Key: SPARK-10153
>                 URL: https://issues.apache.org/jira/browse/SPARK-10153
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1, 1.5.0
>            Reporter: mathias kluba
>            Priority: Major
>              Labels: bulk-closed
>
> I use the Avro event serializer of Flume.
> The schema is:
> {code}
> {
> "type":"record",
> "name":"Event",
> "fields":[
>   {
>     "name":"headers",
>     "type":{"type":"map","values":"string"}
>   },
>   {
>     "name":"body",
>     "type":"bytes"
>   }
> ]}
> {code}
> I'm using HDP 2.2 with Hive 0.14 (using TEZ) and I'm able to query the data correctly.
> But with Spark SQL, I have issues.
> I tested with 1.4.1 and 1.5.0 (latest snapshot), and I get different error messages, apparently for different issues.
> In 1.4.1 I have:
> {code:sql}
> select body from mytable limit 10;
> {code}
> {code}
>  conversion of string to map<string,string>not supported yet
> {code}
> It's related to the headers field, which is a map<string,string>, but I don't understand why it's trying to convert to String. Maybe to display it as a single column? Even if I run a select that does not include the headers, I still get this error.
> With 1.5.0 I have:
> {code:sql}
> select body from mytable limit 10;
> {code}
> {code}
> java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String cannot be cast to [B
> {code}
> It's clearly not the same error; it seems that 1.5.0 fixes the bug with the headers.
> So there seems to be an error where Spark SQL tries to cast the body to a String, even though the column type is a byte array (from the Avro schema).
> When I do the cast manually, it works:
> {code:sql}
> select cast(body as String) from mytable limit 10;
> {code}
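
For readers following the report, here is a minimal Python sketch of what the schema above declares and what the manual cast amounts to. It is not Spark or Flume code. The helper names (field_types, body_as_text) and the sample event are hypothetical, and it assumes the event body holds UTF-8 text, which the report does not state.

```python
import json

# Avro schema copied from the report above.
FLUME_EVENT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "headers", "type": {"type": "map", "values": "string"}},
    {"name": "body", "type": "bytes"}
  ]
}
""")


def field_types(schema):
    """Return a {field name: Avro type} mapping for a record schema."""
    return {field["name"]: field["type"] for field in schema["fields"]}


def body_as_text(event):
    """Hypothetical helper: decode the bytes body, assuming UTF-8 text.

    This mirrors the idea behind cast(body as String) in the report;
    it is an illustration, not the Spark implementation.
    """
    body = event["body"]
    if not isinstance(body, (bytes, bytearray)):
        # The schema declares bytes, so anything else is a type mismatch.
        raise TypeError(f"expected a bytes body, got {type(body).__name__}")
    return bytes(body).decode("utf-8")


# Made-up sample event for illustration only.
sample_event = {"headers": {"host": "example"}, "body": b"hello flume"}

print(field_types(FLUME_EVENT_SCHEMA)["body"])
print(body_as_text(sample_event))
```

The point of the sketch is that the schema types body as bytes, so a reader should hand back raw bytes and leave any conversion to text as an explicit step, as the manual cast does.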



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
