Posted to user@spark.apache.org by Siva <sb...@gmail.com> on 2016/01/14 02:20:15 UTC

Hive is unable to read Avro files written by spark-avro

Hi Everyone,

Avro data written to HDFS from a DataFrame cannot be read by Hive. I am
saving the data in Avro format with the statement below.

import org.apache.spark.sql.SaveMode
df.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path))

I created an external Hive table over the Avro data, but when I read it I
see all nulls. Has anyone faced a similar issue? What is the best way to
write Avro data from Spark so that Hive can also read it?

Thanks,
Sivakumar Bhavanari.

Re: Hive is unable to read Avro files written by spark-avro

Posted by Kevin Mellott <ke...@gmail.com>.

Hi Sivakumar,

I have run into this issue in the past, and we were able to fix it by using
an explicit schema when saving the DataFrame to the Avro file. This schema
exactly matched the one stored in the Hive table's metadata, which allowed
Hive queries to keep working even after the underlying Avro file was
updated via Spark.
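
A minimal, Spark-free sketch of the "explicit schema" idea, under stated assumptions: the target schema (a hypothetical stand-in for the schema in the Hive table's metadata) is fixed up front, and every record is conformed to it before writing, so an absent optional field becomes an explicit null instead of vanishing from the schema. The `ExplicitSchema` object, its types, `hiveSchema`, and `conform` are all hypothetical. In Spark 1.3, one plausible counterpart is building a `StructType` by hand and passing it to `sqlContext.createDataFrame(rowRDD, schema)` before saving.

```scala
// Hypothetical, Spark-free model of writing with an explicit schema.
// A record is a Map from field name to value; nested records are nested Maps.
object ExplicitSchema {
  sealed trait FieldType
  case object StringT extends FieldType
  case object LongT extends FieldType
  final case class RecordT(fields: List[Field]) extends FieldType
  final case class Field(name: String, tpe: FieldType)

  // Assumed stand-in for the schema stored in the Hive table's metadata.
  val hiveSchema: RecordT = RecordT(List(
    Field("id", LongT),
    Field("name", StringT),
    Field("address", RecordT(List(Field("city", StringT), Field("zip", StringT))))
  ))

  // Conform one record to the target schema: emit every schema field in
  // schema order, use null for absent fields, and recurse into nested records.
  def conform(record: Map[String, Any], schema: RecordT): List[(String, Any)] =
    schema.fields.map {
      case Field(name, nested: RecordT) =>
        record.get(name) match {
          case Some(m: Map[String, Any] @unchecked) => name -> conform(m, nested)
          case _                                    => name -> null
        }
      case Field(name, _) => name -> record.getOrElse(name, null)
    }
}
```

A record with a nested `address` and one without it both come out with the same field list (`id`, `name`, `address`), so every write carries one schema regardless of which optional elements happen to be present.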

We are using Spark 1.3.0, and I was hoping to find a better solution to
this problem once we upgrade to Spark 1.5.0 (we manage versions via CDH).
This approach works, but the coding involved can be tedious, depending on
the complexity of your data.

If memory serves, the explicit schema was necessary because our data
structure contained optional nested properties. The DataFrame writer will
automatically create a schema for you, but ours differed depending on the
data being saved (i.e. whether or not it contained a nested element).
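
A minimal, Spark-free sketch of that drift, assuming (as a simplification) that the schema is derived only from the fields present in the batch being saved. The `InferredSchema` object, its types, and `inferRecord` are hypothetical and are not spark-avro's actual inference code.

```scala
// Hypothetical model of inferring a schema from the data itself: the schema
// contains only the fields that appear somewhere in the batch being saved.
object InferredSchema {
  sealed trait FieldType
  case object StringT extends FieldType
  case object LongT extends FieldType
  final case class RecordT(fields: Map[String, FieldType]) extends FieldType

  // Union of field names seen across the batch; a field that never appears
  // in this batch simply does not exist in the inferred schema.
  def inferRecord(records: Seq[Map[String, Any]]): RecordT =
    RecordT(records.flatMap(_.keys).distinct.map { name =>
      val values = records.flatMap(_.get(name)) // non-empty: name came from some record
      name -> typeOf(values)
    }.toMap)

  // Type of a field, judged from the values observed for it (only the
  // cases this sketch needs are handled).
  def typeOf(values: Seq[Any]): FieldType = values.head match {
    case _: String    => StringT
    case _: Long      => LongT
    case _: Map[_, _] => inferRecord(values.collect { case m: Map[String, Any] @unchecked => m })
    case other        => throw new IllegalArgumentException(s"unsupported value: $other")
  }
}
```

A batch in which some record has a nested `address` and a batch in which none does yield two different schemas, so files written from each no longer line up with a single Hive table definition.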

- Kevin
