Posted to dev@parquet.apache.org by Salman Ahmed <ah...@gmail.com> on 2016/08/04 14:31:03 UTC

Parquet Spark cannot interface with Parquet Impala

Dear All,

We are facing the following issue:

We have a Spark SQL DataFrame which contains a column of StringType.

This column actually contains timestamp data.

Because the DataFrame does not support the format in which we want the timestamp, we are using the string data type for it.
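
For illustration, the column in question looks roughly like this (a minimal sketch; the column names, sample value, and timestamp format are placeholders, not our actual schema):

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import StructType, StructField, StringType

# Build a tiny DataFrame whose "created_at" column holds timestamps as strings.
sc = SparkContext(appName="timestamp-as-string-sketch")
sqlContext = SQLContext(sc)
schema = StructType([
    StructField("tweet_id", StringType(), True),
    StructField("created_at", StringType(), True),  # e.g. "04/08/2016 14:31:03 +0300"
])
df = sqlContext.createDataFrame([("1", "04/08/2016 14:31:03 +0300")], schema)
df.printSchema()  # created_at is StringType even though it holds timestamp data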

All this is fine until we want to write this DataFrame to an Impala table which stores its data in the Parquet file format.

This is the code for the last step:

df.write.save("hdfs://hadoop-test.css.org:8020/user/hive/warehouse/twitter.db/land2", format="parquet", mode="append", partitionBy=None)
Here is the error we are getting:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
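
The exception reads as though the table side expects a timestamp where we are writing plain text. One thing an error like this would normally point to is casting the string column before the write, roughly like the sketch below (this assumes a column named created_at whose strings parse with Spark's default timestamp format, which may not hold for our data):

# Hypothetical sketch: convert the string column to a real timestamp before
# writing Parquet; assumes the Impala table declares created_at as TIMESTAMP.
df_ts = df.withColumn("created_at", df["created_at"].cast("timestamp"))
df_ts.write.save(
    "hdfs://hadoop-test.css.org:8020/user/hive/warehouse/twitter.db/land2",
    format="parquet", mode="append", partitionBy=None)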
When we change the format to "text":

df.write.save("hdfs://hadoop-test.css.org:8020/user/hive/warehouse/twitter.db/land2", format="text", mode="append", partitionBy=None)

we get no errors and the write is successful (provided the Impala table is also in TextFile format).


Why is this? We tried everything, including:
http://stackoverflow.com/questions/31482798/save-spark-dataframe-to-hive-table-not-readable-because-parquet-not-a-sequence

This gives us an error:
TypeError: 'JavaPackage' object is not callable
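As far as we understood it, the answer there suggests writing through a HiveContext with saveAsTable rather than a plain path-based save; what we attempted was roughly along these lines (the table name and sample data below are placeholders, not our exact code):

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Rough sketch of a HiveContext-based write, assuming the linked answer's
# saveAsTable approach; "twitter.land2" is a placeholder table name.
sc = SparkContext(appName="write-via-hivecontext-sketch")
hc = HiveContext(sc)
df = hc.createDataFrame([("1", "2016-08-04 14:31:03")], ["tweet_id", "created_at"])
df.write.format("parquet").mode("append").saveAsTable("twitter.land2")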


Any help would be deeply appreciated.

Thanks & Kind Regards,
Salman Ahmed