Posted to dev@spark.apache.org by Michael Armbrust <mi...@databricks.com> on 2014/07/15 00:55:59 UTC
Change when loading/storing String data using Parquet
I just wanted to send out a quick note about a change in the handling of
strings when loading/storing data using Parquet and Spark SQL. Before,
Spark SQL did not support binary data in Parquet, so all binary blobs were
implicitly treated as Strings. Commit 9fe693
<https://github.com/apache/spark/commit/9fe693b5b6ed6af34ee1e800ab89c8a11991ea38>
fixes this limitation by adding support for binary data.
However, data written out with a prior version of Spark SQL will be missing
the annotation telling us to interpret a given column as a String, so old
string data will now be loaded as binary data. If you would like to use
the data as a string, you will need to add a CAST to convert the datatype.
New string data written out after this change will be loaded back in as a
string correctly, since we now include an annotation indicating the desired
type.
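For old data, the workaround can be sketched like this (the table and
column names here are hypothetical):

```sql
-- Hypothetical example: old_table was written by a pre-change version of
-- Spark SQL, so its string column now loads as binary. A CAST recovers
-- the string interpretation:
SELECT CAST(name AS STRING) AS name
FROM old_table;
```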
Additionally, this should now interoperate correctly with other systems
that write Parquet data (Hive, Thrift, etc.).
Michael