Posted to user@spark.apache.org by Mohamed Nadjib MAMI <mo...@gmail.com> on 2016/10/13 22:30:45 UTC

java.util.ArrayList is not a valid external type for schema of array<string>

In Spark 1.5.2 I had a job that reads from a text file (`textFile`) and saves some data
into a Parquet table. One of the values, of type `ArrayList<String>`, was
successfully saved as an `array<string>` column in the Parquet table. After
upgrading to Spark 2.0.1, I changed the necessary code (`SparkConf` to
`SparkSession`, `DataFrame` to `Dataset`), so there are no syntactic issues in the code.
However, the job no longer finishes, and the following exception is thrown:
`java.lang.RuntimeException: Error while encoding:
java.lang.RuntimeException: java.util.ArrayList is not a valid external
type for schema of array<string>`

at the line:
`table.write().parquet(table_name);`

I inspected the schema and it looked fine. Here is the string array column:

`StructType(StructField(array_column,ArrayType(StringType,true),true))`

...and the value to be saved therein looks like:

[aaa, bbb, ccc]
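
Printing the value alone does not reveal which Java type is actually handed to Spark: `java.util.List#toString()` and `Arrays.toString(String[])` both produce `[aaa, bbb, ccc]`. The following is a minimal, Spark-free sketch for checking the runtime class. The class `ValueTypeCheck` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ValueTypeCheck {
    // Returns the runtime class name of a value that is about to be put into a Row.
    static String runtimeType(Object value) {
        return value == null ? "null" : value.getClass().getName();
    }

    // Renders a value the way it would typically be logged:
    // Arrays.toString for String[], toString() for anything else (e.g. a java.util.List).
    static String render(Object value) {
        if (value instanceof String[]) {
            return Arrays.toString((String[]) value);
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("aaa", "bbb", "ccc"));
        String[] array = list.toArray(new String[0]);

        // Both values print identically...
        System.out.println(render(list));       // [aaa, bbb, ccc]
        System.out.println(render(array));      // [aaa, bbb, ccc]
        // ...but their runtime types differ.
        System.out.println(runtimeType(list));  // java.util.ArrayList
        System.out.println(runtimeType(array)); // [Ljava.lang.String;
    }
}
```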

The `array<string>` column is constructed this way:
`DataTypes.createStructField(column,
DataTypes.createArrayType(DataTypes.StringType,
true), true);`

I believe I have provided all the necessary code, but please let me know if
more would help.

So it seems that something changed in the latest version that prevents
saving an `ArrayList<String>` into an `array<string>` column of a Parquet
table. Any help solving or working around this would be much appreciated.
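
One possible workaround is offered here as an assumption, not a confirmed fix. The Spark 2.0 `RowEncoder` documentation lists `scala.collection.Seq` or a Java array as the external types for `ArrayType`, and does not list `java.util.List`. Converting each `ArrayList<String>` into a `String[]` before building the row may therefore avoid the error. The sketch below performs only the conversion; the class `RowValueAdapter` and the method `adaptLists` are hypothetical. It leaves out the Spark call so that it compiles without Spark on the classpath. In the real job, its output would be passed to `RowFactory.create(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowValueAdapter {
    // Hypothetical helper: copies the row values, replacing every java.util.List
    // with a String[] holding the same elements. Assumes such lists contain only
    // Strings (as for an array<string> column); any other element type would make
    // toArray throw an ArrayStoreException.
    static Object[] adaptLists(Object[] values) {
        Object[] out = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            Object v = values[i];
            if (v instanceof List) {
                List<?> list = (List<?>) v;
                out[i] = list.toArray(new String[0]);
            } else {
                out[i] = v; // non-list values are passed through unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>(Arrays.asList("aaa", "bbb", "ccc"));
        Object[] raw = {"row-1", tags};

        Object[] adapted = adaptLists(raw);
        // In the Spark job (assumption): Row row = RowFactory.create(adapted);
        System.out.println(adapted[1].getClass().getName());        // [Ljava.lang.String;
        System.out.println(Arrays.toString((String[]) adapted[1])); // [aaa, bbb, ccc]
    }
}
```

Converting to a `String[]` rather than a Scala `Seq` keeps the Java code free of Scala collection imports. Whether this matches what the 2.0.1 encoder accepts for this particular job would still need to be verified.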

*Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候, تحياتي.*
*Mohamed Nadjib Mami*
*PhD Student - EIS Department - Bonn University (Germany).*

*About me! <http://www.strikingly.com/mohamed-nadjib-mami>*
*LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami/>*