Posted to dev@spark.apache.org by Pei-Lun Lee <pl...@appier.com> on 2015/03/10 10:06:16 UTC

SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

Hi,

I found that if I try to read a parquet file generated by Spark 1.1.1 using
1.3.0-rc3 with default settings, I get this error:

com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType': was expecting ('true', 'false' or 'null')
 at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
        at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
        at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
        at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)



This is how I saved the parquet file with 1.1.1:

sql("select 1 as a").saveAsParquetFile("/tmp/foo")



And this is the metadata of the 1.1.1 parquet file:

creator:     parquet-mr version 1.4.3
extra:       org.apache.spark.sql.parquet.row.metadata =
StructType(List(StructField(a,IntegerType,false)))



By comparison, this is the 1.3.0 metadata:

creator:     parquet-mr version 1.6.0rc3
extra:       org.apache.spark.sql.parquet.row.metadata =
{"type":"struct","fields":[{"name":"a","type":"integer","nullable":t
[more]...
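
The 1.3-style value appears to be the schema's JSON representation; a small sketch of how such a string is produced (spark-shell on 1.3.0-rc3, imports as shown):

import org.apache.spark.sql.types._

// Same schema as the files above: a single non-nullable int column "a".
val schema = StructType(Seq(StructField("a", IntegerType, nullable = false)))

// Prints a JSON string of the same shape as the 1.3.0 metadata value above.
println(schema.json)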



It looks like ParquetRelation2 is now used to load parquet files by default,
and it only recognizes the JSON schema format, whereas the 1.1.1 schema was
stored as a case-class-style string.
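
To illustrate the mismatch, DataType.fromJson (the method in the stack trace above) handles the JSON form but not the old string form; a rough sketch, where the JSON literal is an assumption based on the metadata shown earlier:

import org.apache.spark.sql.types.DataType

// 1.3-style JSON schema: parses successfully.
DataType.fromJson(
  """{"type":"struct","fields":[{"name":"a","type":"integer","nullable":false,"metadata":{}}]}""")

// 1.1.1-style case-class string: not valid JSON, so json4s/Jackson throws the
// JsonParseException shown above.
DataType.fromJson("StructType(List(StructField(a,IntegerType,false)))")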

Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I am not
sure what the difference between the two code paths is.
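
In case it helps, this is roughly how the workaround is applied (sketch only; either form should work in 1.3.0-rc3):

// Fall back to the old, non data source API parquet code path.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")

// or equivalently via SQL:
sqlContext.sql("SET spark.sql.parquet.useDataSourceApi=false")
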
Is this considered a bug? We have a lot of parquet files from 1.1.1; should
we disable the data source API in order to read them if we want to upgrade
to 1.3?

Thanks,
--
Pei-Lun

Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

Posted by Pei-Lun Lee <pl...@appier.com>.
Thanks!

On Sat, Mar 14, 2015 at 3:31 AM, Michael Armbrust <mi...@databricks.com> wrote:

> Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315

Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

Posted by Michael Armbrust <mi...@databricks.com>.
Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315

On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust <mi...@databricks.com> wrote:

> We are looking at the issue and will likely fix it for Spark 1.3.1.

Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

Posted by Michael Armbrust <mi...@databricks.com>.
We are looking at the issue and will likely fix it for Spark 1.3.1.


Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

Posted by giive chen <th...@gmail.com>.
Hi all,

My team has the same issue. It looks like Spark 1.3's SparkSQL cannot read
parquet files generated by Spark 1.1. It will cost a lot of migration work
when we want to upgrade to Spark 1.3.

Can anyone help?


Thanks

Wisely Chen

