Posted to users@zeppelin.apache.org by Nguyen Xuan Truong <tr...@gmail.com> on 2019/04/29 06:18:14 UTC

Zeppelin 0.8.0 issue with Hadoop 2.9.2 on Spark 2.1.0

Hi,

We have been running a Zeppelin 0.8.0 instance (binary package) smoothly
on Spark 2.1.0 and Hadoop 2.6.4.

We recently upgraded Hadoop from 2.6.4 to 2.9.2, and I started getting
this error in Zeppelin when reading from HDFS (using Scala 2.11.8):

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 52 elided


I think it's related to the *com.fasterxml.jackson.core* dependency. The
version I am currently using is 2.8.10. I already tried replacing 2.8.10
with 2.7.8 and with 2.8.8, but the issue persists. With 2.8.8 I got the
following instead of the error above:

com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.8.8
  at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
  at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:745)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 52 elided
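A quick way to narrow down a clash like this is to list the Jackson jars each component bundles and compare versions. This is a general sketch, not from the thread itself; the install paths are illustrative, so point them at your actual Zeppelin, Spark, and Hadoop directories:

```shell
# List the Jackson jars each component bundles. A mismatch between
# jackson-databind and jackson-module-scala on the interpreter classpath
# is what produces the "Incompatible Jackson version" error above.
# The /opt/... paths are illustrative; adjust to your installs.
find /opt/zeppelin /opt/spark /opt/hadoop -name 'jackson-*.jar' 2>/dev/null | sort
```

Comparing the versions in the output usually shows which component is dragging in the odd one out.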

Does anyone have an idea how to resolve this? We can't change our Spark
or Hadoop versions, but we can change the Zeppelin version if needed.

Thanks,
Truong

Re: Zeppelin 0.8.0 issue with Hadoop 2.9.2 on Spark 2.1.0

Posted by Jeff Zhang <zj...@gmail.com>.
Awesome!


-- 
Best Regards

Jeff Zhang

Re: Zeppelin 0.8.0 issue with Hadoop 2.9.2 on Spark 2.1.0

Posted by Nguyen Xuan Truong <tr...@gmail.com>.
Thank you for your info.

I managed to solve the issue by building Zeppelin 0.8.1 from source
with Hadoop 2.9.2 and adding jackson-module-scala_2.11 to the
ZEPPELIN_HOME/lib folder.

I'm choosing version 0.8.1 for now, as it's straightforward to migrate
notebooks from 0.8.0. I will try 0.9 later when I have more time.
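For anyone following along, the fix above roughly looks like the sketch below. The branch name, Maven property, and Jackson artifact version are assumptions on my part; verify them against Zeppelin's build-from-source documentation and the jackson-databind version your Spark actually ships:

```shell
# Illustrative sketch: build Zeppelin 0.8.1 from source against Hadoop 2.9.2,
# then add a jackson-module-scala jar matching Spark's jackson-databind.
# Branch, property, and versions below are assumptions to verify.
git clone --branch v0.8.1 https://github.com/apache/zeppelin.git
cd zeppelin
mvn clean package -DskipTests -Dhadoop.version=2.9.2

# Match the module version to the jackson-databind on the classpath
# (2.8.10 here is an assumption):
cp jackson-module-scala_2.11-2.8.10.jar "$ZEPPELIN_HOME/lib/"
```

The point of the last step is that jackson-module-scala refuses to register against a jackson-databind from a different minor-version line, which is exactly the "Incompatible Jackson version" failure in the original post.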




Re: Zeppelin 0.8.0 issue with Hadoop 2.9.2 on Spark 2.1.0

Posted by Jeff Zhang <zj...@gmail.com>.
You can try Zeppelin 0.9 (not released yet; master branch), which shades
all the dependencies.



-- 
Best Regards

Jeff Zhang