Posted to user@spark.apache.org by ge ko <ko...@gmail.com> on 2014/04/17 11:55:45 UTC

Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Hi,

I want to select from a Parquet-based table in Shark, but I receive the following error:

shark> select * from wl_parquet;
14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=Driver.run>
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=compile>
14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from
wl_parquet
14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source
tables
FAILED: Hive Internal Error:
java.lang.RuntimeException(java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error:
java.lang.RuntimeException(java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    at
org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
    at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
    at
shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
    at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
    at shark.SharkDriver.compile(SharkDriver.scala:215)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
    at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at
org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
    ... 14 more

I can successfully select from that table with Hive and Impala, but Shark
doesn't work. I am using CDH5 including the Spark parcel, and Shark 0.9.1.

In which jar is this class "hidden", and how can I get rid of this exception?
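
One way to find out which jar under the CDH parcel actually contains the class
(a minimal sketch, assuming the parcel layout shown in the lib listing below):

for j in /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar; do
  # report every jar whose entries mention the missing input format class
  unzip -l "$j" 2>/dev/null | grep -q MapredParquetInputFormat && echo "$j"
done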

The lib folder of shark contains:
[root@hadoop-pg-9 shark-0.9.1]# ll lib
total 180
lrwxrwxrwx 1 root root    67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar ->
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
-rwxrwxr-x 1 root root 23086  9. Apr 10:57 JavaEWAH-0.4.2.jar
lrwxrwxrwx 1 root root    53 14. Apr 21:46 parquet-avro.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar
lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-cascading.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar
lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-column.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar
lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-common.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-common.jar
lrwxrwxrwx 1 root root    57 14. Apr 21:46 parquet-encoding.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-encoding.jar
lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-format.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-format.jar
lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-generator.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-generator.jar
lrwxrwxrwx 1 root root    62 14. Apr 21:46 parquet-hadoop-bundle.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop-bundle.jar
lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-hadoop.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop.jar
-rw-r--r-- 1 root root 70103 27. Nov 21:24 parquet-hive-1.2.8.jar
lrwxrwxrwx 1 root root    56 14. Apr 21:46 parquet-scrooge.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-scrooge.jar
lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-thrift.jar ->
/opt/cloudera/parcels/CDH/lib/hadoop/parquet-thrift.jar
-rw-rw-r-- 1 root root 76220  9. Apr 10:57 pyrolite.jar

thanks in advance, Gerd

Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Posted by Gerd Koenig <ko...@googlemail.com>.
Hi Arpit,

I didn't build it; I am using the prebuilt version described here:
http://www.abcn.net/2014/04/install-shark-on-cdh5-hadoop2-spark.html
including adding, for example, the jar mentioned there.

br...Gerd...



Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Posted by Arpit Tak <ar...@mobipulse.in>.
Just out of curiosity, since you are using Cloudera Manager Hadoop and Spark:
how did you build Shark for it?

Are you able to read any file from HDFS? Did you try that out?
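
For example, something like this from the same node (the warehouse path is
just an example):

hadoop fs -ls /user/hive/warehouse    # basic HDFS access check
# and inside the shark> CLI, a select on a plain text (non-Parquet) table
# would confirm that Shark itself can read data from HDFS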


Regards,
Arpit Tak



Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Posted by ge ko <ko...@gmail.com>.
Hi,

the error java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
Now the Hive metastore can be read successfully (including the Parquet-based
table).
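
For reference, "adding it to the lib folder" amounts to placing the jar next
to the other Parquet jars, e.g. (both paths below are placeholders, not taken
from this thread):

cd /path/to/shark-0.9.1/lib
cp /path/to/parquet-hive-bundle-1.4.1.jar .
# or symlink it, matching how the other parquet-*.jar files in lib/ are linked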

But when I try to select from that table, I receive:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times
(most recent failure: Exception failure: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in
parquet-hive-bundle-1.4.1.jar?!
I'm getting more and more confused ;)
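
A quick way to double-check that the class really is inside the bundle jar
(run from Shark's lib folder; jar ships with the JDK):

jar tf parquet-hive-bundle-1.4.1.jar | grep ParquetHiveSerDe

One thing worth checking, as an assumption rather than something confirmed in
this thread: the failure happens in the Spark tasks, so the jar may also need
to be visible on the executors' classpath (e.g. via SPARK_CLASSPATH in
conf/shark-env.sh), not only in the CLI's lib folder.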

Any help?

regards, Gerd

