Posted to user@spark.apache.org by ge ko <ko...@gmail.com> on 2014/04/17 11:55:45 UTC
Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Hi,
I want to select from a Parquet-based table in Shark, but I receive this error:
shark> select * from wl_parquet;
14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=Driver.run>
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=compile>
14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from wl_parquet
14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
    at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
    at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
    at shark.SharkDriver.compile(SharkDriver.scala:215)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
    at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
    ... 14 more
I can successfully select from that table with Hive and Impala, but it does not
work in Shark. I am using CDH5, including the Spark parcel, and Shark 0.9.1.
Which jar contains this class, and how can I get rid of this exception?
The lib folder of shark contains:
[root@hadoop-pg-9 shark-0.9.1]# ll lib
total 180
lrwxrwxrwx 1 root root 67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
-rwxrwxr-x 1 root root 23086 9. Apr 10:57 JavaEWAH-0.4.2.jar
lrwxrwxrwx 1 root root 53 14. Apr 21:46 parquet-avro.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar
lrwxrwxrwx 1 root root 58 14. Apr 21:46 parquet-cascading.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar
lrwxrwxrwx 1 root root 55 14. Apr 21:46 parquet-column.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar
lrwxrwxrwx 1 root root 55 14. Apr 21:46 parquet-common.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-common.jar
lrwxrwxrwx 1 root root 57 14. Apr 21:46 parquet-encoding.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-encoding.jar
lrwxrwxrwx 1 root root 55 14. Apr 21:46 parquet-format.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-format.jar
lrwxrwxrwx 1 root root 58 14. Apr 21:46 parquet-generator.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-generator.jar
lrwxrwxrwx 1 root root 62 14. Apr 21:46 parquet-hadoop-bundle.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop-bundle.jar
lrwxrwxrwx 1 root root 55 14. Apr 21:46 parquet-hadoop.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop.jar
-rw-r--r-- 1 root root 70103 27. Nov 21:24 parquet-hive-1.2.8.jar
lrwxrwxrwx 1 root root 56 14. Apr 21:46 parquet-scrooge.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-scrooge.jar
lrwxrwxrwx 1 root root 55 14. Apr 21:46 parquet-thrift.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-thrift.jar
-rw-rw-r-- 1 root root 76220 9. Apr 10:57 pyrolite.jar
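One way to answer the "which jar contains this class" question for a folder like the one listed above is to scan every jar for the matching .class entry. The following is only a minimal sketch, assuming Python 3 is available on the node; the function names class_to_entry and find_class_in_jars are hypothetical helpers made up for this illustration, not part of Shark, Hive or Parquet.

```python
import os
import zipfile


def class_to_entry(class_name):
    """Turn a fully qualified class name into its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(lib_dir, class_name):
    """Scan the *.jar files in lib_dir for class_name.

    Returns (hits, unreadable): the jar file names that contain the class,
    and the jar file names that could not be opened at all.
    """
    entry = class_to_entry(class_name)
    hits, unreadable = [], []
    for name in sorted(os.listdir(lib_dir)):
        if not name.endswith(".jar"):
            continue
        path = os.path.join(lib_dir, name)
        try:
            # Symlinks are followed when opening; a dangling link raises
            # OSError, a file that is not a zip archive raises BadZipFile.
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(name)
        except (OSError, zipfile.BadZipFile):
            unreadable.append(name)
    return hits, unreadable
```

Run from the shark-0.9.1 directory, find_class_in_jars("lib", "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat") would return the jars that ship the class. An empty first list would mean none of the jars in that folder provides it, and any names in the second list would point at symlinks or files that cannot be read.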
thanks in advance, Gerd
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Posted by Gerd Koenig <ko...@googlemail.com>.
Hi Arpit,
I didn't build it; I am using the prebuilt version described here:
http://www.abcn.net/2014/04/install-shark-on-cdh5-hadoop2-spark.html
That includes adding jars such as the one mentioned.
br...Gerd...
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Posted by Arpit Tak <ar...@mobipulse.in>.
Just out of curiosity: since you are using Hadoop and Spark from Cloudera Manager,
how did you build Shark for it?
Are you able to read any file from HDFS? Did you try that out?
Regards,
Arpit Tak
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Posted by ge ko <ko...@gmail.com>.
Hi,
the error java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
Now the Hive metastore can be read successfully (including the Parquet-based
table).
But when I select from that table I receive:
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
This is really strange, since the class
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in
parquet-hive-bundle-1.4.1.jar.
I am getting more and more confused ;)
Any help?
regards, Gerd
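The two errors in this thread are reported from different places: the first one is thrown under shark.SharkCliDriver.main while the table metadata is read, whereas the second one is wrapped in a "Job aborted: Task ... failed" SparkException, which Spark raises when a task of the job has failed. One possible reading, not confirmed anywhere in this thread, is that the jar is visible to the Shark CLI process but not to the JVMs that run the tasks. The sketch below only makes that distinction mechanical for a pasted log. The function name classify_failure and the "driver" / "executor" labels are assumptions made for this illustration, and the heuristic relies solely on the wording of the two log excerpts in this thread.

```python
import re

# Matches the missing class name, even if the log wraps after the colon.
CNFE = re.compile(r"ClassNotFoundException:\s*([\w.$]+)")
# Wording taken from the second log excerpt in this thread.
TASK_FAILURE = re.compile(r"SparkException: Job aborted: Task \S+ failed")


def classify_failure(log_text):
    """Return (side, class_name) for the first ClassNotFoundException.

    side is "executor" if the log contains a task-failure SparkException,
    "driver" otherwise. Returns (None, None) when no
    ClassNotFoundException is present in log_text.
    """
    match = CNFE.search(log_text)
    if match is None:
        return None, None
    side = "executor" if TASK_FAILURE.search(log_text) else "driver"
    return side, match.group(1)
```

If a log is classified as "executor", checking whether parquet-hive-bundle-1.4.1.jar is also present and readable on every worker node would be a reasonable next step under that assumption.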