Posted to user@spark.apache.org by "leosandylh@gmail.com" <le...@gmail.com> on 2014/01/08 16:02:36 UTC

native-lzo / gpl lib

Hi,
    I ran a query from Shark that reads LZO-compressed data from HDFS, but
Spark couldn't find the native-lzo library.

14/01/08 22:58:21 ERROR executor.Executor: Exception in task ID 286
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getDecompressorType(LzoCodec.java:175)
    at org.apache.hadoop.hive.ql.io.CodecPool.getDecompressor(CodecPool.java:122)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.init(RCFile.java:1299)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1139)
    at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1118)
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.<init>(RCFileRecordReader.java:52)
    at org.apache.hadoop.hive.ql.io.RCFileInputFormat.getRecordReader(RCFileInputFormat.java:57)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:93)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:83)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:51)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:36)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:29)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:69)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.rdd.MapPartitionsWithIndexRDD.compute(MapPartitionsWithIndexRDD.scala:40)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.rdd.MapPartitionsWithIndexRDD.compute(MapPartitionsWithIndexRDD.scala:40)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

Can anyone give me a hint?

Thank you!




leosandylh@gmail.com

Re: native-lzo / gpl lib

Posted by Andrew Ash <an...@andrewash.com>.
To get Shark working on LZO files (I have it up and running with CDH 4.4.0),
you first need the hadoop-lzo jar on the classpath for Shark (and Spark).
Unlike Hadoop, which falls back to pure-Java codecs when it can't load
native code, hadoop-lzo seems to require its native component.  So you'll
need to add hadoop-lzo's native libraries to the library path too.

Here's an excerpt from my Puppet module that does both of these things.
Edit accordingly and put these two lines into your shark-env.sh:

export SPARK_LIBRARY_PATH="<%= scope['common::masterBaseDir'] %>/hadoop-current/lib/native/"
export SPARK_CLASSPATH="<%= scope['common::masterBaseDir'] %>/hadoop-current/lib/hadoop-lzo.jar"
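
Once the Puppet template is rendered, those are just absolute paths.  For
illustration, assuming the base directory resolves to /opt (a made-up value;
substitute whatever yours is), the two lines would come out as:

export SPARK_LIBRARY_PATH="/opt/hadoop-current/lib/native/"
export SPARK_CLASSPATH="/opt/hadoop-current/lib/hadoop-lzo.jar"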

And here's what I have in hadoop-current and in lib/native underneath it:

[user@machine hadoop-current]$ ls
bin   hadoop-ant-2.0.0-mr1-cdh4.4.0.jar   hadoop-examples-2.0.0-mr1-cdh4.4.0.jar  hadoop-tools-2.0.0-mr1-cdh4.4.0.jar  lib      logs  webapps
conf  hadoop-core-2.0.0-mr1-cdh4.4.0.jar  hadoop-test-2.0.0-mr1-cdh4.4.0.jar      include                              libexec  sbin
[user@machine hadoop-current]$ ls lib/native/
libgplcompression.a     libgplcompression.la        libgplcompression.so
libgplcompression.so.0  libgplcompression.so.0.0.0  Linux-amd64-64
[user@machine hadoop-current]$
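
One caveat if the files above are all present but you still see "native-lzo
library not available": libgplcompression.so is only the JNI glue, and as
far as I know it pulls in the system liblzo2 at load time.  A quick sanity
check (adjust the path to your layout) is:

[user@machine hadoop-current]$ ldd lib/native/libgplcompression.so

If liblzo2 shows up as "not found" in that output, install the system lzo
package on every worker node.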


Does that help?

Andrew

