Posted to user@spark.apache.org by rogthefrog <ro...@amino.com> on 2014/09/18 02:40:26 UTC

LZO support in Spark 1.0.0 - nothing seems to work

I have an HDFS cluster managed with CDH Manager, version CDH 5.1 with the
matching GPLEXTRAS parcel. LZO works with Hive and Pig, but I can't make it
work with Spark 1.0.0. I've tried:

* Setting this:

HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS
-Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/"

* Setting this in spark-env.sh, with and without "export", both via CDH
Manager and manually on the host:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/

* Setting this in /etc/spark/conf/spark-defaults.conf:

spark.executor.extraLibraryPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
spark.executor.extraClassPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar

* Adding this in CDH Manager:

export LD_LIBRARY_PATH=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native

* Hardcoding -Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native in the Spark command

* Symlinking the GPL compression binaries into
/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

* Symlinking the GPL compression binaries into /usr/lib

None of it worked. When I run pyspark I get this:

14/09/17 20:38:54 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

and when I try to run a simple job on an LZO file in HDFS I get this:

>>> distFile.count()
14/09/17 13:51:54 ERROR GPLNativeCodeLoader: Could not load native gpl
library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
       at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
       at java.lang.Runtime.loadLibrary0(Runtime.java:849)
       at java.lang.System.loadLibrary(System.java:1088)
       at
com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
       at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
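For reference, two quick checks (debugging sketches; the first goes through
PySpark's internal _jvm gateway, which is not public API): read the library
path the driver JVM actually received, and load-test the native binary
directly with ctypes to rule out a broken .so:

>>> # Should include the GPLEXTRAS native directory if a setting took effect
>>> sc._jvm.java.lang.System.getProperty("java.library.path")
>>> # Load the shared library directly; an OSError here would mean the
>>> # binary itself is bad, not just the path configuration
>>> import ctypes
>>> ctypes.CDLL("/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/libgplcompression.so")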

Can anybody help please? Many thanks.





Re: LZO support in Spark 1.0.0 - nothing seems to work

Posted by rogthefrog <ro...@amino.com>.
That does appear to be the case. Thanks!

For posterity, I ran pyspark like this:

$ sudo su yarn
$ pyspark --driver-library-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/

>>> p = sc.textFile("/some/file")
>>> p.count()

Everything appears to be working now.
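
The same effect should also be achievable persistently in
/etc/spark/conf/spark-defaults.conf instead of on the command line (a
sketch; the spark.driver.* property names assume Spark 1.0+, and the paths
are the GPLEXTRAS locations from the original post):

spark.driver.extraLibraryPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
spark.executor.extraLibraryPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
spark.driver.extraClassPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
spark.executor.extraClassPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar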





Re: LZO support in Spark 1.0.0 - nothing seems to work

Posted by Tim Smith <se...@gmail.com>.
I believe this is a known bug:
https://issues.apache.org/jira/browse/SPARK-1719
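
If so, the workaround that worked above amounts to handing the paths to the
driver JVM explicitly at launch time rather than relying on spark-env.sh (a
sketch combining the flags from this thread; it assumes pyspark forwards
these spark-submit options, as the earlier reply suggests):

$ pyspark --driver-library-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/ \
          --driver-class-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar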

On Wed, Sep 17, 2014 at 5:40 PM, rogthefrog <ro...@amino.com> wrote:
> [original message quoted in full; snipped]


Re: LZO support in Spark 1.0.0 - nothing seems to work

Posted by Vipul Pandey <vi...@gmail.com>.
It works for me:


export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar


I hope you are adding this to the code:

    val conf = sc.hadoopConfiguration
    conf.set("io.compression.codecs","com.hadoop.compression.lzo.LzopCodec")



Vipul

On Sep 17, 2014, at 5:40 PM, rogthefrog <ro...@amino.com> wrote:

> [original message quoted in full; snipped]


Re: LZO support in Spark 1.0.0 - nothing seems to work

Posted by Sree Harsha <99...@gmail.com>.
@rogthefrog

Were you able to figure out how to fix this issue?
I too have tried every combination possible, but no luck yet.

Thanks,
Harsha


