LZO support in Spark 1.0.0 - nothing seems to work
Posted to user@spark.apache.org by rogthefrog <ro...@amino.com> on 2014/09/18 02:40:26 UTC
I have an HDFS cluster managed with CDH Manager, version CDH 5.1 with the
matching GPLEXTRAS parcel. LZO works with Hive and Pig, but I can't make it
work with Spark 1.0.0. I've tried:
* Setting this:
HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS
-Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/"
* Setting this in spark-env.sh, with and without "export", both in CDH
Manager and manually on the host:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
* Setting this in /etc/spark/conf/spark-defaults.conf:
spark.executor.extraLibraryPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
spark.executor.extraClassPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
* Adding this in CDH manager:
export LD_LIBRARY_PATH=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
* Hardcoding -Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
in the Spark command
* Symlinking the gpl compression binaries into
/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
* Symlinking the gpl compression binaries into /usr/lib
And nothing worked. When I run pyspark I get this:
14/09/17 20:38:54 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
and when I try to run a simple job on an LZO file in HDFS I get this:
distFile.count()
14/09/17 13:51:54 ERROR GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
Can anybody help please? Many thanks.
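The UnsatisfiedLinkError above means the JVM never found libgplcompression.so on java.library.path. A quick sanity check, independent of Spark, is to confirm the library actually exists in the directories the settings point at (an illustrative Python sketch; the candidate paths are assumptions taken from this setup):

```python
import os

# Candidate native-library directories tried above (assumed layout).
CANDIDATES = [
    "/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/",
    "/opt/cloudera/parcels/CDH/lib/hadoop/lib/native/",
]

def find_native_lib(dirs, name="libgplcompression.so"):
    """Return the first directory that actually contains the library, else None."""
    for d in dirs:
        if os.path.isfile(os.path.join(d, name)):
            return d
    return None

print(find_native_lib(CANDIDATES))
```

If this prints None, no java.library.path setting can help; the parcel itself is missing or the paths are wrong.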
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LZO-support-in-Spark-1-0-0-nothing-seems-to-work-tp14494.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: LZO support in Spark 1.0.0 - nothing seems to work
Posted by rogthefrog <ro...@amino.com>.
That does appear to be the case. Thanks!
For posterity, I ran pyspark like this:
$ sudo su yarn
$ pyspark --driver-library-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
>>> p = sc.textFile("/some/file")
>>> p.count()
Everything appears to be working now.
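To avoid passing the flag on every invocation, the same effect should be achievable in /etc/spark/conf/spark-defaults.conf (a sketch, not verified here; the driver-side property is assumed to mirror the executor one tried earlier, and per SPARK-1719 the library-path properties may still be unreliable on 1.0.0):

```
spark.driver.extraLibraryPath    /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
spark.executor.extraLibraryPath  /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
spark.executor.extraClassPath    /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
```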
Re: LZO support in Spark 1.0.0 - nothing seems to work
Posted by Tim Smith <se...@gmail.com>.
I believe this is a known bug:
https://issues.apache.org/jira/browse/SPARK-1719
Re: LZO support in Spark 1.0.0 - nothing seems to work
Posted by Vipul Pandey <vi...@gmail.com>.
It works for me:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar
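All four exports above append to colon-separated path lists. If a launcher script builds these programmatically, an append-if-absent helper avoids duplicate entries (illustrative Python, not from the thread):

```python
def append_path(current, entry, sep=":"):
    """Append entry to a sep-separated path list unless it is already present."""
    parts = [p for p in (current or "").split(sep) if p]
    if entry not in parts:
        parts.append(entry)
    return sep.join(parts)

# e.g. extending an empty SPARK_CLASSPATH with the hadoop-lzo jar
jar = "/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar"
print(append_path("", jar))
```

Starting from an empty variable also avoids the stray leading ":" that plain `$VAR:/new/entry` expansion produces.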
I hope you are also adding this to the code:
val conf = sc.hadoopConfiguration
conf.set("io.compression.codecs","com.hadoop.compression.lzo.LzopCodec")
Vipul
MESOS slaves shut down due to "health check timed out"
Posted by Yangcheng Huang <ya...@huawei.com>.
Hi guys
Do you know how to handle the following case -
===== From MESOS log file =====
Slave asked to shut down by master@....:5050 because 'health
check timed out'
I1107 17:33:20.860988 27573 slave.cpp:1337] Asked to shut down framework ....
===============================
Is there a configuration to increase this timeout interval?
Thanks
YC
Re: LZO support in Spark 1.0.0 - nothing seems to work
Posted by Sree Harsha <99...@gmail.com>.
@rogthefrog
Were you able to figure out how to fix this issue?
I've tried every combination possible, but no luck yet.
Thanks,
Harsha