
Posted to user@spark.apache.org by "Uddin, Nasir M." <nu...@dtcc.com> on 2014/07/01 16:15:37 UTC

Spark 1.0: Unable to Read LZO Compressed File

Dear Spark Users:

Spark 1.0 has been installed in standalone mode, but it can't read any compressed (CMX/Snappy) or Sequence files residing on HDFS (it can read uncompressed files from HDFS). The most notable message is: "Unable to load native-hadoop library.....". Other related messages are:

Caused by: java.lang.IllegalStateException: Cannot load com.ibm.biginsights.compress.CmxDecompressor without native library!
    at com.ibm.biginsights.compress.CmxDecompressor.<clinit>(CmxDecompressor.java:65)

Here is the core-site.xml's key part:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.ibm.biginsights.compress.CmxCodec</value>
</property>

Here is the spark-env.sh:
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=10g
export SCALA_HOME=/opt/spark/scala-2.11.1
export JAVA_HOME=/opt/spark/jdk1.7.0_55
export SPARK_HOME=/opt/spark/spark-0.9.1-bin-hadoop2
export ADD_JARS=/opt/IHC/lib/compression.jar
export SPARK_CLASSPATH=/opt/IHC/lib/compression.jar
export SPARK_LIBRARY_PATH=/opt/IHC/lib/native/Linux-amd64-64/
export SPARK_MASTER_WEBUI_PORT=1080
export HADOOP_CONF_DIR=/opt/IHC/hadoop-conf

Note: core-site.xml and hdfs-site.xml are in hadoop-conf. CMX is an IBM-branded, splittable, LZO-based compression codec.

Any help to resolve the issue is appreciated.

Thanks,
Nasir
DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

Re: Spark 1.0: Unable to Read LZO Compressed File

Posted by Matei Zaharia <ma...@gmail.com>.

I’d suggest asking the IBM Hadoop folks, but my guess is that the library cannot be found in /opt/IHC/lib/native/Linux-amd64-64/. Or, if this exception is happening in your driver program, maybe the driver program’s java.library.path doesn’t include that directory. (SPARK_LIBRARY_PATH from spark-env.sh only applies to processes launched on the cluster.)

Matei
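
Both guesses above can be checked roughly from a shell. The following is only a sketch, assuming a POSIX sh on Linux and the directory quoted in spark-env.sh; the function names has_native_libs and on_library_path are made up for illustration and are not part of Spark or the IBM distribution.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths come from the thread.

# has_native_libs DIR
# Succeeds if DIR exists and holds at least one shared object
# (*.so or a versioned *.so.N file).
has_native_libs() {
    [ -d "$1" ] || return 1
    for f in "$1"/*.so "$1"/*.so.*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# on_library_path DIR PATHLIST
# Succeeds if DIR is an entry of the colon-separated PATHLIST, which is
# the format java.library.path and LD_LIBRARY_PATH use on Linux.
# A trailing slash on either side is ignored.
on_library_path() {
    want=${1%/}
    old_ifs=$IFS
    IFS=:
    for entry in $2; do
        if [ "${entry%/}" = "$want" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example use (not executed here):
#   has_native_libs /opt/IHC/lib/native/Linux-amd64-64 \
#       || echo "no shared libraries in that directory"
#   on_library_path /opt/IHC/lib/native/Linux-amd64-64 "$DRIVER_LIBRARY_PATH" \
#       || echo "driver library path does not include the native directory"
```

DRIVER_LIBRARY_PATH here is a placeholder for whatever the driver JVM reports; one way to obtain it is to evaluate System.getProperty("java.library.path") inside spark-shell. If the directory turns out to be missing from the driver's path, spark-submit in Spark 1.0 accepts a --driver-library-path option that may be worth trying.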
