Posted to user@spark.apache.org by Marcin Cylke <ma...@ext.allegro.pl> on 2015/07/15 13:33:53 UTC

compression behaviour inconsistency between 1.3 and 1.4

Hi

I've observed inconsistent behaviour in .saveAsTextFile between Spark 1.3 and 1.4.

Up until version 1.3 it was possible to save RDDs as Snappy-compressed
files simply by calling

rdd.saveAsTextFile(targetFile)

but after upgrading to 1.4 this no longer works. I now have to specify
the codec explicitly:

rdd.saveAsTextFile(targetFile, classOf[SnappyCodec])
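
For reference, a minimal self-contained sketch of that explicit-codec call.
The object name SaveCompressed and its save method are made up for
illustration; only saveAsTextFile(path, codec) is Spark's own API. It
assumes the Snappy native libraries are available to Hadoop on the
executors.

```scala
import org.apache.hadoop.io.compress.{CompressionCodec, SnappyCodec}
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): wraps the two-argument
// saveAsTextFile so that the codec is always passed explicitly.
object SaveCompressed {
  // Writes the RDD as text files compressed with the given codec.
  // Snappy is only the default here; any Hadoop CompressionCodec class works.
  def save(rdd: RDD[String],
           targetFile: String,
           codec: Class[_ <: CompressionCodec] = classOf[SnappyCodec]): Unit =
    rdd.saveAsTextFile(targetFile, codec)
}
```

With this in scope, SaveCompressed.save(rdd, targetFile) is equivalent to
the call above.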

As I understand it, I should be able either to pass the codec class
explicitly or to set these options globally on the cluster using
properties. I have the following settings in /etc/hadoop/conf/core-site.xml:

        <property>
                <name>mapred.map.output.compression.codec</name>
                <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>

        <property>
                <name>mapred.compress.map.output</name>
                <value>false</value>
        </property>

        <property>
                <name>mapred.output.compression.codec</name>
                <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>

The config did not change during the upgrade from 1.3 to 1.4.
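
To see which of these values Spark actually picks up, something like the
sketch below could be run from spark-shell against sc.hadoopConfiguration.
The object OutputCompressionSettings and its methods are made up for
illustration. It assumes the old mapred.* key names are honoured by the
Hadoop version in use. Note that mapred.output.compress is not among the
settings listed above; whether that has anything to do with the difference
between 1.3 and 1.4 is not something this sketch establishes.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper (not part of Spark or Hadoop) for inspecting and
// experimentally overriding the output-compression settings.
object OutputCompressionSettings {
  // Old-API (mapred.*) keys that control compression of job output.
  val keys: Seq[String] = Seq(
    "mapred.output.compress",
    "mapred.output.compression.codec")

  // Returns each key paired with its effective value, or "<unset>" if absent.
  def describe(conf: Configuration): Seq[(String, String)] =
    keys.map(k => k -> Option(conf.get(k)).getOrElse("<unset>"))

  // Experiment only: switches output compression on with Snappy for jobs
  // that use this Configuration, without touching core-site.xml.
  def enableSnappy(conf: Configuration): Unit = {
    conf.set("mapred.output.compress", "true")
    conf.set("mapred.output.compression.codec",
      "org.apache.hadoop.io.compress.SnappyCodec")
  }
}
```

For example, OutputCompressionSettings.describe(sc.hadoopConfiguration).foreach(println)
prints the effective values, and calling
OutputCompressionSettings.enableSnappy(sc.hadoopConfiguration) before
rdd.saveAsTextFile(targetFile) would show whether the one-argument call
respects them.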

What is the proper behaviour? Am I doing something strange here, or has
this changed recently?

Regards
Marcin
