Posted to commits@jena.apache.org by rv...@apache.org on 2015/03/04 12:04:18 UTC

svn commit: r1663934 - /jena/site/trunk/content/documentation/hadoop/io.mdtext

Author: rvesse
Date: Wed Mar  4 11:04:18 2015
New Revision: 1663934

URL: http://svn.apache.org/r1663934
Log:
Add notes on configuring compression with Elephas

Modified:
    jena/site/trunk/content/documentation/hadoop/io.mdtext

Modified: jena/site/trunk/content/documentation/hadoop/io.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/hadoop/io.mdtext?rev=1663934&r1=1663933&r2=1663934&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/hadoop/io.mdtext (original)
+++ jena/site/trunk/content/documentation/hadoop/io.mdtext Wed Mar  4 11:04:18 2015
@@ -18,7 +18,16 @@ In some cases there are file formats tha
 
 Hadoop natively provides support for compressed input and output, provided your Hadoop cluster is appropriately configured.  The advantage of compressing the input/output data is that it reduces the IO workload on the cluster; however, this comes with the disadvantage that most compression formats block Hadoop's ability to *split* up the input.
 
-Hadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary.
+Hadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary.  However, in order to use this your Hadoop cluster/job configuration must be set up to inform Hadoop about which compression codecs are in use.
+
+For example, to enable BZip2 compression (assuming your cluster doesn't enable this by default):
+
+    // Assumes you already have a Configuration object you are preparing
+    // in the variable config, with the Hadoop BZip2Codec class and the
+    // Elephas HadoopIOConstants class imported
+
+    config.set(HadoopIOConstants.IO_COMPRESSION_CODECS, BZip2Codec.class.getCanonicalName());
+
+See the Javadocs for the Hadoop [CompressionCodec](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html) API for the implementations available out of the box.  Note that some clusters may provide additional compression codecs beyond those built directly into Hadoop.
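+
+The underlying Hadoop `io.compression.codecs` property is a comma-separated list of codec class names, so several codecs can be registered in one go.  For example, a sketch along the same lines as above (assuming the standard Hadoop GzipCodec and BZip2Codec and the same `config` variable):
+
+    // Register several codecs at once by supplying a comma-separated list
+    // of codec class names (GzipCodec and BZip2Codec both ship with Hadoop)
+    config.set(HadoopIOConstants.IO_COMPRESSION_CODECS,
+            GzipCodec.class.getCanonicalName() + "," + BZip2Codec.class.getCanonicalName());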
 
 # RDF IO in Hadoop