Posted to commits@jena.apache.org by rv...@apache.org on 2015/03/04 12:04:18 UTC
svn commit: r1663934 -
/jena/site/trunk/content/documentation/hadoop/io.mdtext
Author: rvesse
Date: Wed Mar 4 11:04:18 2015
New Revision: 1663934
URL: http://svn.apache.org/r1663934
Log:
Add notes on configuring compression with Elephas
Modified:
jena/site/trunk/content/documentation/hadoop/io.mdtext
Modified: jena/site/trunk/content/documentation/hadoop/io.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/hadoop/io.mdtext?rev=1663934&r1=1663933&r2=1663934&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/hadoop/io.mdtext (original)
+++ jena/site/trunk/content/documentation/hadoop/io.mdtext Wed Mar 4 11:04:18 2015
@@ -18,7 +18,16 @@ In some cases there are file formats tha
Hadoop natively provides support for compressed input and output, provided your Hadoop cluster is appropriately configured. The advantage of compressing the input/output data is that it reduces the IO workload on the cluster; however, this comes with the disadvantage that most compression formats block Hadoop's ability to *split* up the input.
-Hadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary.
+Hadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary. However, to take advantage of this your Hadoop cluster/job configuration must be set up to inform Hadoop which compression codecs are in use.
+
+For example, to enable BZip2 compression (assuming your cluster does not enable it by default):
+
+    // Assumes config is an existing org.apache.hadoop.conf.Configuration
+    // that you are preparing for your job
+    // BZip2Codec is org.apache.hadoop.io.compress.BZip2Codec
+
+    config.set(HadoopIOConstants.IO_COMPRESSION_CODECS, BZip2Codec.class.getCanonicalName());
+
+See the Javadocs for the Hadoop [CompressionCodec](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html) API for the implementations available out of the box. Note that some clusters may provide additional compression codecs beyond those built directly into Hadoop.
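The same setting can also be applied cluster-wide rather than per job. A minimal sketch of the equivalent entry in `core-site.xml`, assuming `io.compression.codecs` is the key your Hadoop version uses (the codec list shown is illustrative, not exhaustive):

```xml
<!-- core-site.xml: register the compression codecs Hadoop should recognise -->
<property>
  <name>io.compression.codecs</name>
  <!-- Comma-separated list of codec classes; BZip2Codec shown as an example -->
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
```

With this in place individual jobs no longer need to set the codec list programmatically.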
# RDF IO in Hadoop