Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/06/28 22:59:23 UTC

[Lucene-hadoop Wiki] Update of "RandomWriter" by HairongKuang

The following page has been changed by HairongKuang:
http://wiki.apache.org/lucene-hadoop/RandomWriter

New page:
'''RandomWriter''' example

The '''RandomWriter''' example implements a distributed random writer. Each map takes a file name as input and writes one gigabyte of random data to a DFS sequence file by generating multiple records, each with a random key and a random value. The mappers do not emit any output, and the reduce phase is not used.
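
The record-generation loop described above can be sketched roughly as follows. This is only an illustration in plain Java, not the example's actual code: the class name `RandomRecordsSketch`, the method `writeRandomRecords`, the length limits, and the simple length-prefixed layout are all assumptions made here. The real example writes to a DFS sequence file, which this sketch does not attempt to reproduce.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Random;

// Hypothetical sketch: keep writing random key/value records until a byte budget is reached.
class RandomRecordsSketch {

    // Writes length-prefixed random records to 'out' until at least targetBytes
    // have been written. Returns the number of records written.
    // The record layout (int length + bytes) is an assumption, not the SequenceFile format.
    static int writeRandomRecords(DataOutputStream out, long targetBytes, Random rng,
                                  int maxKeyLen, int maxValueLen) throws IOException {
        long written = 0;
        int records = 0;
        while (written < targetBytes) {
            byte[] key = new byte[1 + rng.nextInt(maxKeyLen)];      // 1..maxKeyLen bytes
            byte[] value = new byte[rng.nextInt(maxValueLen + 1)];  // 0..maxValueLen bytes
            rng.nextBytes(key);
            rng.nextBytes(value);
            out.writeInt(key.length);
            out.write(key);
            out.writeInt(value.length);
            out.write(value);
            written += 8 + key.length + value.length;  // two 4-byte length prefixes plus payload
            records++;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // A small budget for demonstration; the example described above writes one gigabyte per map.
        int records = writeRandomRecords(new DataOutputStream(buffer), 10_000, new Random(), 16, 64);
        System.out.println(records + " records, " + buffer.size() + " bytes");
    }
}
```

The loop stops as soon as the budget is met, so the output overshoots the target by at most one record.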

This example uses a useful pattern for working around Hadoop's constraints on !InputSplits. Each input split can only consist of a file and a byte range, but we want to control how many maps there are, and we don't really have any input. So we create a directory with a set of artificial files, each of which contains the name of the file that a given map should write to. Then, using the text line reader and this "fake" input directory, we generate exactly the right number of maps. Each map gets a single record: the name of the file to which it is supposed to write its output.
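
A minimal sketch of the "fake" input directory idea, using the local file system and plain Java rather than the DFS and Hadoop APIs. The class name `FakeInputDirSketch`, the helper names, and the `dummy-split-N` and `part-N` file names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one tiny control file per desired map, each holding
// a single line with the name of the file that map should write.
class FakeInputDirSketch {

    // Creates numMaps control files under inputDir and returns their paths.
    static List<Path> writeControlFiles(Path inputDir, Path outDir, int numMaps) throws IOException {
        Files.createDirectories(inputDir);
        List<Path> controlFiles = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) {
            Path control = inputDir.resolve("dummy-split-" + i);
            String target = outDir.resolve("part-" + i).toString();
            Files.write(control, (target + "\n").getBytes(StandardCharsets.UTF_8));
            controlFiles.add(control);
        }
        return controlFiles;
    }

    // Stands in for what a map would receive from a line reader:
    // the single record stored in its control file.
    static String readTarget(Path control) throws IOException {
        return Files.readAllLines(control, StandardCharsets.UTF_8).get(0);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("random-writer-sketch");
        List<Path> controls = writeControlFiles(base.resolve("fake-input"), base.resolve("out"), 3);
        for (Path control : controls) {
            System.out.println(control.getFileName() + " -> " + readTarget(control));
        }
    }
}
```

Because each control file holds only one short line, each one would form a single split, so the number of files in the directory determines the number of maps.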

To run the example, use the following command syntax:[[BR]]
bin/hadoop org.apache.hadoop.examples.RandomWriter <out-dir> [<configuration file>]