Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/08/07 20:07:18 UTC
[Lucene-hadoop Wiki] Update of "PythonWordCount" by OwenOMalley
The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/PythonWordCount
New page:
= WordCount Example in Python =
This is the WordCount example written entirely in [http://python.org/ Python] and compiled with [http://www.jython.org/Project/index.html Jython] into a Java jar file.
The program reads text files and counts how often words occur. Both the input and the output are text files; each line of the output contains a word and the count of how often it occurred, separated by a tab.
Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value pair of the word and its sum.
As an optimization, the reducer is also used as a combiner on the map outputs. This reduces the amount of data sent across the network by combining the counts for each word into a single record.
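The map and reduce logic above can be sketched in plain Python. This is an illustrative local simulation under stated assumptions, not the Jython classes the example actually compiles (which implement Hadoop's Mapper and Reducer interfaces); all function names here are hypothetical:

```python
# Hypothetical, self-contained sketch of the WordCount map/reduce logic.

def map_line(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def reduce_counts(word, counts):
    """Reduce phase: sum the counts emitted for a single word."""
    return (word, sum(counts))

def word_count(lines):
    """Locally simulate map -> shuffle (group by key) -> reduce."""
    groups = {}
    for line in lines:
        for word, one in map_line(line):
            groups.setdefault(word, []).append(one)
    return dict(reduce_counts(w, c) for w, c in groups.items())
```

A combiner would simply apply `reduce_counts` to each mapper's local output before it is sent over the network, which is safe here because summing counts is associative.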
To compile the example, build the Hadoop code:{{{
ant
cd src/examples/python
./compile
}}}
To run the example, the command syntax is: {{{
../../../bin/hadoop jar wc.jar [-m <#maps>] [-r <#reducers>] \
<in-dir> <out-dir>
}}}