
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2008/10/09 20:53:39 UTC

[Hadoop Wiki] Update of "Grep" by DanielNaber

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DanielNaber:
http://wiki.apache.org/hadoop/Grep

------------------------------------------------------------------------------
  = Grep Example =
- '''Grep''' example extracts matching strings from text files and count how many time they occured.
+ '''Grep''' example extracts matching strings from text files and counts how many times they occurred.
+ 
+ To run the example, type the following command:[[BR]]
+ {{{bin/hadoop org.apache.hadoop.examples.Grep <indir> <outdir> <regex> [<group>]}}}
+ 
+ The command works differently from the Unix {{{grep}}} call: it doesn't display the complete matching line, only the matching string. To display whole lines matching "foo", use {{{.*foo.*}}} as the regular expression.
  
  The program runs two map/reduce jobs in sequence. The first job counts how many times each matching string occurred, and the second job sorts the matching strings by their frequency and stores the output in a single output file.
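
The two steps can be sketched in plain Java, without Hadoop, to show what ends up in the output. This is only an illustration: the class name `GrepSketch` and its methods are hypothetical, the descending sort order and the tie-break on the string itself are assumptions, and the real example distributes this work across map and reduce tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the count-then-sort flow; not Hadoop code.
class GrepSketch {

    // Step 1 (stands in for the first job): record every matching string
    // and count how often it occurs. Only the match is kept, not the line.
    static Map<String, Long> countMatches(List<String> lines, String regex, int group) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(group), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Step 2 (stands in for the second job): sort by frequency.
    // Assumption: most frequent first, ties broken alphabetically.
    static List<Map.Entry<String, Long>> sortByFrequency(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar foo", "baz foo", "qux");

        // Matching strings only: the regex "foo" yields the string "foo".
        System.out.println(sortByFrequency(countMatches(lines, "foo", 0)));

        // Whole lines: ".*foo.*" makes each matching line the matched string.
        System.out.println(sortByFrequency(countMatches(lines, ".*foo.*", 0)));
    }
}
```

With the regex `foo` the result has a single entry, because only the matched text is kept. With `.*foo.*` each matching line becomes its own entry.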
   
@@ -11, +16 @@

  
  The example also demonstrates how to pass a command-line parameter to a mapper or a reducer. This is done by adding (key, value) pairs to the job's configuration before the job is submitted. Map or reduce tasks are able to access the value by getting it from the job's configuration in the method ''configure''.
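
The parameter-passing pattern described above can be sketched as follows. To keep the sketch self-contained, a plain `Map<String, String>` stands in for the job's configuration, and the key names, `ConfigPassingSketch`, `buildJobConf`, and `SketchMapper` are all hypothetical. The actual Hadoop classes and property names are not shown here and may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "set values before submission, read them in configure".
// A Map stands in for the job configuration (assumption, not the Hadoop API).
class ConfigPassingSketch {

    // Hypothetical configuration keys, chosen for illustration only.
    static final String REGEX_KEY = "example.mapper.regex";
    static final String GROUP_KEY = "example.mapper.regex.group";

    // Driver side: copy command-line arguments into the configuration.
    // Argument order follows the usage line: <indir> <outdir> <regex> [<group>]
    static Map<String, String> buildJobConf(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(REGEX_KEY, args[2]);
        if (args.length == 4) {
            conf.put(GROUP_KEY, args[3]);
        }
        return conf;
    }

    // Task side: the mapper reads its parameters once, in configure.
    static class SketchMapper {
        private Pattern pattern;
        private int group;

        void configure(Map<String, String> conf) {
            pattern = Pattern.compile(conf.get(REGEX_KEY));
            // Assumption: group 0 (the whole match) when no group is given.
            group = Integer.parseInt(conf.getOrDefault(GROUP_KEY, "0"));
        }

        // Returns the first matching string in the line, or null if none.
        String firstMatch(String line) {
            Matcher matcher = pattern.matcher(line);
            return matcher.find() ? matcher.group(group) : null;
        }
    }

    public static void main(String[] args) {
        String[] cli = {"in", "out", "a(b+)", "1"};
        SketchMapper mapper = new SketchMapper();
        mapper.configure(buildJobConf(cli));
        System.out.println(mapper.firstMatch("xabbx"));
    }
}
```

The point of the pattern is that the task never sees the command line. It only sees what the driver chose to place in the configuration.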
  
+ Grep supports generic options: see DevelopmentCommandLineOptions 
- To run the example, type the following command:[[BR]]
- bin/hadoop org.apache.hadoop.examples.Grep <indir> <outdir> <regex> [<group>]
  
- Grep supports generic options : see DevelopmentCommandLineOptions 
-