Posted to mapreduce-user@hadoop.apache.org by Joey Echeverria <jo...@cloudera.com> on 2011/05/05 13:39:41 UTC

Re: How hadoop parse input files into (Key,Value) pairs ??

Hadoop uses an InputFormat class to parse input files and generate the
(key, value) pairs for your Mapper. An InputFormat is any class that
extends this abstract base class:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/InputFormat.html

The default InputFormat, TextInputFormat, parses text files, generating
keys that are byte offsets and values that are complete lines of text:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
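
To make the "byte offset, line" idea concrete, here is a rough stand-alone
sketch in plain Java. It is not Hadoop's code and does not use any Hadoop
classes; OffsetLineSketch, Record and parse() are names I made up for
illustration. It only mimics the shape of the pairs your Mapper sees: the
key is the byte position where a line starts and the value is the line
itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce Hadoop's classes.
final class OffsetLineSketch {

    // One record: the byte offset where a line starts, plus the line's text.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits raw file bytes on '\n' and pairs each line with its starting byte offset.
    static List<Record> parse(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            // Emit a record at every newline, and once more at end of input
            // if the last line has no trailing newline.
            if (atEnd ? start < data.length : data[i] == '\n') {
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                records.add(new Record(start, line));
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "hello world\nfoo\nbar baz\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : parse(data)) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the three-line input in main(), the keys come out as 0, 12 and 16,
which is why the key type in your map() signature is LongWritable and the
value type is Text.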

You can write your own InputFormat and configure your job to use it by
calling setInputFormatClass() on your Job before submitting it:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/Job.html#setInputFormatClass(java.lang.Class)
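
As a rough, untested sketch of what that could look like with the
org.apache.hadoop.mapreduce API: the class below (TabSplitInputFormat and
its TabSplitRecordReader are hypothetical names, not part of Hadoop) reuses
Hadoop's LineRecordReader to read lines, then hands the Mapper the text
before the first tab as the key and the rest of the line as the value. It
needs the Hadoop jars on the classpath, and the tab-splitting rule is just
an assumption chosen for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical InputFormat: key = text before the first tab, value = the rest.
public class TabSplitInputFormat extends FileInputFormat<Text, Text> {

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split,
                                                       TaskAttemptContext context) {
        return new TabSplitRecordReader();
    }

    // Wraps Hadoop's LineRecordReader and re-slices each line into (key, value).
    static class TabSplitRecordReader extends RecordReader<Text, Text> {
        private final LineRecordReader lines = new LineRecordReader();
        private final Text key = new Text();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            lines.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (!lines.nextKeyValue()) {
                return false; // no more lines in this split
            }
            String line = lines.getCurrentValue().toString();
            int tab = line.indexOf('\t');
            if (tab < 0) {
                // No tab: whole line becomes the key, value is empty.
                key.set(line);
                value.set("");
            } else {
                key.set(line.substring(0, tab));
                value.set(line.substring(tab + 1));
            }
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() throws IOException { return lines.getProgress(); }
        @Override public void close() throws IOException { lines.close(); }
    }

    // Driver: args[0] = input path, args[1] = output path.
    public static void main(String[] args) throws Exception {
        // new Job(conf, name) matches the 0.20 API; later releases deprecate it.
        Job job = new Job(new Configuration(), "tab-split-demo");
        job.setJarByClass(TabSplitInputFormat.class);
        job.setInputFormatClass(TabSplitInputFormat.class); // the call described above
        // No Mapper or Reducer is set, so the defaults pass pairs through unchanged.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With an InputFormat like this, a Mapper written against the same API would
declare Text, Text as its input key and value types instead of
LongWritable, Text.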

Hope that helps.

-Joey

P.S. I moved this over to the mapreduce-user alias since it's
MapReduce specific.

On Thu, May 5, 2011 at 7:31 AM, praveenesh kumar <pr...@gmail.com> wrote:
> Hi,
>
> As we know, a Hadoop mapper takes (key, value) pairs as input and generates
> intermediate (key, value) pairs, and usually we give our Mapper a text file
> as input.
> How does Hadoop understand this and parse our input text file into
> (key, value) pairs?
>
> Usually our mapper looks like  --
> public void map(LongWritable key, Text value, OutputCollector<Text, Text>
> outputCollector, Reporter reporter) throws IOException {
>
> String word = value.toString();
>
> //Some lines of code
>
> }
>
> So if I pass any text file as input, every line is passed as the VALUE to
> the Mapper, which I then process and put into the OutputCollector.
> But how does Hadoop parse my text file into (key, value) pairs, and how can
> we tell Hadoop which (key, value) pairs it should give to the mapper?
>
> Thanks.
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434