Posted to mapreduce-user@hadoop.apache.org by Vamshi Krishna <va...@gmail.com> on 2012/02/14 15:58:59 UTC

how to specify key and value for an input to mapreduce job

Hi all,
I have a job which reads all the rows from an HBase table and writes
them to a location in DFS, i.e. /user/HSOP. HSOP is a folder containing 9
files, each with content like:
00015DEGgJ    -HM
00016Pc4Tl    -HM
0001H0iImI    -HM
0001Oyb0Ju    -HM
0001hwBEOr    -HM
0002Qx2Uj9    -HM
0002jCs6gr    -HM
0003PMcWRa    -HM
000488xKIE    -HM

Both the first and second columns are of Text type, as specified in the first
job's output format class.
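
For reference, the output side of that first job is set up roughly like this (only a sketch, not the exact code):

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, new Path("/user/HSOP"));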

Now I want one more job to read all these files as input and treat the first
column element as the "key" and the second column element as the "value". For that I
tried starting a job and specifying the line
job.getConfiguration().set("key.value.separator.in.input.line", "-");

In the reduce() method I have context.write(key, value); key is
LongWritable and value is Text. But when I look at the output of this job, I
see a format like:

46    0002mCjpo9    -HM
253    000AxT9LSA    -HM
460    000FYtnxiB    -HM
667    000WNVBo9N    -HM
874    000dQiseKz    -HM

But I don't want the first column to be added to each row. Can somebody please
help me with how to do that?
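
For reference, my reducer for this job currently looks roughly like this (a simplified sketch, the class name is just a placeholder):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CurrentReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void reduce(LongWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // The LongWritable key is written out along with the value, and that key
    // is what shows up as the unwanted first column in the output.
    for (Text value : values) {
      context.write(key, value);
    }
  }
}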

RE: how to specify key and value for an input to mapreduce job

Posted by Vinayakumar B <vi...@huawei.com>.
Hi Vamshi,

1. To read input which has both the key and the value in text format, you can use
KeyValueTextInputFormat (from the org.apache.hadoop.mapreduce.lib.input package) as the InputFormat class for your job. This input format uses KeyValueLineRecordReader, which reads each line and separates the key and the value found on that line.
You need to set the key/value separator with the following property in the job configuration:
"mapreduce.input.keyvaluelinerecordreader.key.value.separator"
By default this is '\t'.
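
For example, a minimal driver sketch (the class name, job name and output path are just placeholders) could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HsopSecondJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Split each input line at the first '-' into key and value.
    // (Older releases use the name "key.value.separator.in.input.line" instead.)
    conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "-");

    Job job = new Job(conf, "hsop second job");
    job.setJarByClass(HsopSecondJob.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("/user/HSOP"));
    FileOutputFormat.setOutputPath(job, new Path("/user/HSOP_keyvalue"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With no mapper or reducer set, the identity map and reduce simply pass the (Text, Text) pairs through, so the output already contains just the two columns.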

2. The reduce output will use the default TextOutputFormat, with a LongWritable key and a Text value by default.
In your case you need Text as both the key and the value.
Since you were using the default TextInputFormat, you were getting the complete line as the value and the byte offset of the line as the key, which is why the extra first column appears in your output. If you use KeyValueTextInputFormat instead, you will get the desired result.
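
If you do write your own mapper and reducer for this job, a sketch with Text as both key and value (class names are just placeholders) would be:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class HsopPassThrough {

  // KeyValueTextInputFormat already hands the mapper a (Text key, Text value)
  // pair, so both stages can simply pass the pair through unchanged.
  public static class HsopMapper extends Mapper<Text, Text, Text, Text> {
    @Override
    protected void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static class HsopReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text value : values) {
        context.write(key, value);  // TextOutputFormat then writes "key<TAB>value"
      }
    }
  }
}

In the driver, register them with job.setMapperClass(HsopMapper.class) and job.setReducerClass(HsopReducer.class), and keep Text as the output key and value classes.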

Thanks and Regards,
Vinayakumar B