Posted to common-user@hadoop.apache.org by Mark <st...@gmail.com> on 2010/08/26 20:07:53 UTC

KeyValueTextInputFormat

  When I configure my job to use KeyValueTextInputFormat, doesn't that 
imply that both the key and the value passed to my mapper will be Text?

I have it set up like this, and I am using the default Mapper.class, i.e. 
IdentityMapper:
- KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));

but I keep receiving this error:
- java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot 
be cast to org.apache.hadoop.io.Text

I would expect this error if I were using the default TextInputFormat, 
because that returns the key as a LongWritable and the value as Text, but 
I am unsure why it's happening here.

Also, on the same note: when I supply TextInputFormat or 
KeyValueTextInputFormat, does that implicitly set 
job.setMapOutputKeyClass and job.setMapOutputValueClass? When are these 
used?

Thanks for the clarification





Re: KeyValueTextInputFormat

Posted by newpant <ne...@gmail.com>.
Hi Mark. For example, say I have a dataset containing weather data like the
example in Hadoop: The Definitive Guide. The file is ASCII-encoded, and we
use TextInputFormat, so the mapper's input key and value types are
LongWritable (the line offset in bytes) and Text (the data record). The
mapper parses each record and outputs the year and temperature; the year is
a Text and the temperature is an IntWritable. The reducer takes the mapper
output and finds the maximum value for a given key.

In this case, the mapper's input type is <LongWritable, Text> and its output
type is <Text, IntWritable>. The reducer's input type is <Text,
Iterable<IntWritable>>, and it outputs <Text, IntWritable>.
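
A minimal sketch of those two classes, assuming the newer
org.apache.hadoop.mapreduce API (the class names and the NCDC column
offsets follow the book's sample and are illustrative only):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Input <LongWritable, Text>: byte offset and raw record line.
// Output <Text, IntWritable>: year and air temperature.
class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    // Skip a leading '+' so the temperature parses as an int.
    int airTemperature = line.charAt(87) == '+'
        ? Integer.parseInt(line.substring(88, 92))
        : Integer.parseInt(line.substring(87, 92));
    context.write(new Text(year), new IntWritable(airTemperature));
  }
}

// Input <Text, Iterable<IntWritable>>; output <Text, IntWritable>:
// the maximum temperature seen for each year.
class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values,
      Context context) throws IOException, InterruptedException {
    int max = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      max = Math.max(max, value.get());
    }
    context.write(key, new IntWritable(max));
  }
}

Note that neither class mentions the input format; the driver
configuration decides which <key, value> pairs the mapper is actually
handed.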


2010/8/27 Mark <st...@gmail.com>

>   On 8/26/10 7:47 PM, newpant wrote:
>
>> Hi, did you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to
>> set the input format class? The default input format class is
>> TextInputFormat, whose key type is LongWritable, which stores the byte
>> offset of each line in the file.
>>
>> If your mapper's output key or value types differ from your job's final
>> output types, you need to call setMapOutputKeyClass and
>> setMapOutputValueClass.
>>
>> 2010/8/27 Mark<st...@gmail.com>
>>
>>> When I configure my job to use KeyValueTextInputFormat, doesn't that
>>> imply that both the key and the value passed to my mapper will be Text?
>>>
>>> I have it set up like this, and I am using the default Mapper.class,
>>> i.e. IdentityMapper:
>>> - KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
>>>
>>> but I keep receiving this error:
>>> - java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>>> be
>>> cast to org.apache.hadoop.io.Text
>>>
>>> I would expect this error if I were using the default TextInputFormat,
>>> because that returns the key as a LongWritable and the value as Text,
>>> but I am unsure why it's happening here.
>>>
>>> Also, on the same note: when I supply TextInputFormat or
>>> KeyValueTextInputFormat, does that implicitly set
>>> job.setMapOutputKeyClass and job.setMapOutputValueClass? When are these
>>> used?
>>>
>>> Thanks for the clarification
>>>
>>>
>>>
>>>
>>>
> No, I didn't set that, and when I did, everything worked as expected. I
> thought that if I used:
>
>
> KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))
>
>
> it would set that for me, or at least know that the input would be
> Text/Text. I'm guessing that is wrong.
>
>
> If your mapper's output key or value types differ from your job's final
> output types, you need to call setMapOutputKeyClass and
> setMapOutputValueClass.
>
> When would this ever come up? Does it just cast to the appropriate classes
> then?
>
> Thanks
>
>

Re: KeyValueTextInputFormat

Posted by Mark <st...@gmail.com>.
  On 8/26/10 7:47 PM, newpant wrote:
> Hi, did you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to set
> the input format class? The default input format class is TextInputFormat,
> whose key type is LongWritable, which stores the byte offset of each line
> in the file.
>
> If your mapper's output key or value types differ from your job's final
> output types, you need to call setMapOutputKeyClass and setMapOutputValueClass.
>
> 2010/8/27 Mark<st...@gmail.com>
>
>>   When I configure my job to use KeyValueTextInputFormat, doesn't that
>> imply that both the key and the value passed to my mapper will be Text?
>>
>> I have it set up like this, and I am using the default Mapper.class,
>> i.e. IdentityMapper:
>> - KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
>>
>> but I keep receiving this error:
>> - java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>
>> I would expect this error if I were using the default TextInputFormat,
>> because that returns the key as a LongWritable and the value as Text, but
>> I am unsure why it's happening here.
>>
>> Also, on the same note: when I supply TextInputFormat or
>> KeyValueTextInputFormat, does that implicitly set job.setMapOutputKeyClass
>> and job.setMapOutputValueClass? When are these used?
>>
>> Thanks for the clarification
>>
>>
>>
>>
>>
No, I didn't set that, and when I did, everything worked as expected. I 
thought that if I used:

KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))


it would set that for me, or at least know that the input would be 
Text/Text. I'm guessing that is wrong.

If your mapper's output key or value types differ from your job's final
output types, you need to call setMapOutputKeyClass and setMapOutputValueClass.

When would this ever come up? Does it just cast to the appropriate 
classes then?

Thanks


Re: KeyValueTextInputFormat

Posted by newpant <ne...@gmail.com>.
Hi, did you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to set
the input format class? The default input format class is TextInputFormat,
whose key type is LongWritable, which stores the byte offset of each line
in the file.
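
For reference, a minimal driver sketch using the older
org.apache.hadoop.mapred API that JobConf belongs to (the job name and
the use of args[0]/args[1] are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class KeyValueDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(KeyValueDriver.class);
    conf.setJobName("keyvalue-example");

    // Without this call the job falls back to TextInputFormat, whose
    // keys are LongWritable byte offsets (the cast failure from this
    // thread).
    conf.setInputFormat(KeyValueTextInputFormat.class);

    // addInputPath only registers where to read from; it does not
    // select an input format.
    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // With the default IdentityMapper and KeyValueTextInputFormat,
    // both the map output key and value are Text.
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    JobClient.runJob(conf);
  }
}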

If your mapper's output key or value types differ from your job's final
output types, you need to call setMapOutputKeyClass and
setMapOutputValueClass.
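
As a rough sketch of why (same older API; a mapper emitting <Text,
IntWritable> while the job writes <Text, Text> is a made-up example):
because of Java's type erasure the framework cannot recover the
intermediate types from the Mapper's generic parameters at runtime, so it
deserializes the shuffled data using whatever classes the job
configuration names, and a mismatch only shows up later as a cast or
type-mismatch error.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class IntermediateTypes {
  static void configure(JobConf conf) {
    // Intermediate (map output) types: required whenever they differ
    // from the final output types declared below.
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);

    // Final (reduce output) types. If the two calls above are omitted,
    // these also serve as the defaults for the map output types, which
    // is why a job whose mapper emits something else fails until the
    // map output classes are set explicitly.
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
  }
}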

2010/8/27 Mark <st...@gmail.com>

>  When I configure my job to use KeyValueTextInputFormat, doesn't that
> imply that both the key and the value passed to my mapper will be Text?
>
> I have it set up like this, and I am using the default Mapper.class,
> i.e. IdentityMapper:
> - KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
>
> but I keep receiving this error:
> - java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.Text
>
> I would expect this error if I were using the default TextInputFormat,
> because that returns the key as a LongWritable and the value as Text, but
> I am unsure why it's happening here.
>
> Also, on the same note: when I supply TextInputFormat or
> KeyValueTextInputFormat, does that implicitly set job.setMapOutputKeyClass
> and job.setMapOutputValueClass? When are these used?
>
> Thanks for the clarification
>
>
>
>
>