Posted to common-user@hadoop.apache.org by Mark <st...@gmail.com> on 2010/08/26 20:07:53 UTC
KeyValueTextInputFormat
When I configure my job to use KeyValueTextInputFormat, doesn't that
imply that the key and the value passed to my mapper will both be Text?
I have it set up like this, and I am using the default Mapper.class, i.e.
the identity mapper:
- KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
but I keep receiving this error:
- java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
I would expect this error if I were using the FileInputFormat, because
that returns the key as a LongWritable and the value as Text, but I am
unsure why it's happening here.
Also, on the same note: when I supply FileInputFormat or
KeyValueTextInputFormat, does that implicitly call
job.setMapOutputKeyClass and job.setMapOutputValueClass? When are these
used?
Thanks for the clarification
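For background on the question above: KeyValueTextInputFormat's record reader turns each input line into a Text key and a Text value by splitting at the first separator (a tab by default). A dependency-free sketch of that split, for illustration only (this is not Hadoop's actual code):

```java
// Illustrative sketch of the per-line split that KeyValueTextInputFormat's
// record reader performs: the text before the first separator becomes the
// key, the text after it becomes the value, and Hadoop wraps both as Text.
public class KeyValueSplit {
    // Returns {key, value}. A line with no separator becomes the key,
    // with an empty value.
    public static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("1950\t22", '\t');
        System.out.println(kv[0] + " / " + kv[1]);
    }
}
```

Note that a line containing no separator at all still produces a key (the whole line) with an empty value, so the mapper always sees Text/Text pairs.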
Re: KeyValueTextInputFormat
Posted by newpant <ne...@gmail.com>.
Hi Mark. For example, I have a dataset that contains weather data like the
example in Hadoop: The Definitive Guide. The file is ASCII-encoded and we
use TextInputFormat, so the input key and value types of the mapper are
LongWritable (the line offset in bytes) and Text (the data record). The
mapper parses each record and outputs the year and the temperature; the
year is a Text and the temperature is an IntWritable. The reducer takes
the mapper output and finds the maximum value for a given key.
In this case, the mapper's input type is <LongWritable, Text> and its
output type is <Text, IntWritable>. The reducer's input type is <Text,
Iterable<IntWritable>>, and it outputs <Text, IntWritable>.
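The type flow described above can be sketched without Hadoop at all. Below is a plain-Java analogy of the same pipeline; the record layout ("year" and temperature separated by a tab) is an assumption made purely for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of the max-temperature job described above: the
// "map" step parses a (year, temperature) pair out of each text record,
// and the "reduce" step keeps the maximum temperature per year.
// The "year<TAB>temperature" record layout is assumed for illustration.
public class MaxTemperature {
    // Map-phase analogy: one record in, one [year, temperature] pair out.
    static String[] mapRecord(String record) {
        return record.split("\t", 2);
    }

    // Reduce-phase analogy: maximum temperature for each year key.
    static Map<String, Integer> maxPerYear(List<String> records) {
        Map<String, Integer> max = new HashMap<>();
        for (String record : records) {
            String[] kv = mapRecord(record);
            max.merge(kv[0], Integer.parseInt(kv[1]), Math::max);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxPerYear(List.of("1950\t22", "1950\t-11", "1949\t111")));
    }
}
```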
Re: KeyValueTextInputFormat
Posted by Mark <st...@gmail.com>.
On 8/26/10 7:47 PM, newpant wrote:
> Hi, do you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to
> set the input format class? The default input format class is
> TextInputFormat, and its key type is LongWritable, which stores the
> offset of each line in the file (in bytes).
>
> If your reducer accepts different key or value types than the mapper's
> output, you need to call setMapOutputKeyClass and setMapOutputValueClass.
No, I didn't set that, and when I did, everything worked as expected. I
thought that if I used:
KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))
it would set that for me, or at least know that the input would be
text/text. I'm guessing that is wrong.
> If your reducer accepts different key or value types than the mapper's
> output, you need to call setMapOutputKeyClass and setMapOutputValueClass.
When would this ever come up? Does it just cast to the appropriate
classes then?
Thanks
Re: KeyValueTextInputFormat
Posted by newpant <ne...@gmail.com>.
Hi, do you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to
set the input format class? The default input format class is
TextInputFormat, and its key type is LongWritable, which stores the
offset of each line in the file (in bytes).
If your reducer accepts different key or value types than the mapper's
output, you need to call setMapOutputKeyClass and setMapOutputValueClass.
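Putting the two pieces of advice above together, a driver using the old org.apache.hadoop.mapred API might look like the following sketch. MaxTempMapper and MaxTempReducer are hypothetical class names, and this is an illustrative configuration sketch under those assumptions, not a tested driver:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);

        // The crucial line: addInputPath alone does NOT select the input
        // format. Without this call, the default TextInputFormat is used
        // and the mapper receives LongWritable keys, which is exactly the
        // ClassCastException reported above.
        conf.setInputFormat(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Hypothetical mapper/reducer classes for this sketch.
        conf.setMapperClass(MaxTempMapper.class);
        conf.setReducerClass(MaxTempReducer.class);

        // If the mapper's output types differ from the job's final output
        // types, they must be declared explicitly; otherwise the framework
        // assumes they match the output classes below.
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
    }
}
```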