Posted to hdfs-user@hadoop.apache.org by Andrew Pennebaker <ap...@42six.com> on 2013/08/27 17:16:02 UTC

MapReduce Tutorial tweak

In https://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code,
line 16 declares:

private Text word = new Text();

...

But only lines 22 and 23 use this, and only to pass the value along to
output:

word.set(tokenizer.nextToken());
output.collect(word, one);

Wouldn't this be better expressed as:

(no private Text word)

...

output.collect(tokenizer.nextToken(), one);

?

Re: MapReduce Tutorial tweak

Posted by Ravi Kiran <ra...@gmail.com>.
Also to add, the default serialization libraries supported are specified in
core-default.xml as

<property>
  <name>io.serializations</name>

<value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  <description>A list of serialization classes that can be used for
 obtaining serializers and deserializers.</description>
</property>

Since Java serialization isn't in this default list, you would need to
convert to *Writable types, which Hadoop can serialize in a better, more
compact form.
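
To give a rough feel for the size difference, here is a minimal plain-Java
sketch. It does not use any Hadoop classes: WordCount is a hypothetical
record type, and compactSize only imitates the field-by-field style of a
Writable's write(DataOutput) method. Hadoop's own types use their own
encodings, so treat the numbers as illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSizes {
    // Hypothetical record type, used only for this comparison.
    static class WordCount implements Serializable {
        private static final long serialVersionUID = 1L;
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Size with standard Java serialization, which also writes class metadata.
    static int javaSerializedSize(WordCount wc) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(wc);
        }
        return buf.size();
    }

    // Size when only the field values are written, Writable-style.
    static int compactSize(WordCount wc) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(wc.word);   // 2-byte length prefix + encoded characters
            out.writeInt(wc.count);  // 4 bytes
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        WordCount wc = new WordCount("hadoop", 1);
        System.out.println("Java serialization:  " + javaSerializedSize(wc) + " bytes");
        System.out.println("Field-only encoding: " + compactSize(wc) + " bytes");
    }
}
```

The field-only form is smaller because it carries no class name or field
descriptors; the reader is expected to know the record layout already.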

Regards
Ravi Magham


On Tue, Aug 27, 2013 at 9:27 PM, Shahab Yunus <sh...@gmail.com> wrote:

> As far as I understand, StringTokenizer.nextToken returns a Java String,
> which does not implement the Writable and Comparable interfaces that
> Hadoop MapReduce needs for serialization and transport. The Text class
> implements both, which is why it is used to wrap the Java String and
> pass it on.
>
> Regards,
> Shahab
>
>
> On Tue, Aug 27, 2013 at 11:16 AM, Andrew Pennebaker <apennebaker@42six.com> wrote:
>
>> In https://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code,
>> line 16 declares:
>>
>> private Text word = new Text();
>>
>> ...
>>
>> But only lines 22 and 23 use this, and only to pass the value along to
>> output:
>>
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>>
>> Wouldn't this be better expressed as:
>>
>> (no private Text word)
>>
>> ...
>>
>> output.collect(tokenizer.nextToken(), one);
>>
>> ?
>>
>
>

Re: MapReduce Tutorial tweak

Posted by Shahab Yunus <sh...@gmail.com>.
As far as I understand, StringTokenizer.nextToken returns a Java String,
which does not implement the Writable and Comparable interfaces that
Hadoop MapReduce needs for serialization and transport. The Text class
implements both, which is why it is used to wrap the Java String and
pass it on.
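
A rough sketch of that idea in plain Java, without Hadoop on the classpath.
SimpleWritable, SimpleText and Collector are hypothetical stand-ins for
Writable, Text and OutputCollector, not the real classes; the stand-in
collector serializes the key on every call, which is the assumption that
makes reusing a single key object safe here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.StringTokenizer;

public class WritableKeySketch {
    // Stand-in for a Writable-style contract: the type serializes itself.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
    }

    // Stand-in for Text: a mutable, comparable wrapper around a String.
    static class SimpleText implements SimpleWritable, Comparable<SimpleText> {
        private String value = "";

        void set(String value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(value);
        }

        @Override
        public int compareTo(SimpleText other) {
            return value.compareTo(other.value);
        }
    }

    // Stand-in collector: accepts only keys that can serialize themselves.
    static class Collector<K extends SimpleWritable> {
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(buf);
        private int records = 0;

        void collect(K key, int value) throws IOException {
            key.write(out);      // key bytes are copied out immediately
            out.writeInt(value);
            records++;
        }

        int records() {
            return records;
        }

        int bytes() {
            return buf.size();
        }
    }

    // Mirrors the shape of the tutorial's map loop for a single input line.
    static Collector<SimpleText> mapLine(String line) throws IOException {
        Collector<SimpleText> output = new Collector<>();
        SimpleText word = new SimpleText();  // one instance, reused per token
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, 1);
            // output.collect(tokenizer.nextToken(), 1) would not compile:
            // String does not implement SimpleWritable.
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        Collector<SimpleText> output = mapLine("hello hadoop hello");
        System.out.println(output.records() + " records, " + output.bytes() + " bytes");
    }
}
```

Keeping word as a field and calling set() on it also avoids allocating a
new wrapper object for every token.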

Regards,
Shahab


On Tue, Aug 27, 2013 at 11:16 AM, Andrew Pennebaker
<ap...@42six.com> wrote:

> In https://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code,
> line 16 declares:
>
> private Text word = new Text();
>
> ...
>
> But only lines 22 and 23 use this, and only to pass the value along to
> output:
>
> word.set(tokenizer.nextToken());
> output.collect(word, one);
>
> Wouldn't this be better expressed as:
>
> (no private Text word)
>
> ...
>
> output.collect(tokenizer.nextToken(), one);
>
> ?
>
