Posted to mapreduce-user@hadoop.apache.org by Alex Kozlov <al...@cloudera.com> on 2010/01/27 19:49:48 UTC

Re: Does the class of the Mapper output need to match the exact class of the specified output?

Currently a map output value can have only one class, but that does not
prevent you from encapsulating another field or class inside your own
Writable and serializing it yourself.

Avro <http://hadoop.apache.org/avro/docs/current/spec.html> is supposed to
support multiple formats out of the box, but it does not have Input/Output
formats yet (0.21.0?).

Hadoop uses its own serialization rather than Java serialization for
performance reasons.

Alex

On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cw...@gmail.com> wrote:

> I'm outputting a Text and a LongWritable in my mapper and told the job that
> my map output value class is Writable (the interface shared by both of them):
>   job.setMapOutputValueClass(Writable.class);
> I'm doing this because I have two different types of input files and am
> combining them together.  I could write them both as Text, but then I'd
> have to put a marker in front of each value to indicate what type of entry
> it is, instead of doing a
>   if (value instanceof Text) { } else if (value instanceof LongWritable) { }
>
> This exception is thrown:
>
> java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
>
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>
> The MapTask code (which is being used even though I'm using the new API) shows that a != is used to compare the classes:
>
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
>
>   if (value.getClass() != valClass) {
>     throw new IOException("Type mismatch in value from map: expected "
>                           + valClass.getName() + ", recieved "
>                           + value.getClass().getName());
>   }
>
> Does this level of checking really need to be done?  Could it just be a Class.isAssignableFrom() check?
>
>
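
For reference, a rough sketch of the relaxed check Chris is suggesting (an
illustration only, not the shipped MapTask code):

  if (!valClass.isAssignableFrom(value.getClass())) {  // accept subclasses too
    throw new IOException("Type mismatch in value from map: expected "
                          + valClass.getName() + ", received "
                          + value.getClass().getName());
  }

Note that the relaxed check alone would likely not be enough: when map output
is deserialized, the framework instantiates the declared value class by
reflection, so an interface such as Writable cannot serve as the declared
class by itself.  A single concrete wrapper class (see TaggedWritable later
in this thread) avoids that problem.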

Re: Does the class of the Mapper output need to match the exact class of the specified output?

Posted by Alex Kozlov <al...@cloudera.com>.
You can get them through the JobTracker.  Where is the other program
running?
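
For illustration, a rough sketch of reading a finished job's counters from a
separate program via the old-API JobClient, which talks to the JobTracker;
the job ID and the group/counter names are hypothetical:

  import org.apache.hadoop.mapred.Counters;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobID;
  import org.apache.hadoop.mapred.RunningJob;

  public class CounterReader {
    public static void main(String[] args) throws Exception {
      // Picks up the cluster configuration from the classpath.
      JobClient client = new JobClient(new JobConf());
      // Hypothetical job ID; getJob returns null if the JobTracker
      // no longer remembers the job.
      RunningJob job = client.getJob(JobID.forName("job_201002021200_0001"));
      Counters counters = job.getCounters();
      // Group and counter names are examples; use the ones you increment.
      long value = counters.findCounter("MyGroup", "MyCounter").getCounter();
      System.out.println("MyGroup.MyCounter = " + value);
    }
  }

A web service could wrap this kind of lookup to feed the counters into your
own UI.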

On Tue, Feb 2, 2010 at 4:50 AM, Rajan Dev <ku...@gmail.com> wrote:

> Hello,
>  I have been working on counters for my jobs.  I can get all the counters
> for a job, but I need to show them in my own UI; I am using web services
> to display the counters.
>
> Could someone help me with this, or is there another way to get the counters?
>
> My approach: I use the reporter.incrCounter method to increment the
> counters, and then run another program to read them all.
>
> Regards,
> Rajan
>

Re: Does the class of the Mapper output need to match the exact class of the specified output?

Posted by Rajan Dev <ku...@gmail.com>.
Hello,
 I have been working on counters for my jobs.  I can get all the counters
for a job, but I need to show them in my own UI; I am using web services
to display the counters.

Could someone help me with this, or is there another way to get the counters?

My approach: I use the reporter.incrCounter method to increment the
counters, and then run another program to read them all.

Regards,
Rajan
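
For context, a minimal sketch of the increment side with the old API's
Reporter; the group and counter names are placeholders:

  import java.io.IOException;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class CountingMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      // Each call adds to the named counter; the framework sums across tasks.
      reporter.incrCounter("MyGroup", "RecordsSeen", 1);
      output.collect(value, new LongWritable(1));
    }
  }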

Re: Does the class of the Mapper output need to match the exact class of the specified output?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Hey Chris,

You may want to look at https://issues.apache.org/jira/browse/MAPREDUCE-1126
and https://issues.apache.org/jira/browse/MAPREDUCE-815 to see how Avro is
being integrated into MapReduce. In particular, I think you would be well
served by Avro's union type, though I'm not sure I understand your use case
completely.

Thanks,
Jeff
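
As an illustration of the union idea, a rough sketch of building an Avro
union schema that can hold either of the two value types in question (a
standalone Avro example, not the pending MapReduce integration):

  import java.util.Arrays;

  import org.apache.avro.Schema;

  public class UnionSketch {
    public static void main(String[] args) {
      // A union of string and long, roughly what Text-or-LongWritable
      // would map to in Avro.
      Schema union = Schema.createUnion(Arrays.asList(
          Schema.create(Schema.Type.STRING),
          Schema.create(Schema.Type.LONG)));
      System.out.println(union); // prints ["string","long"]
    }
  }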

On Wed, Jan 27, 2010 at 11:01 AM, Wilkes, Chris <cw...@gmail.com> wrote:

> What I've done is create a simple wrapper class, "TaggedWritable", that has
> a String (the "tag") and a Writable as fields.  That does the trick.
>
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
>
> import org.apache.hadoop.io.Writable;
>
> import static org.apache.hadoop.io.WritableUtils.readString;
> import static org.apache.hadoop.io.WritableUtils.writeString;
>
> public class TaggedWritable implements Writable {
>   private String m_key;
>   private Writable m_value;
>
>   public TaggedWritable() { }
>
>   public TaggedWritable(String key, Writable value) {
>     m_key = key;
>     m_value = value;
>   }
>
>   public String getTag() {
>     return m_key;
>   }
>
>   public Writable getValue() {
>     return m_value;
>   }
>
>   // Read the tag, then the concrete class name, then the wrapped value.
>   @SuppressWarnings("unchecked")
>   @Override
>   public void readFields(DataInput in) throws IOException {
>     m_key = readString(in);
>     String className = readString(in);
>     try {
>       Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
>       m_value = valueClass.newInstance();
>       m_value.readFields(in);
>     } catch (Exception ex) {
>       throw new IllegalStateException("Error converting " + className
>           + " to writable", ex);
>     }
>   }
>
>   // Write the tag, the concrete class name, and the wrapped value.
>   @Override
>   public void write(DataOutput out) throws IOException {
>     writeString(out, m_key);
>     writeString(out, m_value.getClass().getName());
>     m_value.write(out);
>   }
> }
>
> On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:
>
> Currently a map output value can have only one class, but that does not
> prevent you from encapsulating another field or class inside your own
> Writable and serializing it yourself.
>
> Avro <http://hadoop.apache.org/avro/docs/current/spec.html> is supposed to
> support multiple formats out of the box, but it does not have Input/Output
> formats yet (0.21.0?).
>
> Hadoop uses its own serialization rather than Java serialization for
> performance reasons.
>
> Alex
>
> On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cw...@gmail.com> wrote:
>
>> I'm outputting a Text and a LongWritable in my mapper and told the job
>> that my map output value class is Writable (the interface shared by both
>> of them):
>>   job.setMapOutputValueClass(Writable.class);
>> I'm doing this because I have two different types of input files and am
>> combining them together.  I could write them both as Text, but then I'd
>> have to put a marker in front of each value to indicate what type of entry
>> it is, instead of doing a
>>   if (value instanceof Text) { } else if (value instanceof LongWritable) { }
>>
>> This exception is thrown:
>>
>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
>>
>> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
>> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>
>> The MapTask code (which is being used even though I'm using the new API) shows that a != is used to compare the classes:
>>
>> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
>>
>>   if (value.getClass() != valClass) {
>>     throw new IOException("Type mismatch in value from map: expected "
>>                           + valClass.getName() + ", recieved "
>>                           + value.getClass().getName());
>>   }
>>
>>
>> Does this level of checking really need to be done?  Could it just be a Class.isAssignableFrom() check?
>>
>>
>>
>
>

Re: Does the class of the Mapper output need to match the exact class of the specified output?

Posted by "Wilkes, Chris" <cw...@gmail.com>.
What I've done is create a simple wrapper class, "TaggedWritable", that
has a String (the "tag") and a Writable as fields.  That does the trick.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

import static org.apache.hadoop.io.WritableUtils.readString;
import static org.apache.hadoop.io.WritableUtils.writeString;

public class TaggedWritable implements Writable {
  private String m_key;
  private Writable m_value;

  public TaggedWritable() { }

  public TaggedWritable(String key, Writable value) {
    m_key = key;
    m_value = value;
  }

  public String getTag() {
    return m_key;
  }

  public Writable getValue() {
    return m_value;
  }

  // Read the tag, then the concrete class name, then the wrapped value.
  @SuppressWarnings("unchecked")
  @Override
  public void readFields(DataInput in) throws IOException {
    m_key = readString(in);
    String className = readString(in);
    try {
      Class<Writable> valueClass = (Class<Writable>) Class.forName(className);
      m_value = valueClass.newInstance();
      m_value.readFields(in);
    } catch (Exception ex) {
      throw new IllegalStateException("Error converting " + className
          + " to writable", ex);
    }
  }

  // Write the tag, the concrete class name, and the wrapped value.
  @Override
  public void write(DataOutput out) throws IOException {
    writeString(out, m_key);
    writeString(out, m_value.getClass().getName());
    m_value.write(out);
  }
}
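
A rough usage sketch (new-API fragments; names are illustrative): declare the
concrete wrapper as the map output value class, tag each value in the mapper,
and dispatch on the tag in the reducer:

  // Job setup: the declared class is now concrete, so the exact-class
  // check in MapTask passes.
  job.setMapOutputValueClass(TaggedWritable.class);

  // In the mapper (Mapper<..., Text, TaggedWritable>):
  context.write(word, new TaggedWritable("count", new LongWritable(1)));
  context.write(word, new TaggedWritable("label", new Text("from-file-b")));

  // In the reducer (values is Iterable<TaggedWritable>):
  for (TaggedWritable tagged : values) {
    if ("count".equals(tagged.getTag())) {
      long n = ((LongWritable) tagged.getValue()).get();
      // ... handle the numeric entry
    } else if ("label".equals(tagged.getTag())) {
      String s = tagged.getValue().toString();
      // ... handle the text entry
    }
  }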

On Jan 27, 2010, at 10:49 AM, Alex Kozlov wrote:

> Currently a map output value can have only one class, but that does not
> prevent you from encapsulating another field or class inside your own
> Writable and serializing it yourself.
>
> Avro is supposed to support multiple formats out of the box, but it does
> not have Input/Output formats yet (0.21.0?).
>
> Hadoop uses its own serialization rather than Java serialization for
> performance reasons.
>
> Alex
>
> On Tue, Jan 26, 2010 at 5:17 PM, Wilkes, Chris <cw...@gmail.com> wrote:
> I'm outputting a Text and a LongWritable in my mapper and told the job
> that my map output value class is Writable (the interface shared by both
> of them):
>   job.setMapOutputValueClass(Writable.class);
> I'm doing this because I have two different types of input files and am
> combining them together.  I could write them both as Text, but then I'd
> have to put a marker in front of each value to indicate what type of entry
> it is, instead of doing a
>   if (value instanceof Text) { } else if (value instanceof LongWritable) { }
>
> This exception is thrown:
>
> java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.io.LongWritable
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> The MapTask code (which is being used even though I'm using the new  
> API) shows that a != is used to compare the classes:
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/MapTask.java?view=log
>   if (value.getClass() != valClass) {
>     throw new IOException("Type mismatch in value from map: expected "
>                           + valClass.getName() + ", recieved "
>                           + value.getClass().getName());
>   }
>
>
> Does this level of checking really need to be done?  Could it just  
> be a Class.isAssignableFrom() check?
>
>
>
>