Posted to common-dev@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2009/03/10 17:36:51 UTC

[jira] Commented: (HADOOP-5452) Relax the strict type check by allowing subclasses pass the check

    [ https://issues.apache.org/jira/browse/HADOOP-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680528#action_12680528 ] 

Hong Tang commented on HADOOP-5452:
-----------------------------------

I suspect this restriction exists for performance reasons. To deserialize an object, the SequenceFile Reader needs to know the concrete type of the serialized bytes. In other words, if objects of any subclass of the key class were admissible, SequenceFile might have to store a per-key or per-value string recording the actual type of each key or value object.

Typically, you would write a wrapper class over the set of possible key types plus a numeric tag. The serialized form of the wrapper object is simply the numeric tag followed by the actual object in serialized form. This minimizes the per-key or per-value overhead by using small integers instead of long strings.
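
A minimal sketch of such a wrapper, written against plain java.io so that it runs without Hadoop on the classpath. Every name in it (Payload, FloatPayload, StringPayload, TaggedValue, the tag values) is hypothetical and chosen only for illustration; Payload merely stands in for a Writable-style interface and is not Hadoop's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a Writable-style interface (not Hadoop's API). */
interface Payload {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

final class FloatPayload implements Payload {
    float value;
    FloatPayload() {}
    FloatPayload(float value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeFloat(value); }
    public void readFields(DataInput in) throws IOException { value = in.readFloat(); }
}

final class StringPayload implements Payload {
    String value = "";
    StringPayload() {}
    StringPayload(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

/** The wrapper: the file sees one concrete class; each record carries a one-byte tag. */
final class TaggedValue implements Payload {
    private static final byte TAG_FLOAT = 0;   // assumed tag assignment
    private static final byte TAG_STRING = 1;  // assumed tag assignment

    private Payload inner;

    TaggedValue() {}
    TaggedValue(Payload inner) { this.inner = inner; }
    Payload get() { return inner; }

    // Map a concrete payload class to its small numeric tag.
    private static byte tagOf(Payload p) throws IOException {
        if (p instanceof FloatPayload) return TAG_FLOAT;
        if (p instanceof StringPayload) return TAG_STRING;
        throw new IOException("unregistered payload class: " + p.getClass().getName());
    }

    // Map a tag back to an empty instance of the concrete class.
    private static Payload newInstance(byte tag) throws IOException {
        switch (tag) {
            case TAG_FLOAT: return new FloatPayload();
            case TAG_STRING: return new StringPayload();
            default: throw new IOException("unknown tag: " + tag);
        }
    }

    // Serialized form: the numeric tag, then the wrapped object's own bytes.
    public void write(DataOutput out) throws IOException {
        out.writeByte(tagOf(inner));
        inner.write(out);
    }

    // The tag tells the reader which concrete class to instantiate.
    public void readFields(DataInput in) throws IOException {
        inner = newInstance(in.readByte());
        inner.readFields(in);
    }

    // Convenience helpers for in-memory round trips.
    static byte[] toBytes(TaggedValue v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            v.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TaggedValue fromBytes(byte[] bytes) {
        try {
            TaggedValue v = new TaggedValue();
            v.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, a job would declare TaggedValue as its value class, so an exact-class check still passes, while the per-record cost of knowing the real type is a single byte rather than a class-name string. For instance, TaggedValue.fromBytes(TaggedValue.toBytes(new TaggedValue(new FloatPayload(1.5f)))).get() yields a FloatPayload again.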

> Relax the strict type check by allowing subclasses pass the check
> -----------------------------------------------------------------
>
>                 Key: HADOOP-5452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5452
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang
>
> The type check like:
> {code}
> if (key.getClass() != keyClass)
>         throw new IOException("wrong key class: "+key.getClass().getName()
>                               +" is not "+keyClass);
> if (val.getClass() != valClass)
>         throw new IOException("wrong value class: "+val.getClass().getName()
>                               +" is not "+valClass);
> {code}
> is used a lot when a type check is needed. 
> I found uses of it in org.apache.hadoop.io.SequenceFile, org.apache.hadoop.mapred.IFile, and org.apache.hadoop.mapred.MapTask. Because I only searched for (key.getClass() != keyClass), similar code may also appear in other classes.
> I suggest we relax the strict type check by using 
> {code}
> if (!keyClass.isAssignableFrom(key.getClass()))
> {code}
> The error in my situation is listed below:
> {panel:borderStyle=dashed| borderColor=#ccc| titleBGColor=#F7D6C1| bgColor=#FFFFCE}
> java.io.IOException: Type mismatch in value from map: expected cn.ac.ict.vega.type.Type, recieved cn.ac.ict.vega.type.Type$Float
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:553)
> 	at cn.ac.ict.vega.parse.mapreduce.block.FilterColumnBlockMapper.map(FilterColumnBlockMapper.java:77)
> 	at cn.ac.ict.vega.parse.mapreduce.block.BlockMapRunner.run(BlockMapRunner.java:33)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:155)
> {panel} 
> Float is a subclass of Type, and I would like it to pass the check. I use Type instead of Float because I cannot determine in advance whether the value is a Float, a String, or some other type.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.