Posted to common-user@hadoop.apache.org by Pete Wyckoff <pw...@facebook.com> on 2008/09/12 23:01:43 UTC

serialization.Deserializer.deserialize method help

This method's signature is
{code}
T deserialize(T);
{code}

But, the RecordReader next method is

{code}
boolean next(K,V);
{code}

So, if the deserialize method does not return the same T (i.e., K or V), how
would this new object be propagated back through the RecordReader next method?

It seems the contract on the deserialize method is that it must return the
same T (although the javadocs say "may").

Am I missing something? And if not, why isn't the API boolean deserialize(T)?

Thanks, pete

PS: for things like Thrift, there's no way to reuse the object, as there's no
clear() method, so if this is the case, I don't see how it would work.
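
The concern above can be reproduced in a few lines. The following is only a
minimal sketch, not Hadoop code: SimpleDeserializer, ReusingDeserializer,
AllocatingDeserializer and the static next helper are hypothetical stand-ins
that copy the shapes T deserialize(T) and boolean next(K, V) from the question.
It shows that a deserializer which returns a fresh object leaves the caller's
object untouched when the reader only null-checks the return value.

```java
public class LostObjectSketch {

    /** Hypothetical stand-in with the same shape as T deserialize(T). */
    interface SimpleDeserializer<T> {
        T deserialize(T reuse);
    }

    /** Refills the caller's object in place and returns that same object. */
    static final class ReusingDeserializer implements SimpleDeserializer<StringBuilder> {
        public StringBuilder deserialize(StringBuilder reuse) {
            reuse.setLength(0);
            reuse.append("record");
            return reuse;
        }
    }

    /** Ignores the caller's object and always returns a fresh one. */
    static final class AllocatingDeserializer implements SimpleDeserializer<StringBuilder> {
        public StringBuilder deserialize(StringBuilder reuse) {
            return new StringBuilder("record");
        }
    }

    /** Mimics boolean next(K, V): the returned object is only null-checked, then dropped. */
    static boolean next(SimpleDeserializer<StringBuilder> deserializer, StringBuilder key) {
        return deserializer.deserialize(key) != null;
    }

    public static void main(String[] args) {
        StringBuilder reused = new StringBuilder();
        next(new ReusingDeserializer(), reused);
        System.out.println("reusing:    [" + reused + "]");    // prints reusing:    [record]

        StringBuilder ignored = new StringBuilder();
        next(new AllocatingDeserializer(), ignored);
        System.out.println("allocating: [" + ignored + "]");   // prints allocating: []
    }
}
```

In the second call the record was deserialized, but the caller never sees it,
because the reference held by the caller still points at the original empty
object.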


Re: serialization.Deserializer.deserialize method help

Posted by Owen O'Malley <om...@apache.org>.
On Sep 12, 2008, at 3:01 PM, Chris Douglas wrote:

> Oh, I see what you mean. Yes, you need to reuse the objects that  
> you're given in your deserializer.


This isn't true in the general case. The Java serializer, for instance,
always returns a new instance. The SequenceFile reader has a pair of
methods:

public Object next(Object key) throws IOException;
public Object nextValue(Object value) throws IOException;

so that you can read Java-serialized objects from a sequence file.
They also work as map outputs and reduce outputs. The only place where
you are hosed is the RecordReader interface. HADOOP-1230's changes to
the RecordReader were designed to fix the problem.

-- Owen
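
As an illustration of why the Object-returning methods above work with
deserializers that always allocate, here is a small self-contained sketch. It
does not use Hadoop: ReturningReader and readAll are hypothetical names, and
the reader only imitates the shape Object next(Object), returning null at the
end of the input. The point is that the caller reassigns its variable from the
return value on every iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReturnValueSketch {

    /** Hypothetical reader shaped like Object next(Object): returns the record, or null at end of input. */
    static final class ReturningReader {
        private final Iterator<String> source;

        ReturningReader(List<String> records) {
            this.source = records.iterator();
        }

        /** Ignores the reuse hint, as a deserializer that always allocates would. */
        Object next(Object reuse) {
            return source.hasNext() ? new StringBuilder(source.next()) : null;
        }
    }

    /** Drains the reader, keeping whatever object each call hands back. */
    static List<String> readAll(ReturningReader reader) {
        List<String> out = new ArrayList<>();
        Object key = null;
        // Reassign from the return value so a freshly allocated object is not lost.
        while ((key = reader.next(key)) != null) {
            out.add(key.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        ReturningReader reader = new ReturningReader(Arrays.asList("a", "b", "c"));
        System.out.println(readAll(reader)); // prints [a, b, c]
    }
}
```

A boolean-returning next(K, V) has no equivalent place to hand back a new
object, which is the gap described above for the RecordReader interface.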

Re: serialization.Deserializer.deserialize method help

Posted by Chris Douglas <ch...@yahoo-inc.com>.
Oh, I see what you mean. Yes, you need to reuse the objects that  
you're given in your deserializer.

This will change with HADOOP-1230, though. -C

On Sep 12, 2008, at 2:28 PM, Pete Wyckoff wrote:

>
> What I mean is: say I plug in a deserializer that always returns a new
> object. In that case, since Java passes object references by value, the new
> object cannot make its way back to the SequenceFileRecordReader user.
>
> while (sequenceFileRecordReader.next(mykey, myvalue)) {
>   // do something
> }
>
> And then one or both of my deserializers look like:
>
> T deserialize(T obj) {
>   // ignore obj
>   return new T(params);  // pseudocode: always construct a fresh instance
> }
>
> obj would be the key or the value passed in by the user, but since I ignore
> it, the deserialized value simply gets thrown away.
>
> More specifically, I believe it gets thrown away in SequenceFile.Reader.
>
> -- pete


Re: serialization.Deserializer.deserialize method help

Posted by Pete Wyckoff <pw...@facebook.com>.
Sorry - saw the response after I sent this. But the current javadocs are
wrong and should probably say that deserialize must return the object that
was passed in.


On 9/12/08 3:02 PM, "Pete Wyckoff" <pw...@facebook.com> wrote:

> 
> Specifically, line 75 of SequenceFileRecordReader:
> 
>>    boolean remaining = (in.next(key) != null);
> 
> throws out the return value of SequenceFile.Reader.next, which is the
> result of deserialize(obj).
> 
> -- pete


Re: serialization.Deserializer.deserialize method help

Posted by Pete Wyckoff <pw...@facebook.com>.
Specifically, line 75 of SequenceFileRecordReader:

>    boolean remaining = (in.next(key) != null);

throws out the return value of SequenceFile.Reader.next, which is the
result of deserialize(obj).

-- pete


On 9/12/08 2:28 PM, "Pete Wyckoff" <pw...@facebook.com> wrote:

> 
> What I mean is: say I plug in a deserializer that always returns a new
> object. In that case, since Java passes object references by value, the new
> object cannot make its way back to the SequenceFileRecordReader user.
> 
> while (sequenceFileRecordReader.next(mykey, myvalue)) {
>   // do something
> }
> 
> And then one or both of my deserializers look like:
> 
> T deserialize(T obj) {
>   // ignore obj
>   return new T(params);  // pseudocode: always construct a fresh instance
> }
> 
> obj would be the key or the value passed in by the user, but since I ignore
> it, the deserialized value simply gets thrown away.
> 
> More specifically, I believe it gets thrown away in SequenceFile.Reader.
> 
> -- pete


Re: serialization.Deserializer.deserialize method help

Posted by Pete Wyckoff <pw...@facebook.com>.
What I mean is: say I plug in a deserializer that always returns a new
object. In that case, since Java passes object references by value, the new
object cannot make its way back to the SequenceFileRecordReader user.

while (sequenceFileRecordReader.next(mykey, myvalue)) {
  // do something
}

And then one or both of my deserializers look like:

T deserialize(T obj) {
  // ignore obj
  return new T(params);  // pseudocode: always construct a fresh instance
}

obj would be the key or the value passed in by the user, but since I ignore
it, the deserialized value simply gets thrown away.

More specifically, I believe it gets thrown away in SequenceFile.Reader.

-- pete


On 9/12/08 2:20 PM, "Chris Douglas" <ch...@yahoo-inc.com> wrote:

> If you pass in null to the deserializer, it creates a new instance and
> returns it; passing in an instance reuses it.
> 
> I don't understand the disconnect between Deserializer and the
> RecordReader. Does your RecordReader generate instances that only
> share a common subtype T? You need separate Deserializers for K and V,
> if that's the issue... -C


Re: serialization.Deserializer.deserialize method help

Posted by Chris Douglas <ch...@yahoo-inc.com>.
If you pass in null to the deserializer, it creates a new instance and  
returns it; passing in an instance reuses it.

I don't understand the disconnect between Deserializer and the  
RecordReader. Does your RecordReader generate instances that only  
share a common subtype T? You need separate Deserializers for K and V,  
if that's the issue... -C
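
The null-versus-instance convention described above could look roughly like
the sketch below. This is an assumption-laden illustration rather than any
Hadoop deserializer: Point is a hypothetical mutable record type, and the x
and y parameters stand in for fields that a real deserializer would read from
its input stream.

```java
public class NullOrReuseSketch {

    /** Hypothetical mutable record type. */
    static final class Point {
        int x;
        int y;
    }

    /** Fills the given instance when there is one; allocates only when reuse is null. */
    static Point deserialize(Point reuse, int x, int y) {
        Point target = (reuse != null) ? reuse : new Point();
        target.x = x;
        target.y = y;
        return target;
    }

    public static void main(String[] args) {
        Point fresh = deserialize(null, 1, 2);   // null in: a new instance comes back
        Point same = deserialize(fresh, 3, 4);   // instance in: the same instance comes back
        System.out.println(fresh == same);             // prints true
        System.out.println(same.x + "," + same.y);     // prints 3,4
    }
}
```

A caller that passes in an instance and ignores the return value only sees
the new data when the deserializer follows this convention.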

On Sep 12, 2008, at 2:01 PM, Pete Wyckoff wrote:

>
> This method's signature is
> {code}
> T deserialize(T);
> {code}
>
> But, the RecordReader next method is
>
> {code}
> boolean next(K,V);
> {code}
>
> So, if the deserialize method does not return the same T (i.e., K or V), how
> would this new object be propagated back through the RecordReader next
> method?
>
> It seems the contract on the deserialize method is that it must return the
> same T (although the javadocs say "may").
>
> Am I missing something? And if not, why isn't the API boolean
> deserialize(T)?
>
> Thanks, pete
>
> PS: for things like Thrift, there's no way to reuse the object, as there's no
> clear() method, so if this is the case, I don't see how it would work.
>