Posted to common-user@hadoop.apache.org by Pete Wyckoff <pw...@facebook.com> on 2008/09/12 23:01:43 UTC
serialization.Deserializer.deserialize method help
This method's signature is
{code}
T deserialize(T);
{code}
But, the RecordReader next method is
{code}
boolean next(K,V);
{code}
So, if the deserialize method does not return the same T (i.e., K or V), how
would this new Object be propagated back through the RecordReader next method?
It seems the contract on the deserialize method is that it must return the
same T (although the javadocs say "may").
Am I missing something? And if not, why isn't the API boolean deserialize(T)?
Thanks, pete
P.S. For things like Thrift, there's no way to reuse the object, as there's no
clear method, so if this is the case, I don't see how it would work.
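For readers following along, here is a minimal sketch of the concern raised above. It does not use the Hadoop classes: SketchDeserializer, Box, and PassByValueDemo are hypothetical stand-ins invented for illustration. It only shows that when next() returns a boolean, an object freshly allocated by the deserializer never reaches the caller.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a deserializer; NOT the Hadoop interface.
interface SketchDeserializer<T> {
    // May fill and return 'reuse', or ignore it and return a new object.
    T deserialize(T reuse, String raw);
}

// Trivial mutable record type used only for this illustration.
final class Box {
    String content = "";
}

class PassByValueDemo {
    // Same shape as a next(K) that reports only success or failure.
    static boolean next(Box key, SketchDeserializer<Box> d, Iterator<String> in) {
        if (!in.hasNext()) {
            return false;
        }
        d.deserialize(key, in.next()); // the returned object is dropped here
        return true;
    }

    // Reads one record and returns what the caller's key holds afterwards.
    static String readFirst(SketchDeserializer<Box> d) {
        Box key = new Box();
        next(key, d, Arrays.asList("record-1").iterator());
        return key.content;
    }

    public static void main(String[] args) {
        // Fills the object it was handed and returns that same object.
        SketchDeserializer<Box> reusing = (reuse, raw) -> {
            reuse.content = raw;
            return reuse;
        };
        // Ignores the object it was handed and always allocates a new one.
        SketchDeserializer<Box> allocating = (reuse, raw) -> {
            Box fresh = new Box();
            fresh.content = raw;
            return fresh;
        };
        System.out.println("reusing:    [" + readFirst(reusing) + "]");
        System.out.println("allocating: [" + readFirst(allocating) + "]");
    }
}
```

With the reusing deserializer the caller's key ends up holding "record-1"; with the allocating one the key stays empty, because the new object is only visible through the discarded return value.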
Re: serialization.Deserializer.deserialize method help
Posted by Owen O'Malley <om...@apache.org>.
On Sep 12, 2008, at 3:01 PM, Chris Douglas wrote:
> Oh, I see what you mean. Yes, you need to reuse the objects that
> you're given in your deserializer.
This isn't true in the general case. The Java serializer, for instance,
always returns a new instance. The SequenceFile reader has a pair of
methods:
public Object next(Object key) throws IOException;
public Object nextValue(Object value) throws IOException;
so that you can read Java-serialized objects from a sequence file.
They also work as map outputs and reduce outputs. The only place where
you are hosed is the RecordReader interface. HADOOP-1230's changes to
the RecordReader were designed to fix the problem.
-- Owen
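A minimal sketch of the calling pattern that a method shaped like "Object next(Object key)" allows. The reader below is a hypothetical in-memory stand-in (ReturnValueReader and ReturnValueLoop are made-up names, not SequenceFile.Reader); it always hands back a new object, and the loop works because the caller keeps the returned reference instead of relying on the argument being filled in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory reader; NOT SequenceFile.Reader.
class ReturnValueReader {
    private final Iterator<String> in;

    ReturnValueReader(List<String> records) {
        this.in = records.iterator();
    }

    // Returns the deserialized key, or null when the input is exhausted.
    // This stand-in ignores 'key' and always returns a new instance.
    Object next(Object key) {
        if (!in.hasNext()) {
            return null;
        }
        return new StringBuilder(in.next());
    }
}

class ReturnValueLoop {
    // Collects every key by keeping the reference that next() returns.
    static List<String> readAll(ReturnValueReader reader) {
        List<String> seen = new ArrayList<>();
        Object key = null;
        while ((key = reader.next(key)) != null) {
            seen.add(key.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        ReturnValueReader reader = new ReturnValueReader(Arrays.asList("a", "b", "c"));
        System.out.println(readAll(reader));
    }
}
```

Because the loop reassigns key from the return value, it does not matter whether the reader reuses the object passed in or allocates a fresh one.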
Re: serialization.Deserializer.deserialize method help
Posted by Chris Douglas <ch...@yahoo-inc.com>.
Oh, I see what you mean. Yes, you need to reuse the objects that
you're given in your deserializer.
This will change with HADOOP-1230, though. -C
On Sep 12, 2008, at 2:28 PM, Pete Wyckoff wrote:
>
> What I mean is let's say I plug in a deserializer that always returns a new
> Object - in that case, since everything is pass by value, the new object
> cannot make its way back to the SequenceFileRecordReader user.
>
> while (sequenceFileRecordReader.next(mykey, myvalue)) {
>   // do something
> }
>
> And then one or both of my deserializers look like:
>
> T deserialize(T obj) {
>   // ignore obj
>   return new T(params);
> }
>
> Obj would be the key or the value passed in by the user, but since I ignore
> it, basically what happens is the deserialized value actually gets thrown
> away.
>
> More specifically, it gets thrown away in SequenceFile.Reader I believe.
>
> -- pete
>
>
Re: serialization.Deserializer.deserialize method help
Posted by Pete Wyckoff <pw...@facebook.com>.
Sorry - saw the response after I sent this. But the current javadocs are
wrong and should probably say the method must return what was passed in.
On 9/12/08 3:02 PM, "Pete Wyckoff" <pw...@facebook.com> wrote:
>
> Specifically, line 75 of SequenceFileRecordReader:
>
>> boolean remaining = (in.next(key) != null);
>
> throws out the return value of SequenceFile.next, which is the result of
> deserialize(obj).
>
> -- pete
>
>
Re: serialization.Deserializer.deserialize method help
Posted by Pete Wyckoff <pw...@facebook.com>.
Specifically, line 75 of SequenceFileRecordReader:
> boolean remaining = (in.next(key) != null);
throws out the return value of SequenceFile.next, which is the result of
deserialize(obj).
-- pete
On 9/12/08 2:28 PM, "Pete Wyckoff" <pw...@facebook.com> wrote:
>
> What I mean is let's say I plug in a deserializer that always returns a new
> Object - in that case, since everything is pass by value, the new object
> cannot make its way back to the SequenceFileRecordReader user.
>
> while (sequenceFileRecordReader.next(mykey, myvalue)) {
>   // do something
> }
>
> And then one or both of my deserializers look like:
>
> T deserialize(T obj) {
>   // ignore obj
>   return new T(params);
> }
>
> Obj would be the key or the value passed in by the user, but since I ignore
> it, basically what happens is the deserialized value actually gets thrown
> away.
>
> More specifically, it gets thrown away in SequenceFile.Reader I believe.
>
> -- pete
>
>
Re: serialization.Deserializer.deserialize method help
Posted by Pete Wyckoff <pw...@facebook.com>.
What I mean is let's say I plug in a deserializer that always returns a new
Object - in that case, since everything is pass by value, the new object
cannot make its way back to the SequenceFileRecordReader user.
while (sequenceFileRecordReader.next(mykey, myvalue)) {
  // do something
}
And then one or both of my deserializers look like:
T deserialize(T obj) {
  // ignore obj
  return new T(params);
}
Obj would be the key or the value passed in by the user, but since I ignore
it, basically what happens is the deserialized value actually gets thrown
away.
More specifically, it gets thrown away in SequenceFile.Reader I believe.
-- pete
On 9/12/08 2:20 PM, "Chris Douglas" <ch...@yahoo-inc.com> wrote:
> If you pass in null to the deserializer, it creates a new instance and
> returns it; passing in an instance reuses it.
>
> I don't understand the disconnect between Deserializer and the
> RecordReader. Does your RecordReader generate instances that only
> share a common subtype T? You need separate Deserializers for K and V,
> if that's the issue... -C
>
Re: serialization.Deserializer.deserialize method help
Posted by Chris Douglas <ch...@yahoo-inc.com>.
If you pass in null to the deserializer, it creates a new instance and
returns it; passing in an instance reuses it.
I don't understand the disconnect between Deserializer and the
RecordReader. Does your RecordReader generate instances that only
share a common subtype T? You need separate Deserializers for K and V,
if that's the issue... -C
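A minimal sketch of the behavior described above (pass null and a new instance is created; pass an instance and it is reused). Counter and ReusingDeserializer are hypothetical names for illustration only, and the int "raw" argument stands in for whatever bytes a real deserializer would read.

```java
// Trivial mutable record type used only for this illustration.
final class Counter {
    int value;
}

// Hypothetical deserializer sketch; NOT a Hadoop class.
class ReusingDeserializer {
    // Creates a Counter when 'reuse' is null, otherwise fills and returns 'reuse'.
    static Counter deserialize(Counter reuse, int raw) {
        Counter target = (reuse == null) ? new Counter() : reuse;
        target.value = raw;
        return target;
    }

    public static void main(String[] args) {
        Counter first = deserialize(null, 1);   // null in: new instance out
        Counter second = deserialize(first, 2); // instance in: same instance out
        System.out.println(first == second);    // prints "true"
        System.out.println(first.value);        // prints "2"
    }
}
```

A deserializer written this way works with a boolean next(K, V) as long as the caller always passes non-null objects, since the caller's instance is the one that gets filled.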
On Sep 12, 2008, at 2:01 PM, Pete Wyckoff wrote:
>
> This method's signature is
> {code}
> T deserialize(T);
> {code}
>
> But, the RecordReader next method is
>
> {code}
> boolean next(K,V);
> {code}
>
> So, if the deserialize method does not return the same T (i.e., K or V), how
> would this new Object be propagated back through the RecordReader next
> method?
>
> It seems the contract on the deserialize method is that it must return the
> same T (although the javadocs say "may").
>
> Am I missing something? And if not, why isn't the API boolean
> deserialize(T)?
>
> Thanks, pete
>
> P.S. For things like Thrift, there's no way to reuse the object, as there's
> no clear method, so if this is the case, I don't see how it would work.
>