Posted to user@mahout.apache.org by Yunming Zhang <zh...@gmail.com> on 2012/12/19 02:49:51 UTC

Is there any way you could easily make a deep copy of the VectorWritable class with Hadoop's ReflectionUtils?

Hi, 

I have been trying to find a way to make a deep copy of the key/value pairs inside SequenceFileRecordReader while implementing getCurrentKeyCopy() and getCurrentValueCopy() methods, but I am getting a NullPointerException when calling "value.get()" in CIMapper.java, line 37.

I want the record reader to create a new data object for each new key/value pair instead of reusing the original memory location.

I found a few utility classes in Hadoop that should be able to deep-copy or clone any Writable class:

       key = ReflectionUtils.copy(outer.getConfiguration(),
                                  outer.getCurrentKey(), key);
       value = ReflectionUtils.copy(conf, outer.getCurrentValue(), value);

from MultithreadedMapper, and there is also the WritableUtils.clone(...) method.
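For reference, both of those utilities appear to do essentially the same thing under the hood: serialize the source object to a byte buffer via write(DataOutput), then read it back into the target via readFields(DataInput). Here is a minimal, self-contained sketch of that round trip using a hypothetical SimpleWritable class in place of a real Hadoop Writable (no Hadoop dependency; SimpleWritable, deepCopy, and the field names are illustrative stand-ins, not Hadoop API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a Hadoop Writable: it knows how to
// serialize itself to a DataOutput and repopulate itself from a DataInput.
class SimpleWritable {
    int id;
    String name;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    void readFields(DataInputStream in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    // The round trip the Hadoop copy utilities perform:
    // serialize src to a buffer, then deserialize into a fresh instance.
    static SimpleWritable deepCopy(SimpleWritable src) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        src.write(new DataOutputStream(buf));
        SimpleWritable copy = new SimpleWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }
}

public class DeepCopyDemo {
    public static void main(String[] args) throws IOException {
        SimpleWritable original = new SimpleWritable();
        original.id = 7;
        original.name = "row-7";
        SimpleWritable copy = SimpleWritable.deepCopy(original);
        original.id = 99;                 // mutate the original...
        System.out.println(copy.id);      // ...the copy is unaffected: prints 7
    }
}
```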

However, both approaches seem to fail for VectorWritable. I did notice that it is a bit different in that it wraps another Mahout type, Vector, instead of a primitive type like int, long, or String, as in the case of IntWritable and the like.

I am not sure if this is why the copy/clone utilities in Hadoop designed for Writable are not working.

Currently it gives me a NullPointerException when I try to call value.get(). It does seem we copied something, but the Vector inside VectorWritable might have failed to get copied.

Thanks

Yunming

Re: Is there any way you could easily make a deep copy of the VectorWritable class with Hadoop's ReflectionUtils?

Posted by Yunming Zhang <zh...@gmail.com>.
Thanks Ted, 

So I looked into the Mahout source code a bit more and realized the Hadoop utilities work fine with VectorWritable: when VectorWritable writes itself to the data output stream, it also serializes the object held in its Vector field. So there is no requirement that a Writable has to wrap a primitive type.
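That point can be illustrated without any Hadoop or Mahout dependency: as long as write() fully serializes the wrapped object, the serialize/deserialize round trip yields a genuine deep copy. A self-contained sketch, with a hypothetical VectorLikeWritable wrapping a double[] standing in for VectorWritable wrapping a Vector:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical analogue of VectorWritable: it wraps an object (a double[]),
// not a primitive, and its write() serializes that object's full contents.
class VectorLikeWritable {
    double[] vector;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(vector.length);
        for (double d : vector) out.writeDouble(d);
    }

    void readFields(DataInputStream in) throws IOException {
        vector = new double[in.readInt()];
        for (int i = 0; i < vector.length; i++) vector[i] = in.readDouble();
    }

    // Same round trip the Hadoop copy utilities perform: because write()
    // serialized the whole wrapped array, the copy gets its own array.
    static VectorLikeWritable deepCopy(VectorLikeWritable src) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        src.write(new DataOutputStream(buf));
        VectorLikeWritable copy = new VectorLikeWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }
}

public class WrappedObjectCopyDemo {
    public static void main(String[] args) throws IOException {
        VectorLikeWritable vw = new VectorLikeWritable();
        vw.vector = new double[] {1.0, 2.0, 3.0};
        VectorLikeWritable copy = VectorLikeWritable.deepCopy(vw);
        vw.vector[0] = -1.0;                 // mutate the original array...
        System.out.println(copy.vector[0]);  // ...the copy has its own array: prints 1.0
    }
}
```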

I later realized the NullPointerException had nothing to do with the copied key/value pair, as I was able to print out its contents; it was because I forgot to set up the context and initialize the classifier.

Yunming


On Dec 19, 2012, at 9:59 AM, Ted Dunning <te...@gmail.com> wrote:

> You can always just call clone() on the vector inside the VectorWritable.
> 
> On Tue, Dec 18, 2012 at 5:49 PM, Yunming Zhang
> <zh...@gmail.com>wrote:
> 
>> Hi,
>> 
>> I have been trying to find a way to make a deep copy of the key/value pairs
>> inside SequenceFileRecordReader while implementing getCurrentKeyCopy()
>> and getCurrentValueCopy() methods, but I am getting a NullPointerException
>> when calling "value.get()" in CIMapper.java, line 37.
>> 
>> I want the record reader to create a new data object for each new key/value
>> pair instead of reusing the original memory location.
>> 
>> I found a few utility classes in Hadoop that should be able to deep-copy
>> or clone any Writable class:
>> 
>>       key = ReflectionUtils.copy(outer.getConfiguration(),
>>                                  outer.getCurrentKey(), key);
>>       value = ReflectionUtils.copy(conf, outer.getCurrentValue(), value);
>> 
>> from MultithreadedMapper, and there is also the WritableUtils.clone(...) method.
>> 
>> However, both approaches seem to fail for VectorWritable. I did notice
>> that it is a bit different in that it wraps another Mahout type, Vector,
>> instead of a primitive type like int, long, or String, as in the case of
>> IntWritable and the like.
>> 
>> I am not sure if this is why the copy/clone utilities in Hadoop designed
>> for Writable are not working.
>> 
>> Currently it gives me a NullPointerException when I try to call
>> value.get(). It does seem we copied something, but the Vector inside
>> VectorWritable might have failed to get copied.
>> 
>> Thanks
>> 
>> Yunming


Re: Is there any way you could easily make a deep copy of the VectorWritable class with Hadoop's ReflectionUtils?

Posted by Ted Dunning <te...@gmail.com>.
You can always just call clone() on the vector inside the VectorWritable.
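That suggestion amounts to building a fresh holder around a clone of the wrapped object (in Mahout terms, roughly new VectorWritable(value.get().clone()), assuming the VectorWritable(Vector) constructor). A self-contained sketch of the idea, where the hypothetical Holder class and a double[] stand in for VectorWritable and Vector:

```java
// Hypothetical stand-ins: Holder plays the role of VectorWritable,
// double[] plays the role of the wrapped Mahout Vector.
class Holder {
    private final double[] data;

    Holder(double[] data) { this.data = data; }

    double[] get() { return data; }

    // Clone the wrapped object and wrap the clone in a new holder,
    // so the snapshot is decoupled from the record reader's reused buffer.
    Holder deepCopy() { return new Holder(data.clone()); }
}

public class CloneCopyDemo {
    public static void main(String[] args) {
        Holder reused = new Holder(new double[] {4.0, 5.0});
        Holder snapshot = reused.deepCopy();
        reused.get()[0] = 0.0;                 // the reader overwrites its buffer...
        System.out.println(snapshot.get()[0]); // ...the snapshot still sees 4.0
    }
}
```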

On Tue, Dec 18, 2012 at 5:49 PM, Yunming Zhang
<zh...@gmail.com>wrote:

> Hi,
>
> I have been trying to find a way to make a deep copy of the key/value pairs
> inside SequenceFileRecordReader while implementing getCurrentKeyCopy()
> and getCurrentValueCopy() methods, but I am getting a NullPointerException
> when calling "value.get()" in CIMapper.java, line 37.
>
> I want the record reader to create a new data object for each new key/value
> pair instead of reusing the original memory location.
>
> I found a few utility classes in Hadoop that should be able to deep-copy
> or clone any Writable class:
>
>        key = ReflectionUtils.copy(outer.getConfiguration(),
>                                   outer.getCurrentKey(), key);
>        value = ReflectionUtils.copy(conf, outer.getCurrentValue(), value);
>
> from MultithreadedMapper, and there is also the WritableUtils.clone(...) method.
>
> However, both approaches seem to fail for VectorWritable. I did notice
> that it is a bit different in that it wraps another Mahout type, Vector,
> instead of a primitive type like int, long, or String, as in the case of
> IntWritable and the like.
>
> I am not sure if this is why the copy/clone utilities in Hadoop designed
> for Writable are not working.
>
> Currently it gives me a NullPointerException when I try to call
> value.get(). It does seem we copied something, but the Vector inside
> VectorWritable might have failed to get copied.
>
> Thanks
>
> Yunming