Posted to user@manifoldcf.apache.org by Colum Foley <co...@gmail.com> on 2013/03/01 11:51:41 UTC

Elephant-Bird SequenceFileStorage VectorWritable Producing Empty Vectors

Hi,

I am trying to store Mahout RandomAccessSparseVector objects using
Elephant-Bird and Pig. The data is of the form
key(text), value(RandomAccessSparseVector). When I run Pig's describe,
it shows the following:

pair: {key: int,val: (cardinality: int,entries: {entry: (index: int,value: double)})}

My problem is that when I try to store tuples using Elephant-Bird's
SequenceFileStorage as follows:

store clusteredOut into 'logsvectors.dat' using
com.twitter.elephantbird.pig.store.SequenceFileStorage (
   '-c com.twitter.elephantbird.pig.util.TextConverter',
   '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter  -- -sparse'
);

it runs successfully, but when I examine the resulting SequenceFile, all
the vectors are empty.

On the other hand, if I run the following instead:

store clusteredOut into 'logsvectors.dat' using
com.twitter.elephantbird.pig.store.SequenceFileStorage ();

i.e., I do not specify the types of the key or value.

The vectors are non-empty, but they are of type Text, and this causes my
clustering algorithm to fail (it expects VectorWritable).

So my problem is that I need to output VectorWritable values, but when
I do, the resulting vectors are empty.

Has anyone else run into this issue?

Many thanks,
Colum

Re: Elephant-Bird SequenceFileStorage VectorWritable Producing Empty Vectors

Posted by Colum Foley <co...@gmail.com>.
Thank you, Karl. Apologies to all for the spam.

On Fri, Mar 1, 2013 at 11:10 AM, Karl Wright <da...@gmail.com> wrote:
> I think you want the Mahout list.  This is the ManifoldCF list.
>
> Karl

Re: Elephant-Bird SequenceFileStorage VectorWritable Producing Empty Vectors

Posted by Karl Wright <da...@gmail.com>.
I think you want the Mahout list.  This is the ManifoldCF list.

Karl
