Posted to user@mahout.apache.org by Yohan Chin <yo...@gmail.com> on 2012/05/15 08:43:20 UTC

question on VectorWritable convertor in elephant-bird.

Hi,
Recently I've been trying to use elephant-bird to load Mahout results into Pig.
I built elephant-bird and got the .jar file,
then followed the instructions at the link below (written by Andy Schlaikjer):
https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
ex)
pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
 '-c $INT_CONVERTER',
 '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
);
However, there is no SequenceFileLoader in the store folder, and load/SequenceFileLoader.java doesn't import "com.twitter.elephantbird.pig.mahout.VectorWritableConverter".

Is there anything I've missed?

Thanks a lot for this awesome API!


Re: question on VectorWritable convertor in elephant-bird.

Posted by Yohan Chin <yo...@gmail.com>.
It worked after adding one more dependency jar:

REGISTER /path/to/elephant-bird/elephant-bird/guava-r07.jar

thanks andy!
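
Putting the pieces of this thread together, a complete script might look like the sketch below. It is only an illustration: the hdfs:///path/to/jars/ locations are placeholders, the guava jar location is an assumption (only its file name is given above), INPUT_PATH is expected to be supplied with -param, and the exact set of jars needed may differ between elephant-bird and Mahout versions.

```pig
-- Register elephant-bird plus the Mahout jars that VectorWritableConverter needs.
REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';
-- Guava was the last missing dependency in this thread; the location is a placeholder.
REGISTER 'hdfs:///path/to/jars/guava-r07.jar';

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

-- Load SequenceFile<IntWritable, VectorWritable> data: key converter first, value converter second.
pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);

-- Print the schema the loader reports, as a quick check that the classes resolved.
DESCRIBE pair;
```

If a NoClassDefFoundError still appears, the class named in the trace usually points at the jar that still needs a REGISTER line.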



Re: question on VectorWritable convertor in elephant-bird.

Posted by Andy Schlaikjer <an...@gmail.com>.
Looking at my setup, I register Mahout jars for mahout-collections,
mahout-math, and mahout-core when using VectorWritableConverter, so the set
of register statements might look something like this:

{{{

REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';

}}}



Re: question on VectorWritable convertor in elephant-bird.

Posted by Andy Schlaikjer <an...@gmail.com>.
Yohan, sounds like you're almost there--

You need to register both EB and Mahout jars so that when
SequenceFileLoader class-loads VectorWritableConverter, the Mahout
VectorWritable and Vector classes (and all of their dependencies) are also
available.

Andy



Re: question on VectorWritable convertor in elephant-bird.

Posted by Yohan Chin <yo...@gmail.com>.
Andy,
thanks for your response.

I tried again with your suggestion,
but I still get an error (below). It looks like I need to resolve the Mahout class dependencies used by VectorWritableConverter.

When I set up elephant-bird, I followed "https://github.com/kevinweil/elephant-bird" and completed the quick start along with the protocol-buffer and thrift 0.5 dependencies.
That produced path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.

In the Pig code, I register path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.

Should I set up the Mahout class dependencies separately?

Thanks!


Error message:

Unexpected internal error. could not instantiate 'com.twitter.elephantbird.pig.load.SequenceFileLoader' with arguments '[-c com.twitter.elephantbird.pig.util.IntWritableConverter, -c com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse]'


Caused by: java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
        at com.twitter.elephantbird.pig.load.SequenceFileLoader.getWritableConverter(SequenceFileLoader.java:233)
        at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:152)
        at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:175)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)




Re: question on VectorWritable convertor in elephant-bird.

Posted by Andy Schlaikjer <an...@gmail.com>.
Yohan, that's a typo in VectorWritableConverter javadoc. I'll update today.

The SequenceFileStorage and ...Loader classes are in separate packages:

com.twitter.elephantbird.pig.*load*.SequenceFileLoader
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>

com.twitter.elephantbird.pig.*store*.SequenceFileStorage
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>

Both of these classes rely on the WritableConverter interface
<https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>.
They classload converters at runtime, given the classname of the
converters you'd like to use for key and value Writable instances. When
dealing with SequenceFile<IntWritable, VectorWritable> data, do this:

{{{

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);

}}}

Hope this helps!

Andy
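
The same pattern should carry over to dense vectors. A minimal sketch, assuming the '-dense -cardinality 2' converter options from the original question are accepted as written, that the stored vectors really have two elements, and that the required jars are already registered; INPUT_PATH is a placeholder to be supplied with -param:

```pig
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

-- Same loader in the load package; only the value converter's options change.
pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
);
```

Everything after the bare "--" in the second argument is handed to the converter itself rather than to the loader, which is why the vector options sit there.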



Re: question on VectorWritable convertor in elephant-bird.

Posted by Ted Dunning <te...@gmail.com>.
Sounds like a class path issue. 

Sent from my iPhone
