Posted to user@mahout.apache.org by Alleon Guillaume <Gu...@eads.net> on 2010/02/23 14:44:26 UTC
Completely newbie
Hi all,
I am a complete newbie in action ... even though I have gone through the book from the same collection ;)
I would like to classify a number of items, each of them being characterized by a number of vectors. I thought it would be a good idea to classify the vectors first. Unfortunately the number of items keeps growing, so what I have done so far is write a small piece of code that constructs the Mahout dense vectors on the fly, setting each vector's name to my item name. As far as I understand, those vectors are kept in memory ...
What are the next steps for me?
Storing those vectors on disk, I assume :)
Then creating some canopies and then using k-means to create my clusters.
Can you guide me through these steps?
Then I have more questions:
Can Mahout determine an "optimal" number of clusters?
Once a set of clusters exists and new items are added, is it possible to update the existing clusters? Is it possible to add clusters at a lower cost than recreating them?
Thanks for your help and time
Regards
Guillaume
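The "canopies first, then k-means" plan in the message above can be sketched in plain Java. This is only an illustration of the canopy idea, not Mahout's implementation: the class name CanopySketch, the method names, and the threshold value used below are all hypothetical, and the sketch keeps everything in memory. Only the tight threshold (usually called T2) is used here, because only the canopy centers are needed to seed k-means; the loose threshold (T1), which decides canopy membership, is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of canopy center selection (hypothetical names,
// not Mahout's classes). The resulting centers can serve as initial
// centroids for k-means, and their count as a rough guess for k.
class CanopySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Repeatedly take the first remaining point as a new center, then
    // drop every remaining point closer than t2 to that center.
    static List<double[]> canopyCenters(List<double[]> points, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            centers.add(center);
            remaining.removeIf(p -> distance(center, p) < t2);
        }
        return centers;
    }
}
```

Calling CanopySketch.canopyCenters(points, 2.0) on a handful of points that form two tight groups far apart returns one center per group. The number of centers depends entirely on the chosen threshold, so it is a starting estimate for the number of clusters rather than an "optimal" value.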
Re: Completely newbie
Posted by Sean Owen <sr...@gmail.com>.
I believe the issue is that the manuscript is lagging behind the code
just a little bit. In 0.3, DenseVector no longer implements Writable
itself; instead we have DenseVectorWritable, which wraps a DenseVector
and knows how to serialize it.
At a minimum, you will need to change DenseVector to DenseVectorWritable
in the line you quote. Robin is in the best position to say whether there
are other changes. All of this ought to be reflected in an upcoming
revision to the MEAP chapter drafts, of course.
Sean
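The separation described above, where the vector class only holds data and a wrapper class knows how to serialize it, can be illustrated with a small self-contained sketch. It does not use Mahout or Hadoop: PlainDenseVector and PlainDenseVectorWritable are hypothetical stand-ins, and the write/readFields methods only mirror the shape of Hadoop's Writable interface using java.io types. The on-disk layout (length, then components) is an assumption made for the example, not the format Mahout uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical plain vector: holds data, knows nothing about serialization.
class PlainDenseVector {
    final double[] values;

    PlainDenseVector(double[] values) {
        this.values = values.clone();
    }
}

// Hypothetical wrapper that owns the serialization logic. The method
// names follow the Writable convention, but nothing here depends on Hadoop.
class PlainDenseVectorWritable {
    private PlainDenseVector vector;

    // No-argument constructor so an empty instance can be filled by readFields.
    PlainDenseVectorWritable() { }

    PlainDenseVectorWritable(PlainDenseVector vector) {
        this.vector = vector;
    }

    PlainDenseVector get() {
        return vector;
    }

    // Write the length first, then each component.
    void write(DataOutput out) throws IOException {
        out.writeInt(vector.values.length);
        for (double v : vector.values) {
            out.writeDouble(v);
        }
    }

    // Read back in the same order that write() used.
    void readFields(DataInput in) throws IOException {
        int size = in.readInt();
        double[] values = new double[size];
        for (int i = 0; i < size; i++) {
            values[i] = in.readDouble();
        }
        vector = new PlainDenseVector(values);
    }

    // Serialize to a byte array and read it back, for demonstration only.
    static PlainDenseVector roundTrip(PlainDenseVector input) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new PlainDenseVectorWritable(input).write(new DataOutputStream(bytes));
            PlainDenseVectorWritable copy = new PlainDenseVectorWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this split, a container that expects a serializable value type is given the wrapper class rather than the vector class, which is the same kind of change as swapping the value class in the SequenceFile.Writer call quoted below.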
On Thu, Mar 4, 2010 at 12:42 PM, tog <gu...@gmail.com> wrote:
> Dear all,
>
> Ok, I tried 0.2 and it seems to work. Due to some comments on the list
> regarding k-means in 0.2, I decided to move to trunk since 0.3 seems
> close enough.
> Nevertheless, I had an issue with the creation of a writer (I am still
> on the example in Ch 7 of Mahout in Action). Here is what I got:
>
> at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
> at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
>
> when calling:
>
> writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
> DenseVector.class)
>
> Anyone having the same issue ?
>
> Regards
> Guillaume
>
>
Re: Completely newbie
Posted by Robin Anil <ro...@gmail.com>.
Hi Guillaume,
Check out the Manning forum; I have posted the code snippets there.
The thread is here:
http://www.manning-sandbox.com/thread.jspa?threadID=36895&tstart=0
Robin
On Thu, Mar 4, 2010 at 6:12 PM, tog <gu...@gmail.com> wrote:
> Dear all,
>
> Ok, I tried 0.2 and it seems to work. Due to some comments on the list
> regarding k-means in 0.2, I decided to move to trunk since 0.3 seems
> close enough.
> Nevertheless, I had an issue with the creation of a writer (I am still
> on the example in Ch 7 of Mahout in Action). Here is what I got:
>
> at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
> at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
>
> when calling:
>
> writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
> DenseVector.class)
>
> Anyone having the same issue ?
>
> Regards
> Guillaume
>
>
Re: Completely newbie
Posted by tog <gu...@gmail.com>.
Dear all,
Ok, I tried 0.2 and it seems to work. Due to some comments on the list
regarding k-means in 0.2, I decided to move to trunk since 0.3 seems
close enough.
Nevertheless, I had an issue with the creation of a writer (I am still
on the example in Ch 7 of Mahout in Action). Here is what I got:
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
when calling:
writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
DenseVector.class)
Anyone having the same issue ?
Regards
Guillaume
--
PGP KeyID: 1024D/69B00854 subkeys.pgp.net
http://cheztog.blogspot.com