Posted to user@mahout.apache.org by Alleon Guillaume <Gu...@eads.net> on 2010/02/23 14:44:26 UTC

Completely newbie

Hi all,

I am a complete newbie "in action"... even though I have gone through the book from the same series ;)
I would like to classify a number of items, each of which is characterized by a number of vectors. I thought it would be a good idea to classify the vectors first. Unfortunately, the number of items keeps growing, so what I have done so far is write a small piece of code that constructs Mahout dense vectors on the fly, setting each vector's name to the item name. As far as I understand, those vectors are kept in memory...
What are the next steps for me?
Storing those vectors on disk, I assume :)
Then creating some canopies, and then using k-means to create my clusters.
Can you guide me through some of these steps?
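A minimal in-memory sketch of the canopy-then-k-means plan, in plain Java with no Mahout or Hadoop dependency. `CanopyKMeansSketch` and its methods are hypothetical names, not Mahout's canopy or k-means drivers, and the thresholds `t1`/`t2` and the sample points are made up for illustration. Canopy clustering takes a remaining point, averages every remaining point within the loose threshold `t1` into a canopy, and removes every point within the tight threshold `t2` (with `t2 <= t1`) from further consideration; the canopy centroids then seed k-means. Because the number of canopies follows from the thresholds, this is one way to avoid choosing k by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory sketch; not Mahout's canopy or k-means implementation. */
public class CanopyKMeansSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum); // Euclidean distance
    }

    static double[] mean(List<double[]> pts) {
        double[] m = new double[pts.get(0).length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) m[i] += p[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= pts.size();
        return m;
    }

    /** Canopy clustering: returns one centroid per canopy. Requires t1 >= t2 > 0. */
    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        if (t2 <= 0 || t1 < t2) throw new IllegalArgumentException("need t1 >= t2 > 0");
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] seed = remaining.get(0);
            List<double[]> members = new ArrayList<>();
            for (double[] p : remaining) {
                if (distance(p, seed) < t1) members.add(p); // loose threshold: joins this canopy
            }
            centroids.add(mean(members));
            // Tight threshold: these points (including the seed) can no longer start a canopy.
            remaining.removeIf(p -> distance(p, seed) < t2);
        }
        return centroids;
    }

    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++) {
            if (distance(p, centers.get(c)) < distance(p, centers.get(best))) best = c;
        }
        return best;
    }

    /** Standard k-means (Lloyd iterations) starting from the given centers. */
    static List<double[]> kMeans(List<double[]> points, List<double[]> initial, int maxIter) {
        List<double[]> centers = new ArrayList<>();
        for (double[] c : initial) centers.add(c.clone());
        for (int iter = 0; iter < maxIter; iter++) {
            List<List<double[]>> groups = new ArrayList<>();
            for (int c = 0; c < centers.size(); c++) groups.add(new ArrayList<>());
            for (double[] p : points) groups.get(nearest(p, centers)).add(p);
            boolean moved = false;
            for (int c = 0; c < centers.size(); c++) {
                if (groups.get(c).isEmpty()) continue; // keep the old center if no point was assigned
                double[] updated = mean(groups.get(c));
                if (!Arrays.equals(updated, centers.get(c))) moved = true;
                centers.set(c, updated);
            }
            if (!moved) break; // converged: no center changed
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[] {1.0, 1.0}, new double[] {1.5, 2.0}, new double[] {1.0, 1.5},
            new double[] {8.0, 8.0}, new double[] {9.0, 8.5}, new double[] {8.5, 9.0});
        List<double[]> seeds = canopyCentroids(points, 3.0, 2.0);
        List<double[]> centers = kMeans(points, seeds, 10);
        System.out.println(seeds.size() + " canopies");
        for (double[] c : centers) System.out.println(Arrays.toString(c));
    }
}
```

With these sample points, each of the two well-separated groups produces one canopy, and k-means stops after the first pass because the canopy centroids already equal the group means. In a real Mahout setup the vectors would be written to disk and the distributed jobs would replace the in-memory lists.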

I also have a few more questions:
Can Mahout determine an "optimal" number of clusters?
Once a set of clusters exists and new items are added, is it possible to update the existing clusters? Is it possible to add clusters at a lower cost than recreating them?
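One common approach to the last question, sketched in plain Java and not claimed to be a Mahout feature: keep each cluster's centroid and size, fold a new point into its nearest centroid with the running-mean update c += (p - c) / n, and open a new cluster when the point is farther than a threshold from every centroid. `IncrementalCentroids` and `newClusterDistance` are hypothetical names, and the threshold value is an assumption you would have to tune.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical online-update sketch; not a Mahout API. */
public class IncrementalCentroids {
    private final List<double[]> centroids = new ArrayList<>();
    private final List<Long> counts = new ArrayList<>(); // points absorbed per centroid
    private final double newClusterDistance;             // hypothetical tuning threshold

    IncrementalCentroids(double newClusterDistance) {
        this.newClusterDistance = newClusterDistance;
    }

    /** Registers an existing centroid, e.g. one produced by an earlier k-means run. */
    void seed(double[] centroid, long count) {
        centroids.add(centroid.clone());
        counts.add(count);
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Adds one point and returns the index of the cluster it joined. */
    int add(double[] p) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.size(); c++) {
            double d = distance(p, centroids.get(c));
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        if (best < 0 || bestDist > newClusterDistance) {
            // Too far from every existing cluster: start a new one at p.
            centroids.add(p.clone());
            counts.add(1L);
            return centroids.size() - 1;
        }
        // Running-mean update: with n points after adding p, c += (p - c) / n.
        long n = counts.get(best) + 1;
        counts.set(best, n);
        double[] c = centroids.get(best);
        for (int i = 0; i < c.length; i++) c[i] += (p[i] - c[i]) / n;
        return best;
    }

    int size() { return centroids.size(); }

    double[] centroid(int i) { return centroids.get(i).clone(); }

    public static void main(String[] args) {
        IncrementalCentroids model = new IncrementalCentroids(3.0);
        model.seed(new double[] {1.0, 1.0}, 3); // made-up sizes from an earlier run
        model.seed(new double[] {8.0, 8.0}, 3);
        model.add(new double[] {2.0, 2.0});     // joins cluster 0
        model.add(new double[] {20.0, 20.0});   // far from both: new cluster 2
        for (int i = 0; i < model.size(); i++) {
            System.out.println(i + ": " + Arrays.toString(model.centroid(i)));
        }
    }
}
```

The trade-off is that earlier points are never reassigned, so the centroids can drift away from what a full k-means rerun would produce; a periodic full re-clustering is a common safeguard.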

Thanks for your help and time
Regards
Guillaume

Re: Completely newbie

Posted by Sean Owen <sr...@gmail.com>.
I believe the issue is that the manuscript is lagging the code just a
little bit. In 0.3, DenseVector has been separated from implementing
Writable, and instead we have DenseVectorWritable which wraps
DenseVector and knows how to serialize it.

At a minimum, you will need to change DenseVector to DenseVectorWritable
in the line you quote. Robin is in the best position to say whether there
are other changes. Any such changes ought to be reflected in an upcoming
revision of the MEAP chapter drafts, of course.
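A minimal sketch of the wrapper pattern described above, using only the JDK so it runs without Hadoop or Mahout on the classpath. `WritableLike`, `PlainDenseVector` and `PlainDenseVectorWritable` are hypothetical stand-ins: `WritableLike` mirrors the two methods of Hadoop's `Writable` interface (`write(DataOutput)` and `readFields(DataInput)`), and the length-prefixed layout is an assumption, not Mahout's actual on-disk format. The point is the split: the vector class only holds data, and the wrapper knows how to serialize it. In the real code this means passing `DenseVectorWritable.class` rather than `DenseVector.class` as the value class of the `SequenceFile.Writer`; Hadoop most likely picks a serializer based on that value class, which would explain why a class that no longer implements `Writable` fails inside `SerializationFactory.getSerializer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical stand-in mirroring the two methods of Hadoop's org.apache.hadoop.io.Writable. */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Plain vector: holds the data and knows nothing about serialization. */
final class PlainDenseVector {
    final double[] values;
    PlainDenseVector(double[] values) { this.values = values.clone(); }
}

/** Wrapper that owns the serialization logic, in the spirit of DenseVectorWritable. */
final class PlainDenseVectorWritable implements WritableLike {
    private PlainDenseVector vector;

    PlainDenseVectorWritable() { } // Writables are typically created reflectively, then filled by readFields
    PlainDenseVectorWritable(PlainDenseVector vector) { this.vector = vector; }

    PlainDenseVector get() { return vector; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(vector.values.length); // length prefix (assumed layout)
        for (double v : vector.values) out.writeDouble(v);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        double[] values = new double[in.readInt()];
        for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        vector = new PlainDenseVector(values);
    }
}

public class WritableWrapperSketch {
    /** Writes a vector to bytes and reads it back, like one SequenceFile record round trip. */
    static PlainDenseVector roundTrip(PlainDenseVector v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            new PlainDenseVectorWritable(v).write(out);
        }
        PlainDenseVectorWritable copy = new PlainDenseVectorWritable();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
        }
        return copy.get();
    }

    public static void main(String[] args) throws IOException {
        PlainDenseVector restored = roundTrip(new PlainDenseVector(new double[] {1.0, 2.5, -3.0}));
        System.out.println(Arrays.toString(restored.values));
    }
}
```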

Sean

On Thu, Mar 4, 2010 at 12:42 PM, tog <gu...@gmail.com> wrote:
> Dear all,
>
> OK, I tried 0.2 and it seems to work. Due to some comments on the list
> regarding k-means in 0.2, I decided to move to trunk, since 0.3 seems
> close enough.
> Nevertheless, I had an issue with the creation of a writer (I am still
> working through the Ch. 7 example of Mahout in Action). Here is what I got:
>
>        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
>
> when calling:
>
> writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
> DenseVector.class);
>
> Is anyone else having the same issue?
>
> Regards
> Guillaume

Re: Completely newbie

Posted by Robin Anil <ro...@gmail.com>.
Hi Guillaume,
Check out the Manning forum; I have posted the code snippets there.

The thread is here:
http://www.manning-sandbox.com/thread.jspa?threadID=36895&tstart=0

Robin


Re: Completely newbie

Posted by tog <gu...@gmail.com>.
Dear all,

OK, I tried 0.2 and it seems to work. Due to some comments on the list
regarding k-means in 0.2, I decided to move to trunk, since 0.3 seems
close enough.
Nevertheless, I had an issue with the creation of a writer (I am still
working through the Ch. 7 example of Mahout in Action). Here is what I got:

	at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
	at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
	at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
	at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
	at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)

when calling:

writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class,
DenseVector.class);

Is anyone else having the same issue?

Regards
Guillaume


-- 
PGP KeyID: 1024D/69B00854  subkeys.pgp.net

http://cheztog.blogspot.com