Posted to user@mahout.apache.org by sharath jagannath <sh...@gmail.com> on 2011/02/08 00:15:54 UTC

Clustering with KMeans

Hey,

I tried to cluster the YouTube data set:
http://www.public.asu.edu/%7Emdechoud/datasets.html.
All the data points were put under one cluster. Since the dataset is not
really big, I thought the behavior was due to the dataset.
I did try to vary the threshold and the convergence delta, but all my data
was still being put under the same cluster.

In my vectorizer code, I consider only the tags that are associated with the
user_id from the above dataset (and have assigned random rankings to them).
My data format was List<User_Id tag|rating>.

I even encountered a Java out-of-heap error for a few values of the threshold.

Output of clustering my data:

   Number of clusters : 1

Feb 7, 2011 3:14:18 PM org.slf4j.impl.JCLLoggerAdapter info

INFO: Reading Cluster:999 center:[242:2.000, 294:3.000, 425:4.000,
706:1.000] numPoints:1 radius:[0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000... (all zeros; truncated)]



But I also tested the synthetic control data clustering:
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
Data points were still put under a single cluster.

Can anybody tell me why this happens? Any suggestions to improve the
clustering results?

Output of synthetic data clustering:

VL-99{n=1 c=[27.844, 26.473, 35.845, 27.200, 27.360, 24.333, 33.979, 30.604,
32.678, 31.190, 25.564, 30.310, 29.268, 32.330, 31.107, 29.685, 34.949,
28.897, 33.783, 29.133... (60 values; truncated)] r=[0.000, 0.000,
0.000... (all zeros; truncated)]}

Weight:  Point:

1.0: [27.844, 26.473, 35.845, 27.200, 27.360, 24.333, 33.979, 30.604,
32.678, 31.190... (identical to the center above; truncated)]



Thanks,
Sharath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Oh! That was the id. Then how should I know the total number of clusters?

Thanks,
Sharath

On Tue, Feb 8, 2011 at 2:32 PM, Kate Ericson <mo...@gmail.com> wrote:

> Hi Sharath,
>
> So do you have 197 clusters, or just one cluster where the id is 197?
> The ids don't always correspond to the number of clusters you have.
>
> -Kate
>
> On Tue, Feb 8, 2011 at 2:46 PM, sharath jagannath
> <sh...@gmail.com> wrote:
> > Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
> > clusters:
> >
> > C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
> > r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
> > 0.000, 0.000, 0.000, 0.000, 0.000... (all zeros; truncated)
> >
> >
> > From the above sample output you can see the cluster id 197, the centroid,
> > the number of points, and the radius.
> >
> > For any value of t1 and t2 I always get n = 1. This is quite strange.
> >
> > Does it have anything to do with my dataset? Sorry for the confusion
> > created. All this while I have been saying the number of clusters is 1.
> >
> >
> >
> > Thanks,
> >
> > Sharath
> >
>



-- 
Thanks,
Sharath Jagannath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Yeah, that is actually the case, but they were from the same dataset.
My dataset is rather small; I had around 1200 data points. I clustered
1190 in the first run and used the remaining 10 as the test data.

I used the same set of vectorizers and drivers for both of them.
The only thing that I did not do in the test phase was create canopies using
the new data points.
I used the canopies that were created in the training phase and guessed it
would work; I suppose that is why they have different sizes. But I assumed
the algorithm works that way. Correct me if I am wrong.

Things are working fine if I create canopies with training data + test data.
But I really do not want to do it unless that is the right way.

I read about the following online clustering method in Mahout in Action and
was trying to build such a system before I start doing anything else (a rough
sketch of steps 2-3 follows the list):

1. Cluster 1 million articles as above and save the cluster centroids for
all clusters.

2. Periodically, for each new article, use canopy clustering to assign it to
the cluster whose centroid is closest, based on a very small distance
threshold. This ensures that articles on topics that occurred previously are
associated with that topic cluster and shown instantly on the website. These
documents are removed from the new document list.

3. The leftover documents, which are not associated with any old cluster,
form new canopies. These canopies represent new topics that appeared in the
news and have little or no match with any articles that we have from the
past.

4. Use the new canopy centroids to cluster the articles that are not
associated with any of the old clusters, and add these temporary cluster
centroids to our centroid list.

5. Less frequently, execute the full batch clustering to re-cluster the
entire set of documents. While doing so, it is useful to keep all previous
cluster centroids as input to the algorithm so that clustering achieves
faster convergence.
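
To make steps 2 and 3 concrete, here is a toy in-memory sketch in plain Java
(hypothetical names, dense double[] vectors instead of Mahout Vectors, and
squared Euclidean distance):

    import java.util.ArrayList;
    import java.util.List;

    // Toy sketch of steps 2-3: route each new document to the closest saved
    // centroid when it is within a small threshold; everything else is kept
    // aside to seed new canopies (new topics).
    public class OnlineTopicAssigner {

      // Squared Euclidean distance between two dense vectors of equal length.
      static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
          double d = a[i] - b[i];
          sum += d * d;
        }
        return sum;
      }

      // Returns the leftover documents that matched no existing centroid.
      static List<double[]> assign(List<double[]> newDocs,
                                   List<double[]> centroids,
                                   double threshold) {
        List<double[]> leftovers = new ArrayList<double[]>();
        for (double[] doc : newDocs) {
          double best = Double.MAX_VALUE;
          for (double[] centroid : centroids) {
            best = Math.min(best, squaredDistance(doc, centroid));
          }
          if (best > threshold) {
            leftovers.add(doc); // step 3: this document seeds a new canopy
          }
          // else: step 2 -- emit doc to its closest old topic cluster
        }
        return leftovers;
      }
    }

Note that this only works when the new vectors have exactly the same
cardinality as the saved centroids, which is the same dictionary issue that
comes up later in this thread.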


I have not made much progress, though :D. I would have liked to see this
working by now.


Thanks
Sharath

Re: Clustering with KMeans

Posted by Ted Dunning <te...@gmail.com>.
Sharath,

This sounds like your vectors are not all the same length as the ones that
were originally used to do the clustering.

On Tue, Feb 8, 2011 at 7:36 PM, sharath jagannath <
sharathjagannath@gmail.com> wrote:

> Yeah, it was not the only cluster that was formed; there were around 200
> clusters. I played around with t1 and t2, and now I have 30 clusters which I
> am using to cluster the new data points, doing it with CanopyDriver.
>
> I get the following exceptions when the CanopyDriver.clusterData tries to
> find the closest Canopy.
>
> org.apache.mahout.math.CardinalityException: Required cardinality 23 but got 1234
>     at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:172)
>     at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
>     at org.apache.mahout.clustering.canopy.CanopyClusterer.findClosestCanopy(CanopyClusterer.java:139)
>     at org.apache.mahout.clustering.canopy.CanopyClusterer.emitPointToClosestCanopy(CanopyClusterer.java:129)
>     at org.apache.mahout.clustering.canopy.ClusterMapper.map(ClusterMapper.java:46)
>     at org.apache.mahout.clustering.canopy.ClusterMapper.map(ClusterMapper.java:1)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
>
> code which is trying to find the closest canopy:
>
> CanopyDriver.clusterData(conf, new Path("test-vectors", "tfidf-vectors"),
>     new Path(canopyCentroidsOutputPath, "clusters-0"),
>     canopyCentroidsOutputPath, measure, t1, t2, false);
>
>
> * test-vectors/tfidf-vectors - path to the new test data, created using the
> previously mentioned customized data converter and Seq2Sparse.
>
> * canopyCentroidsOutputPath, "clusters-0" - path to the canopy centroids
> that were formed during the training phase.
>
> * measure - SquaredEuclideanDistanceMeasure; I used the same one in the
> training phase.
>
> * t1 - 2000, t2 - 1900
>
> * Sequential false/true - in either case it throws the CardinalityException
> in the RandomAccessSparseVector.dot method.
>
> The dot method's first line of code is the cardinality comparison, which
> throws the exception. I wanted to use canopy clustering as a quick "online"
> clustering of the new data points (though not as accurate as KMeans). Am I
> not supposed to use canopy that way?
>
>
> Thanks everybody, especially Kate. Your responses to the previous emails are
> much appreciated.
>
>
> Thanks and Regards,
>
> Sharath
>

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Yeah, it was not the only cluster that was formed; there were around 200
clusters. I played around with t1 and t2, and now I have 30 clusters which I
am using to cluster the new data points, doing it with CanopyDriver.

I get the following exceptions when the CanopyDriver.clusterData tries to
find the closest Canopy.

org.apache.mahout.math.CardinalityException: Required cardinality 23 but got 1234
    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:172)
    at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
    at org.apache.mahout.clustering.canopy.CanopyClusterer.findClosestCanopy(CanopyClusterer.java:139)
    at org.apache.mahout.clustering.canopy.CanopyClusterer.emitPointToClosestCanopy(CanopyClusterer.java:129)
    at org.apache.mahout.clustering.canopy.ClusterMapper.map(ClusterMapper.java:46)
    at org.apache.mahout.clustering.canopy.ClusterMapper.map(ClusterMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


code which is trying to find the closest canopy:

CanopyDriver.clusterData(conf, new Path("test-vectors", "tfidf-vectors"),
    new Path(canopyCentroidsOutputPath, "clusters-0"),
    canopyCentroidsOutputPath, measure, t1, t2, false);


* test-vectors/tfidf-vectors - path to the new test data, created using the
previously mentioned customized data converter and Seq2Sparse.

* canopyCentroidsOutputPath, "clusters-0" - path to the canopy centroids
that were formed during the training phase.

* measure - SquaredEuclideanDistanceMeasure; I used the same one in the
training phase.

* t1 - 2000, t2 - 1900

* Sequential false/true - in either case it throws the CardinalityException
in the RandomAccessSparseVector.dot method.

The dot method's first line of code is the cardinality comparison, which
throws the exception. I wanted to use canopy clustering as a quick "online"
clustering of the new data points (though not as accurate as KMeans). Am I
not supposed to use canopy that way?
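
(Side note on the numbers: "Required cardinality 23 but got 1234" suggests the
training and test data were vectorized against two different dictionaries, so
the vectors come out with different lengths. Here is a toy sketch of the idea
in plain Java, with hypothetical names and plain term counts instead of
tf-idf; in Mahout terms, the test vectors would have to be built from the
training run's dictionary file rather than from a fresh one:)

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Vectors are only comparable if training and test data are vectorized
    // against the SAME dictionary, so the cardinality never changes.
    public class SharedDictionary {

      // Build the term -> index dictionary from the training corpus only.
      static Map<String, Integer> buildDictionary(List<List<String>> trainingDocs) {
        Map<String, Integer> dict = new HashMap<String, Integer>();
        for (List<String> doc : trainingDocs) {
          for (String term : doc) {
            if (!dict.containsKey(term)) {
              dict.put(term, dict.size());
            }
          }
        }
        return dict;
      }

      // Vectorize ANY document (training or test) against that dictionary;
      // terms unseen in training are dropped instead of growing the vector.
      static double[] vectorize(List<String> doc, Map<String, Integer> dict) {
        double[] v = new double[dict.size()];
        for (String term : doc) {
          Integer idx = dict.get(term);
          if (idx != null) {
            v[idx] += 1.0;
          }
        }
        return v;
      }
    }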


Thanks everybody, especially Kate. Your responses to the previous emails are
much appreciated.


Thanks and Regards,

Sharath

Re: Clustering with KMeans

Posted by Kate Ericson <er...@cs.colostate.edu>.
Hey Sharath,

I'm sure there's a better way to check the number of clusters, but you
could try looking over the file that you pulled cluster 197 from, and
see if there are more clusters in it.
I'm not very familiar with the canopy program, but you may want to try
smaller t1 and t2 values - maybe your points are too close together so
they're all ending up in one cluster?
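
For intuition on what t1 and t2 control, here is a toy single-machine canopy
pass (hypothetical names; Mahout's CanopyClusterer is the real thing). Each
remaining point becomes a canopy center; points within t1 join its canopy,
and points within t2 (t2 < t1) are removed so they cannot seed new canopies,
so smaller t2 values leave more seed points and therefore more canopies:

    import java.util.ArrayList;
    import java.util.List;

    public class ToyCanopy {

      static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
          double d = a[i] - b[i];
          sum += d * d;
        }
        return Math.sqrt(sum);
      }

      static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<double[]>(points);
        List<List<double[]>> result = new ArrayList<List<double[]>>();
        while (!remaining.isEmpty()) {
          double[] center = remaining.remove(0); // next free point seeds a canopy
          List<double[]> canopy = new ArrayList<double[]>();
          canopy.add(center);
          List<double[]> stillFree = new ArrayList<double[]>();
          for (double[] p : remaining) {
            double d = distance(center, p);
            if (d < t1) {
              canopy.add(p);      // loosely belongs to this canopy
            }
            if (d >= t2) {
              stillFree.add(p);   // may still seed a canopy of its own
            }
          }
          remaining = stillFree;
          result.add(canopy);
        }
        return result;
      }
    }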

-Kate

On Tue, Feb 8, 2011 at 3:52 PM, sharath jagannath
<sh...@gmail.com> wrote:
> BTW, it is just the canopies generated by CanopyDriver.
>
> On Tue, Feb 8, 2011 at 2:32 PM, Kate Ericson <mo...@gmail.com> wrote:
>
>> Hi Sharath,
>>
>> So do you have 197 clusters, or just one cluster where the id is 197?
>> The ids don't always correspond to the number of clusters you have.
>>
>> -Kate
>>
>> On Tue, Feb 8, 2011 at 2:46 PM, sharath jagannath
>> <sh...@gmail.com> wrote:
>> > Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
>> > clusters:
>> >
>> > C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
>> > r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000... (all zeros; truncated)
>> >
>> >
>> > From the above sample output you can see the cluster id 197, the
>> > centroid, the number of points, and the radius.
>> >
>> > For any value of t1 and t2 I always get n = 1. This is quite strange.
>> >
>> > Does it have anything to do with my dataset? Sorry for the confusion
>> > created. All this while I have been saying the number of clusters is 1.
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Sharath
>> >
>>
>
>
>
> --
> Thanks,
> Sharath Jagannath
>

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
BTW, it is just the canopies generated by CanopyDriver.

On Tue, Feb 8, 2011 at 2:32 PM, Kate Ericson <mo...@gmail.com> wrote:

> Hi Sharath,
>
> So do you have 197 clusters, or just one cluster where the id is 197?
> The ids don't always correspond to the number of clusters you have.
>
> -Kate
>
> On Tue, Feb 8, 2011 at 2:46 PM, sharath jagannath
> <sh...@gmail.com> wrote:
> > Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
> > clusters:
> >
> > C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
> > r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
> > 0.000, 0.000, 0.000, 0.000, 0.000... (all zeros; truncated)
> >
> >
> > From the above sample output you can see the cluster id 197, the centroid,
> > the number of points, and the radius.
> >
> > For any value of t1 and t2 I always get n = 1. This is quite strange.
> >
> > Does it have anything to do with my dataset? Sorry for the confusion
> > created. All this while I have been saying the number of clusters is 1.
> >
> >
> >
> > Thanks,
> >
> > Sharath
> >
>



-- 
Thanks,
Sharath Jagannath

Re: Clustering with KMeans

Posted by Kate Ericson <mo...@gmail.com>.
Hi Sharath,

So do you have 197 clusters, or just one cluster where the id is 197?
The ids don't always correspond to the number of clusters you have.

-Kate

On Tue, Feb 8, 2011 at 2:46 PM, sharath jagannath
<sh...@gmail.com> wrote:
> Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
> clusters:
>
> C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
> r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
> 0.000, 0.000, 0.000, 0.000, 0.000... (all zeros; truncated)
>
>
> From the above sample output you can see the cluster id 197, the centroid,
> the number of points, and the radius.
>
> For any value of t1 and t2 I always get n = 1. This is quite strange.
>
> Does it have anything to do with my dataset? Sorry for the confusion
> created. All this while I have been saying the number of clusters is 1.
>
>
>
> Thanks,
>
> Sharath
>

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
clusters:

C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000... (all zeros; truncated)


From the above sample output you can see the cluster id 197, the centroid,
the number of points, and the radius.

For any value of t1 and t2 I always get n = 1. This is quite strange.

Does it have anything to do with my dataset? Sorry for the confusion
created. All this while I have been saying the number of clusters is 1.



Thanks,

Sharath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Dear Kate,

This is a set of commands very similar to what my program is doing. Even
with these commands, I got only one cluster.

1. Generate tf-idf vectors using the code samples mentioned previously,
together with the Seq2Sparse command

2. ../bin/mahout canopy -t1 3 -t2 2.5  -i sj/output/tfidf-vectors -o
sj/canopy/output/ -ow

3. ../bin/mahout kmeans -i sj/output/tfidf-vectors -o sj/kmeans/output -x 10
-cd 0.001 -ow -c sj/canopy/output/clusters-0

4. ../bin/mahout clusterdump -s sj/kmeans/output/clusters-1/

From this, I guess my distance thresholds and convergence delta need to be
tuned, though I am not sure about it.
I have been playing around with those values without any improvement.
I am pretty new at this and am not able to work out what is going wrong.

Thanks a lot; I appreciate your response.

Thanks,
Sharath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
I am not using the command line; I have written a class extending AbstractJob,
and the flow is as follows:

Convert the data to sequence files (all the mentioned data) using the code
mentioned in the previous email -> generate vectors using
SparseVectorsFromSequenceFiles (I converted its main into a static method so I
could call this code from my class) -> generate seeds using Canopy -> cluster
using KMeans.
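
In code, that flow would look roughly like the sketch below. This assumes the
Mahout 0.4/0.5-era static driver entry points; the exact signatures vary
between releases, so check them against your version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.clustering.canopy.CanopyDriver;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;
    import org.apache.mahout.common.distance.DistanceMeasure;
    import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

    public class CanopyKMeansFlow {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path vectors = new Path("sj/output/tfidf-vectors"); // from seq2sparse
        Path canopyOut = new Path("sj/canopy/output");
        Path kmeansOut = new Path("sj/kmeans/output");
        DistanceMeasure measure = new EuclideanDistanceMeasure();

        // Canopy pass: t1/t2 control how many seed clusters come out.
        CanopyDriver.run(conf, vectors, canopyOut, measure, 3.0, 2.5, false, false);

        // K-means seeded with the canopy centers in clusters-0.
        KMeansDriver.run(conf, vectors, new Path(canopyOut, "clusters-0"),
            kmeansOut, measure, 0.001, 10, true, false);
      }
    }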

Thanks for the response.

Cheers,
Sharath

On Tue, Feb 8, 2011 at 5:51 AM, Kate Ericson <mo...@gmail.com> wrote:

> Just to start from the top, can you show the command you've been using
> to start the kmeans job?
>
> -Kate
>
> On Mon, Feb 7, 2011 at 9:27 PM, sharath jagannath
> <sh...@gmail.com> wrote:
> > ok now I tried the digg data. Even now I am getting just one cluster
> (Digg
> > Data set: http://www.public.asu.edu/%7Emdechoud/datasets.html)
> >
> > My sample data for digg:
> >
> > 9275921    Ever heard about a movie and thought it sounded terrible, but
> > then when it came out it turned out to be pretty good, or even great Back
> in
> > August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
> > been great, but weren't
> >
> > 9278984    Last week, an era came to a close in the NFL when Patriots
> safety
> > Rodney Harrison shredded his left calf muscle making an open-field tackle
> > against the Broncos. The injury ended his season and quite possibly his
> > stellar 15-year career. It also brought to a completion Harrison's long,
> > nasty reign as the NFL's dirtiest player. Now who's next?
> >
> > 9275737    Finally, the most prominent conservative in America has chosen
> > his pick for president, and it's liberal Democrat, Barack Obama
> >
> >
> > I am using everything that comes right out of Mahout's box. The only thing
> > I wrote was SequenceFromDigg:
> >
> >   for (String aFit : new FileLineIterable(current, charset, false)) {
> >       StringBuilder file = new StringBuilder();
> >       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
> >       int tokenCount = tokenizer.countTokens();
> >       if (tokenCount < 2)
> >           continue;
> >       String content = tokenizer.nextToken();
> >       while (tokenizer.hasMoreTokens()) {
> >           String token = tokenizer.nextToken();
> >           file.append(content).append(" ").append(token);
> >           file.append("\n");
> >       }
> >   }
> >
> >
> > and SequenceFromDelicious:
> >
> >   for (String aFit : new FileLineIterable(current, charset, false)) {
> >       StringBuilder file = new StringBuilder();
> >       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
> >       int tokenCount = tokenizer.countTokens();
> >       if (tokenCount < 2)
> >           continue;
> >       --tokenCount;
> >       String content = tokenizer.nextToken();
> >       while (tokenizer.hasMoreTokens()) {
> >           String token = tokenizer.nextToken();
> >           // Also consider the ranking, currently not handling it.
> >           for (int i = 0; i < tokenCount; i++) {
> >               file.append(token).append("\t");
> >           }
> >           --tokenCount;
> >           file.append("\n");
> >       }
> >       writer.write(content, file.toString());
> >   }
> >
> >
> > Somebody, please help :D
> >
> >
> > Thanks a lot in advance.
> >
> >
> > Cheers,
> >
> > Sharath
> >
>

Re: Clustering with KMeans

Posted by Kate Ericson <mo...@gmail.com>.
Just to start from the top, can you show the command you've been using
to start the kmeans job?

-Kate

On Mon, Feb 7, 2011 at 9:27 PM, sharath jagannath
<sh...@gmail.com> wrote:
> ok now I tried the digg data. Even now I am getting just one cluster (Digg
> Data set: http://www.public.asu.edu/%7Emdechoud/datasets.html)
>
> My sample data for digg:
>
> 9275921    Ever heard about a movie and thought it sounded terrible, but
> then when it came out it turned out to be pretty good, or even great Back in
> August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
> been great, but weren't
>
> 9278984    Last week, an era came to a close in the NFL when Patriots safety
> Rodney Harrison shredded his left calf muscle making an open-field tackle
> against the Broncos. The injury ended his season and quite possibly his
> stellar 15-year career. It also brought to a completion Harrison's long,
> nasty reign as the NFL's dirtiest player. Now who's next?
>
> 9275737    Finally, the most prominent conservative in America has chosen
> his pick for president, and it's liberal Democrat, Barack Obama
>
>
> I am using everything that comes right out of Mahout's box. The only thing
> I wrote was SequenceFromDigg:
>
>   for (String aFit : new FileLineIterable(current, charset, false)) {
>       StringBuilder file = new StringBuilder();
>       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
>       int tokenCount = tokenizer.countTokens();
>       if (tokenCount < 2)
>           continue;
>       String content = tokenizer.nextToken();
>       while (tokenizer.hasMoreTokens()) {
>           String token = tokenizer.nextToken();
>           file.append(content).append(" ").append(token);
>           file.append("\n");
>       }
>   }
>
>
> and SequenceFromDelicious:
>
>   for (String aFit : new FileLineIterable(current, charset, false)) {
>       StringBuilder file = new StringBuilder();
>       StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
>       int tokenCount = tokenizer.countTokens();
>       if (tokenCount < 2)
>           continue;
>       --tokenCount;
>       String content = tokenizer.nextToken();
>       while (tokenizer.hasMoreTokens()) {
>           String token = tokenizer.nextToken();
>           // Also consider the ranking, currently not handling it.
>           for (int i = 0; i < tokenCount; i++) {
>               file.append(token).append("\t");
>           }
>           --tokenCount;
>           file.append("\n");
>       }
>       writer.write(content, file.toString());
>   }
>
>
> Somebody, please help :D
>
>
> Thanks a lot in advance.
>
>
> Cheers,
>
> Sharath
>

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
OK, now I tried the Digg data. Even now I am getting just one cluster (Digg
data set: http://www.public.asu.edu/%7Emdechoud/datasets.html)

My sample data for digg:

9275921    Ever heard about a movie and thought it sounded terrible, but
then when it came out it turned out to be pretty good, or even great Back in
August, fellow GeekDad Ken Denmead listed ten geeky movies that should've
been great, but weren't

9278984    Last week, an era came to a close in the NFL when Patriots safety
Rodney Harrison shredded his left calf muscle making an open-field tackle
against the Broncos. The injury ended his season and quite possibly his
stellar 15-year career. It also brought to a completion Harrison's long,
nasty reign as the NFL's dirtiest player. Now who's next?

9275737    Finally, the most prominent conservative in America has chosen
his pick for president, and it's liberal Democrat, Barack Obama


I am using everything that comes right out of Mahout's box. The only thing
I wrote was SequenceFromDigg:

    for (String aFit : new FileLineIterable(current, charset, false)) {
        StringBuilder file = new StringBuilder();
        StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
        int tokenCount = tokenizer.countTokens();
        if (tokenCount < 2)
            continue;
        // first token is the story id; the rest is the description
        String content = tokenizer.nextToken();
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            file.append(content).append(" ").append(token);
            file.append("\n");
        }
        // write the doc out, as in SequenceFromDelicious below
        writer.write(content, file.toString());
    }


and SequenceFromDelicious:

    for (String aFit : new FileLineIterable(current, charset, false)) {
        StringBuilder file = new StringBuilder();
        StringTokenizer tokenizer = new StringTokenizer(aFit, "\t");
        int tokenCount = tokenizer.countTokens();
        if (tokenCount < 2)
            continue;
        --tokenCount;
        // first token is the user id; the rest are tags
        String content = tokenizer.nextToken();
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            // Also consider the ranking, currently not handling it.
            // (each tag is repeated tokenCount times, so earlier tags get more weight)
            for (int i = 0; i < tokenCount; i++) {
                file.append(token).append("\t");
            }
            --tokenCount;
            file.append("\n");
        }
        writer.write(content, file.toString());
    }
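
For completeness, the `writer` these snippets assume has to be set up
somewhere. A minimal raw-Hadoop version with Text keys and values and a
hypothetical output path (Mahout's shipped converters wrap this in their own
writer class):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SeqFileWriterSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // One chunk with Text keys (doc/user id) and Text values (the text).
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("sj/seqfiles/chunk-0"), Text.class, Text.class);
        try {
          writer.append(new Text("greywall"),
                        new Text("google chrome analysis open-source"));
        } finally {
          writer.close();
        }
      }
    }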


Somebody, please help :D


Thanks a lot in advance.


Cheers,

Sharath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
Dear All,

The output posted is for Delicious and not YouTube. Sorry for the confusion.

My sample data would look like:

<User_Id> <TagList (not showing the ranking associated with the tags; even
without the ranking they were grouped into a single cluster)>
greywall google chrome analysis open-source
mohammadi reuse resources reference programming tips tools web webdev
opensource open chrome google browsershot
djExprice google opensource programming chrome code development libraries
hidden class transitions whoopass Internet library


Even the YouTube data is more or less similar.


Thanks,
Sharath

Re: Clustering with KMeans

Posted by sharath jagannath <sh...@gmail.com>.
A result that better describes my data:

[[VL-0{n=2064 c=[0:0.002, 1:0.009, 2:0.002, 3:0.002, 4:0.001, 5:0.003,
6:0.001, 9:0.019, 10:0.001, 11:0.000, 12:0.002, 13:0.005, 14:0.001,
15:0.001, 16:0.004, 17:0.002, 18:0.002, 19:0.007, 20:0.006, 21:0.001,
... (centroid over ~930 tag dimensions; truncated) ...
930:0.002, 931:0.001, 932:0.001, 933:0.006] r=[0:0.088, 1:0.177, 2:0.110,
3:0.088, 4:0.066, 5:0.154, 6:0.044, 9:0.324, 10:0.044, 11:0.022, 12:0.088,
... (radius vector; truncated) ...
928:0.455, 929:0.154, 930:0.088, 931:0.066, 932:0.044, 933:0.105]}]]

On Mon, Feb 7, 2011 at 3:15 PM, sharath jagannath <
sharathjagannath@gmail.com> wrote:

> Hey,
>
> I tried to cluster the you_tube data set:
> http://www.public.asu.edu/%7Emdechoud/datasets.html.
> All the data points were under one cluster. Since the dataset is not really
> big enough. I thought the  behavior is due to the dataset.
> I did try to vary the threshold and the convergence delta but All my data
> was still being put under the same cluster.
>
> In my vectorizer code, I consider only the tags (and have assigned random
> ranking to those) that are associated together with the user_id from the
> above dataset.
> My data format was List<User_Id tag|rating>.
>
> I even encountered java out of heap for few values of threshold.
>
> Output of clustering my data:
>
>  Number of clusters : 1
>
> Feb 7, 2011 3:14:18 PM org.slf4j.impl.JCLLoggerAdapter info
>
> INFO: Reading Cluster:999 center:[242:2.000, 294:3.000, 425:4.000,
> 706:1.000] numPoints:1 radius:[all entries 0.000 (long zero vector
> elided)]
>
>
>
> But I also tested the synthetic control data clustering:
> https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
> The data points were still put into a single cluster.
>
> Can anybody tell me why this happens? Any suggestions for improving the
> clustering would be appreciated.
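> 
> (For reference, the cluster listing below is in clusterdump format; a
> command along these lines should produce it, assuming the 0.4-era
> clusterdump utility and with placeholder paths:)
> 
>   bin/mahout clusterdump \
>     --seqFileDir output/clusters-10 \
>     --pointsDir output/clusteredPoints \
>     --output dump.txt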
>
> Output of synthetic data clustering:
>
> VL-99{n=1 c=[27.844, 26.473, 35.845, 27.200, 27.360, 24.333, 33.979,
> 30.604, 32.678, 31.190, 25.564, 30.310, 29.268, 32.330, 31.107, 29.685,
> 34.949, 28.897, 33.783, 29.133, 24.265, 31.624, 35.853, 32.800, 30.187,
> 27.977, 26.372, 32.658, 24.914, 34.400, 30.516, 24.473, 29.901, 26.470,
> 31.827, 31.975, 32.804, 27.100, 34.002, 25.306, 29.109, 25.275, 29.439,
> 34.912, 30.887, 29.513, 32.868, 34.125, 35.517, 24.820, 31.515, 36.003,
> 25.457, 29.016, 34.201, 25.769, 28.020, 29.241, 34.717, 30.070]
> r=[all entries 0.000]}
>
> Weight:  Point:
>
> 1.0: [27.844, 26.473, 35.845, 27.200, 27.360, 24.333, 33.979, 30.604,
> 32.678, 31.190, 25.564, 30.310, 29.268, 32.330, 31.107, 29.685, 34.949,
> 28.897, 33.783, 29.133, 24.265, 31.624, 35.853, 32.800, 30.187, 27.977,
> 26.372, 32.658, 24.914, 34.400, 30.516, 24.473, 29.901, 26.470, 31.827,
> 31.975, 32.804, 27.100, 34.002, 25.306, 29.109, 25.275, 29.439, 34.912,
> 30.887, 29.513, 32.868, 34.125, 35.517, 24.820, 31.515, 36.003, 25.457,
> 29.016, 34.201, 25.769, 28.020, 29.241, 34.717, 30.070]
>
>
>
> Thanks,
> Sharath
>



-- 
Thanks,
Sharath Jagannath