Posted to user@mahout.apache.org by praneet mhatre <pr...@gmail.com> on 2012/01/07 04:01:42 UTC

Re: How to determine which cluster an item belongs to

Hi Abin and Petar,

I tried the above approach with Dirichlet clustering. I am using the
following code snippet after clustering completes.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.mahout.clustering.WeightedVectorWritable;

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        // print the cluster id assigned to each clustered point
        while (reader.next(key, value)) {
            System.out.println(value + " is in cluster " + key);
        }
        reader.close();

But I am getting the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 1
12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 2
12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 3
12/01/06 18:47:45 INFO dirichlet.DirichletDriver: Iteration 4
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 5
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 6
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 7
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 8
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 9
12/01/06 18:47:46 INFO dirichlet.DirichletDriver: Iteration 10
12/01/06 18:47:47 INFO clustering.ClusterDumper: Wrote 10 clusters
Exception in thread "main" java.io.FileNotFoundException: /home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:137)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.mahout.clustering.Test.main(Test.java:46)

Any suggestions?

On Wed, Dec 28, 2011 at 12:25 AM, petar.mitrovic <pe...@gmail.com> wrote:

> Hi Abin,
>
> Thank you very much! Your suggestion helped me a lot.
>
> First, I've set the named-vector parameter (-nv) on Mahout's vector
> generation process (seq2sparse) so that it writes more descriptive vectors.
>
> Later, I could use something like this:
>
> IntWritable key = new IntWritable();
> WeightedVectorWritable vector = new WeightedVectorWritable();
> while (reader.next(key, vector)) {
>        NamedVector nv = (NamedVector) vector.getVector();
>        System.out.println(nv.getName() + " belongs to cluster " +
> key.toString());
> }
>
> Hope this can be useful for someone else, too.
>
> Regards,
> Petar
>
>
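
For reference, the -nv switch Petar mentions is passed when the text is
vectorized. A minimal sketch of such a run follows; the input and output
paths are placeholders, and the exact option names should be checked
against bin/mahout seq2sparse --help for your Mahout version:

    bin/mahout seq2sparse \
      -i input-seqfiles \
      -o output-vectors \
      -nv

With -nv (--namedVector), each document vector is wrapped in a NamedVector
carrying the document's key, which is what makes the nv.getName() call in
the snippet above meaningful.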



-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine

Re: How to determine which cluster an item belongs to

Posted by Lance Norskog <go...@gmail.com>.
The ClusterDumper job can write the cluster data in various output
formats. The CSV and GraphML (XML) formats are parseable, and both
include the dictionary. I do not know what information about the cluster
structures is thrown away by these output formats.

The Gephi program reads GraphML, and you can use it to explore your
clusters visually.
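
To produce a GraphML file for Gephi, an invocation along these lines
should work. This is a sketch only, with placeholder paths; the option
names changed between Mahout releases (e.g. -s/--seqFileDir versus
-i/--input), and -of/--outputFormat with values such as TEXT, CSV and
GRAPH_ML exists only in versions that support it, so verify against
bin/mahout clusterdump --help:

    bin/mahout clusterdump \
      -s output/clusters-10-final \
      -p output/clusteredPoints \
      -o clusters.graphml \
      -of GRAPH_ML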

Lance

-- 
Lance Norskog
goksron@gmail.com

Re: How to determine which cluster an item belongs to

Posted by praneet mhatre <pr...@gmail.com>.
This seems to work perfectly. Thank you Sean!

-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine

Re: How to determine which cluster an item belongs to

Posted by praneet mhatre <pr...@gmail.com>.
Hi Sean,

I tried passing the file too. But doing so gives me the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/praneet/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 1
12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 2
12/01/07 12:31:57 INFO dirichlet.DirichletDriver: Iteration 3
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 4
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 5
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 6
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 7
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 8
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 9
12/01/07 12:31:58 INFO dirichlet.DirichletDriver: Iteration 10
java.lang.IllegalStateException: file:/home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:82)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:1)
    at com.google.common.collect.Iterators$8.next(Iterators.java:667)
    at com.google.common.collect.Iterators$5.hasNext(Iterators.java:475)
    at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:39)
    at org.apache.mahout.clustering.dirichlet.DirichletClusterMapper.loadClusters(DirichletClusterMapper.java:68)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterDataSeq(DirichletDriver.java:487)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.clusterData(DirichletDriver.java:474)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:172)
    at org.apache.mahout.clustering.TestClusterDumper.testDirichlet2(TestClusterDumper.java:297)
    at org.apache.mahout.clustering.Test.main(Test.java:40)
Caused by: java.io.FileNotFoundException: /home/praneet/Eclipse-Output/output/clusters-10-final/clusters-10 (Is a directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:137)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:70)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:106)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileValueIterator.<init>(SequenceFileValueIterator.java:51)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator$1.apply(SequenceFileDirValueIterator.java:78)
    ... 10 more

This is what I get when I try

Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints/part-m-0");

instead of

Path path = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");

Since the directory has only one file, part-m-0, I do not need to read the
whole directory. But I'll still try the approach you suggested and see how
things work out.
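
If you ever do need to scan every part file in the directory by hand, one
workable approach is to list the directory yourself and skip Hadoop's
bookkeeping entries. A rough sketch using only the standard Hadoop
FileSystem API (org.apache.hadoop.fs.FileStatus and PathFilter), where fs,
conf, path, key and value are the same objects as in the first message's
snippet:

    // list only real part files, skipping _SUCCESS, _logs and hidden .crc files
    FileStatus[] parts = fs.listStatus(path, new PathFilter() {
        @Override
        public boolean accept(Path p) {
            String name = p.getName();
            return !name.startsWith("_") && !name.startsWith(".");
        }
    });
    for (FileStatus part : parts) {
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part.getPath(), conf);
        try {
            while (reader.next(key, value)) {
                System.out.println(value + " is in cluster " + key);
            }
        } finally {
            reader.close();
        }
    }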




-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine

Re: How to determine which cluster an item belongs to

Posted by Sean Owen <sr...@gmail.com>.
The error is right there:

Exception in thread "main" java.io.FileNotFoundException:
/home/praneet/Eclipse-Output/output/clusteredPoints (Is a directory)

You are passing a directory, not a file.
Look at the class SequenceFileDirIterable for an easy way to iterate
over all files in a directory as key-value pairs.
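
A minimal sketch of that approach, assuming the Mahout 0.6-era
SequenceFileDirIterable constructor that takes a Path, a PathType and a
Configuration (the path below is the one from the earlier messages):

    import org.apache.mahout.common.Pair;
    import org.apache.mahout.common.iterator.sequencefile.PathType;
    import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterable;

    Path clusteredPoints = new Path("/home/praneet/Eclipse-Output/output/clusteredPoints");
    // PathType.LIST treats the path as a directory and iterates over every file in it
    for (Pair<IntWritable, WeightedVectorWritable> record :
         new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
             clusteredPoints, PathType.LIST, conf)) {
        System.out.println(record.getSecond() + " is in cluster " + record.getFirst());
    }

Each Pair holds the cluster id as its first element and the clustered
point as its second, so this avoids the FileNotFoundException above.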
