
Posted to user@mahout.apache.org by sarath pr <sa...@gmail.com> on 2011/04/03 10:27:54 UTC

Clustering Question

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
        new Path(inputDir, "documents.seq"), Text.class, Text.class);

for (int i = 0; i < s.length; i++) {
    // key: article ID, value: article text
    writer.append(new Text(s[i][0]), new Text(s[i][1]));
}
writer.close();

Here Text(s[i][0]) is a string value holding the ID of a news
article, and Text(s[i][1]) is the article text. I have clustered
some 100+ news articles this way and I get the output in
clusteredPoints/part-m-00000. My question: is it possible to
extract the article ID (i.e. the Text(s[i][0]) that I appended) and
the corresponding cluster ID from the part-m-00000 file?

Does anyone know?
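
To make the goal concrete, here is a minimal sketch of the mapping being
asked for: given one (cluster ID, article ID) pair per clustered point,
group the article IDs by cluster. The ClusteredPoint class, the
groupByCluster helper and the sample IDs are hypothetical and exist only
for illustration. The sketch deliberately does not show how to read
part-m-00000, and it assumes the article ID is still attached to each
point in that file, which is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterAssignments {

    /** Hypothetical record: one clustered point reduced to the two fields of interest. */
    static final class ClusteredPoint {
        final int clusterId;
        final String articleId;

        ClusteredPoint(int clusterId, String articleId) {
            this.clusterId = clusterId;
            this.articleId = articleId;
        }
    }

    /** Groups article IDs by cluster ID; clusters come back in ascending ID order. */
    static Map<Integer, List<String>> groupByCluster(List<ClusteredPoint> points) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (ClusteredPoint p : points) {
            byCluster.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.articleId);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for records read from the clustering output.
        List<ClusteredPoint> points = new ArrayList<>();
        points.add(new ClusteredPoint(7, "news-001"));
        points.add(new ClusteredPoint(2, "news-002"));
        points.add(new ClusteredPoint(7, "news-003"));

        for (Map.Entry<Integer, List<String>> e : groupByCluster(points).entrySet()) {
            System.out.println("cluster " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

If such pairs can be read back from the clustering output, the same
grouping gives the list of articles per cluster directly.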

-- 
Thank You..!!
Sarath Ramachandran
sarath.amrita@gmail.com
+919995024287

Re: Clustering Question

Posted by sarath pr <sa...@gmail.com>.

I am using the NetBeans IDE.
I call CanopyDriver.run to create the initial clusters and
KMeansDriver.run to cluster the news articles.

On 4/6/11, Grant Ingersoll <gr...@gmail.com> wrote:
> What commands are you running to do the actual clustering?
>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
>


Re: Clustering Question

Posted by Grant Ingersoll <gr...@gmail.com>.

What commands are you running to do the actual clustering?



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/