Posted to user@mahout.apache.org by sarath pr <sa...@gmail.com> on 2011/04/03 10:27:54 UTC
Clustering Question
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
    new Path(inputDir, "documents.seq"), Text.class, Text.class);
// key = article ID (s[i][0]), value = article text (s[i][1])
for (int i = 0; i < s.length; i++) {
  writer.append(new Text(s[i][0]), new Text(s[i][1]));
}
writer.close();
Here s[i][0] is a string holding the ID of a news article, and s[i][1] is
the text of that article. I have clustered 100+ news articles this way, and
the output is written to clusteredPoints/part-m-00000. Is it possible to
extract, from that part-m-00000 file, the article ID I appended (s[i][0])
together with its corresponding cluster ID?
Does anyone know?
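Here is a minimal sketch of the requested join in plain Java, with no Hadoop or Mahout dependencies. It assumes each clusteredPoints record has already been read into a (cluster ID, article ID) pair. In Mahout releases of this period, these records are typically keyed by the cluster ID (an IntWritable), with the point's vector wrapped in a WeightedVectorWritable. That means the article ID can only be recovered if the vectors were created as NamedVectors carrying that ID, so check this against the Mahout version in use. ClusterMembership, ClusteredPoint and groupByCluster are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Groups article IDs by cluster ID.
 * ASSUMPTION: ClusteredPoint is a hypothetical stand-in for one clusteredPoints
 * record, i.e. the record's cluster ID plus the name carried by its vector
 * (which is the article ID only if the vectors were created as named vectors).
 */
public final class ClusterMembership {

  static final class ClusteredPoint {
    final int clusterId;
    final String articleId;

    ClusteredPoint(int clusterId, String articleId) {
      this.clusterId = clusterId;
      this.articleId = articleId;
    }
  }

  /** Returns cluster ID -> article IDs, with cluster IDs in ascending order. */
  static Map<Integer, List<String>> groupByCluster(List<ClusteredPoint> points) {
    Map<Integer, List<String>> byCluster = new TreeMap<>();
    for (ClusteredPoint p : points) {
      byCluster.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.articleId);
    }
    return byCluster;
  }

  public static void main(String[] args) {
    List<ClusteredPoint> points = new ArrayList<>();
    points.add(new ClusteredPoint(7, "article-1"));
    points.add(new ClusteredPoint(3, "article-2"));
    points.add(new ClusteredPoint(7, "article-3"));
    // prints {3=[article-2], 7=[article-1, article-3]}
    System.out.println(groupByCluster(points));
  }
}
```

A TreeMap keeps the cluster IDs in ascending order, which makes the printed summary stable across runs.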
--
Thank You..!!
Sarath Ramachandran
sarath.amrita@gmail.com
+919995024287
Re: Clustering Question
Posted by sarath pr <sa...@gmail.com>.
I am using the NetBeans IDE.
I use CanopyDriver.run to create the initial clusters and KMeansDriver.run
to cluster the news articles.
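The two steps named above can be illustrated with a toy, in-memory Java sketch. First, a canopy pass with a loose threshold T1 and a tight threshold T2 proposes initial centers. Then Lloyd-style k-means refines those centers and assigns every point a cluster index. This is not Mahout's implementation, since CanopyDriver and KMeansDriver run as MapReduce jobs over vectors on HDFS. The class name, the Euclidean distance, the threshold values and the use of each canopy's mean as its center are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy, in-memory sketch of canopy seeding followed by k-means.
 * ASSUMPTION: not Mahout code; distance, thresholds and data are illustrative.
 */
public final class CanopyKMeansSketch {

  /** Euclidean distance between two points of equal dimension. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Component-wise mean of a non-empty list of points. */
  static double[] mean(List<double[]> pts) {
    double[] m = new double[pts.get(0).length];
    for (double[] p : pts) {
      for (int i = 0; i < m.length; i++) m[i] += p[i];
    }
    for (int i = 0; i < m.length; i++) m[i] /= pts.size();
    return m;
  }

  /**
   * Canopy pass (requires t1 > t2 > 0): each canopy's center is the mean of the
   * remaining points within t1 of its seed; points within t2 can no longer seed a canopy.
   */
  static List<double[]> canopyCenters(List<double[]> points, double t1, double t2) {
    List<double[]> remaining = new ArrayList<>(points);
    List<double[]> centers = new ArrayList<>();
    while (!remaining.isEmpty()) {
      double[] seed = remaining.get(0);
      List<double[]> members = new ArrayList<>();
      for (double[] p : remaining) {
        if (distance(p, seed) < t1) members.add(p);
      }
      centers.add(mean(members));
      remaining.removeIf(p -> distance(p, seed) < t2); // always removes the seed itself
    }
    return centers;
  }

  /** Lloyd's k-means from the given initial centers; returns each point's cluster index. */
  static int[] kMeans(List<double[]> points, List<double[]> initial, int maxIterations) {
    List<double[]> centers = new ArrayList<>(initial);
    int[] assignment = new int[points.size()];
    Arrays.fill(assignment, -1);
    for (int iter = 0; iter < maxIterations; iter++) {
      boolean changed = false;
      // Assignment step: attach each point to its nearest center.
      for (int i = 0; i < points.size(); i++) {
        int best = 0;
        double bestDist = distance(points.get(i), centers.get(0));
        for (int c = 1; c < centers.size(); c++) {
          double d = distance(points.get(i), centers.get(c));
          if (d < bestDist) { best = c; bestDist = d; }
        }
        if (assignment[i] != best) { assignment[i] = best; changed = true; }
      }
      if (!changed) break;
      // Update step: move each center to the mean of its points (empty clusters keep theirs).
      for (int c = 0; c < centers.size(); c++) {
        List<double[]> members = new ArrayList<>();
        for (int i = 0; i < points.size(); i++) {
          if (assignment[i] == c) members.add(points.get(i));
        }
        if (!members.isEmpty()) centers.set(c, mean(members));
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {0, 0}, new double[] {0, 1},
        new double[] {10, 10}, new double[] {10, 11});
    List<double[]> centers = canopyCenters(points, 3.0, 2.0);
    int[] clusterOf = kMeans(points, centers, 10);
    // prints "2 canopies, assignment [0, 0, 1, 1]"
    System.out.println(centers.size() + " canopies, assignment " + Arrays.toString(clusterOf));
  }
}
```

The int[] returned by kMeans plays the role of the cluster ID in clusteredPoints. Pairing it with each point's article ID is the join asked about in the original question.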
On 4/6/11, Grant Ingersoll <gr...@gmail.com> wrote:
> What commands are you running to do the actual clustering?
Re: Clustering Question
Posted by Grant Ingersoll <gr...@gmail.com>.
What commands are you running to do the actual clustering?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/