Posted to dev@mahout.apache.org by David Hall <dl...@cs.stanford.edu> on 2009/06/19 10:00:31 UTC

Re: [jira] Updated: (MAHOUT-126) Prepare document vectors from the text

Ignore this. Wrong issue.

On Fri, Jun 19, 2009 at 12:59 AM, David Hall (JIRA)<ji...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> David Hall updated MAHOUT-126:
> ------------------------------
>
>    Attachment: MAHOUT-123.patch
>
> Ok, I'm going to call this a mostly functional patch.
>
>> Prepare document vectors from the text
>> --------------------------------------
>>
>>                 Key: MAHOUT-126
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-126
>>             Project: Mahout
>>          Issue Type: New Feature
>>    Affects Versions: 0.2
>>            Reporter: Shashikant Kore
>>            Assignee: Grant Ingersoll
>>             Fix For: 0.2
>>
>>         Attachments: mahout-126-benson.patch, MAHOUT-126-no-normalization.patch, MAHOUT-126-no-normalization.patch, MAHOUT-126-null-entry.patch, MAHOUT-126-TF.patch, MAHOUT-126.patch, MAHOUT-126.patch, MAHOUT-126.patch, MAHOUT-126.patch
>>
>>
>> Clustering algorithms presently take the document vectors as input. Generating these document vectors from the text can be broken into two tasks:
>> 1. Create a Lucene index of the input plain-text documents.
>> 2. From the index, generate the (sparse) document vectors, with each term weighted by its TF-IDF value. With a Lucene index, this value can be calculated very easily.
>> Presently, I have created two separate utilities, which could possibly be invoked from another class.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>