Posted to user@mahout.apache.org by Eric Friedman <er...@spottedsnake.net> on 2012/08/01 01:05:40 UTC

Re: non-text NB classifiers?

Can you point me to the class I should look at to see how this is done?

On Tue, Jul 31, 2012 at 10:49 AM, Robin Anil <ro...@gmail.com> wrote:
> You can pass in any vector (not just a TF-IDF vector). For example, the
> asf-email example script uses vectors generated with the randomized
> encoding.
> ------
> Robin Anil
>
>
> On Tue, Jul 31, 2012 at 12:26 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> I don't know this code well, but I believe there is simply a step in
>> front that vectorizes text with TF-IDF. The results are simple
>> vectors. You could just inject your own vectors (i.e. real-valued
>> attributes) at that stage and skip the TF-IDF. It may need a little
>> hacking.
>>
>> On Tue, Jul 31, 2012 at 6:21 PM, Eric Friedman <er...@spottedsnake.net>
>> wrote:
>> > All of the examples that I've found for training NB classifiers seem
>> > to have textual data as input.  Is there a way to build a classifier
>> > with more general attributes?
>> >
>> > I found this JIRA ticket
>> > (https://issues.apache.org/jira/browse/MAHOUT-286), but it's been
>> > closed as a duplicate of
>> > https://issues.apache.org/jira/browse/MAHOUT-155, which doesn't seem
>> > to address the underlying question.
>> >
>> > I know that I can do this with Weka, but not at scale -- is Mahout
>> > only able to build textual classifiers?
>> >
>> > Thanks,
>> > Eric
>>

Re: non-text NB classifiers?

Posted by Robin Anil <ro...@gmail.com>.
It's EncodedVectorsFromSequenceFiles.java, I believe.
------
Robin Anil
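
For readers following the thread: the "randomized encoding" mentioned above appears to be a form of feature hashing, where each attribute is hashed to a slot in a fixed-size vector. The sketch below illustrates that idea for a record with both numeric and categorical attributes. It is plain Java and does not use Mahout's classes; the class name HashedEncodingSketch, the method names, and the attribute names are all hypothetical, and the choice of String.hashCode() as the hash is an assumption made for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative feature-hashing sketch; NOT Mahout's API. */
public class HashedEncodingSketch {

    /** Map a feature name to a slot in [0, size). */
    static int slot(String name, int size) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(name.hashCode(), size);
    }

    /**
     * Encode one record into a fixed-size vector.
     * Numeric attributes add their value to the slot for the attribute name.
     * Categorical attributes add 1.0 to the slot for "name=value".
     * Colliding features simply sum in the same slot.
     */
    static double[] encode(Map<String, Double> numeric,
                           Map<String, String> categorical,
                           int size) {
        double[] v = new double[size];
        for (Map.Entry<String, Double> e : numeric.entrySet()) {
            v[slot(e.getKey(), size)] += e.getValue();
        }
        for (Map.Entry<String, String> e : categorical.entrySet()) {
            v[slot(e.getKey() + "=" + e.getValue(), size)] += 1.0;
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical record with two numeric and one categorical attribute.
        Map<String, Double> numeric = new LinkedHashMap<>();
        numeric.put("age", 42.0);
        numeric.put("income", 55000.0);
        Map<String, String> categorical = new LinkedHashMap<>();
        categorical.put("country", "NZ");

        double[] v = encode(numeric, categorical, 32);
        System.out.println(Arrays.toString(v));
    }
}
```

A vector built this way could then be handed to the trainer in place of a TF-IDF vector, which is the substitution suggested earlier in the thread. Note that multinomial naive Bayes generally assumes non-negative feature weights, so attributes that can go negative would likely need rescaling or binning first.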

