Posted to user@mahout.apache.org by Tharindu Mathew <mc...@gmail.com> on 2011/11/02 09:49:19 UTC

Embedding mahout in a java app

Hi,

Is there an API for easily embedding Mahout in a Java app, feeding it data,
and getting output?

PS: Forgive me if this is a noob question. Still trying to figure out
Mahout.
-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

Posted by Sean Owen <sr...@gmail.com>.
The wiki has examples of calling most of the code via Java, and javadoc
ought to cover the rest. What are you looking for specifically? Mahout is
not one thing. All of it is callable from Java.
On Nov 2, 2011 9:21 AM, "Tharindu Mathew" <mc...@gmail.com> wrote:

> Hi Sean,
>
> I guess with a proper API it just makes it easier. I was hoping you'd point
> me to a code sample or a tutorial.
>
> I only could find everything referring to quick starts which tell how to
> run a sample, such as
>
> https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
>
> I was looking for something like
> http://hadoop.apache.org/common/docs/r0.17.1/mapred_tutorial.html from the
> Hadoop folks.
>
> Obviously, I can dive into the code base, but I thought I was missing
> something as I was unable to find a way to hack up my own solution as yet.
> (i.e. load data, run an algo, get results)

Re: Embedding mahout in a java app

Posted by Lance Norskog <go...@gmail.com>.
Cool!

On Thu, Nov 3, 2011 at 7:47 AM, siem vaessen <si...@gmail.com> wrote:

> On Nov 2, 2011 12:17 PM, "Tharindu Mathew" <mc...@gmail.com> wrote:
> >
> > I want to create a java UI tool (based on a web app) that can pick and
> > apply different algorithms available in Mahout to different data sets.
> >
>
> we have developed an administrator dashboard which does just that, or to
> more precise a Web-gui to import collections, create specific templates and
> recommenders which are based on those templates and some core statistics on
> the recommenders created in the dashboard.
>
> We will be releasing a REST interface <3 months from now back to this
> community under a similar license. We are preparing a workflow to release
> this and the proper documentation as well. I will keep this group posted
> once we are close to releasing it.
>
> kind regards,
>
> siem



-- 
Lance Norskog
goksron@gmail.com

Re: Embedding mahout in a java app

Posted by siem vaessen <si...@gmail.com>.
On Nov 2, 2011 12:17 PM, "Tharindu Mathew" <mc...@gmail.com> wrote:
>
> I want to create a java UI tool (based on a web app) that can pick and
> apply different algorithms available in Mahout to different data sets.
>

We have developed an administrator dashboard which does just that or, to be
more precise, a web GUI to import collections, create specific templates and
recommenders based on those templates, along with some core statistics on
the recommenders created in the dashboard.

We will be releasing a REST interface back to this community in less than 3
months, under a similar license. We are preparing a workflow to release
this and the proper documentation as well. I will keep this group posted
once we are close to releasing it.

kind regards,

siem


Re: Embedding mahout in a java app

Posted by Sean Owen <sr...@gmail.com>.
MahoutDriver is the closest thing to a single point of entry for all the
algorithms. It's meant for command-line use, but you can see what it does
after parsing args.

In general, most algorithms use Hadoop, so no, there is not a Hadoop-free
mode. Some bits have non-Hadoop parts, though that's the exception.

Hadoop local mode works fine, though it takes some work to package it up
for standalone use.
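
To make the "single point of entry" idea concrete, here is a minimal sketch
of a driver that maps a short program name to a class with a main() and
invokes it by reflection. It is not Mahout's MahoutDriver code: MiniDriver,
EchoJob and the "echo" name are hypothetical, invented only to illustrate
the dispatch pattern an embedding app could follow.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a name-to-main() dispatcher.
// This is NOT Mahout's MahoutDriver; all names here are illustrative.
class MiniDriver {
    // Short program name -> fully qualified class name.
    private final Map<String, String> programs = new LinkedHashMap<>();

    // Register a short name for a class that declares main(String[]).
    void register(String shortName, String className) {
        programs.put(shortName, className);
    }

    // Look up the class, find its public static main(String[]), and call it.
    void run(String shortName, String... args) throws Exception {
        String className = programs.get(shortName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown program: " + shortName);
        }
        Method main = Class.forName(className).getMethod("main", String[].class);
        // Cast to Object so the array is passed as one argument, not spread.
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        MiniDriver driver = new MiniDriver();
        driver.register("echo", EchoJob.class.getName());
        driver.run("echo", "--input", "data.csv");
    }
}

// Stand-in for an algorithm job; a real job would parse args and do work.
class EchoJob {
    static String lastArgs = "";

    public static void main(String[] args) {
        lastArgs = String.join(" ", args);
        System.out.println("EchoJob ran with: " + lastArgs);
    }
}
```

A GUI or web app could call run() with the arguments it would otherwise
pass on the command line, at the cost of handing the job strings rather
than typed parameters.
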
 On Nov 2, 2011 11:35 AM, "Tharindu Mathew" <mc...@gmail.com> wrote:

> Thanks Sean.
>
> Looks like I'll have to dig into the code will start from MahoutDriver.
>
> Is there a mode that will work for all algorithms. For example, all
> algorithms can run on a single node mode or all algorithms run on a hadoop
> mode ( I know Hadoop has a local mode, but that's not what I'm referring
> to) or something similar?
>
> I'd like to support the tool to run even without Hadoop as that will be
> great for small data sets for someone to try out and play around with.
> Maybe there's another java library that already does this.

Re: Embedding mahout in a java app

Posted by Tharindu Mathew <mc...@gmail.com>.
Thanks Sean.

Looks like I'll have to dig into the code; I will start from MahoutDriver.

Is there a mode that will work for all algorithms? For example, can all
algorithms run in a single-node mode, or all run in a Hadoop mode (I know
Hadoop has a local mode, but that's not what I'm referring to), or
something similar?

I'd like the tool to run even without Hadoop, as that would be great for
someone to try out and play around with small data sets. Maybe there's
another Java library that already does this.

On Wed, Nov 2, 2011 at 4:51 PM, Sean Owen <sr...@gmail.com> wrote:

> I see, the Java interfaces vary from area to area since different
> algos are different things and sometimes take different input.
>
> Generally, the classifiers take in Mahout Vector input, and are
> Hadoop-based, so you'd be writing some code to run Mahout jobs on
> Hadoop from your GUI app. Not all are like this though.
>
> I don't think there's a one-stop easy interface already ready for you
> here, no. You'd have to stitch together different parts of the code
> and do some input transformation and Hadoop integration, I imagine.



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

Posted by Sean Owen <sr...@gmail.com>.
I see. The Java interfaces vary from area to area, since the different
algos are different things and sometimes take different input.

Generally, the classifiers take in Mahout Vector input, and are
Hadoop-based, so you'd be writing some code to run Mahout jobs on
Hadoop from your GUI app. Not all are like this though.

I don't think there's a one-stop easy interface already ready for you
here, no. You'd have to stitch together different parts of the code
and do some input transformation and Hadoop integration, I imagine.
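
As an illustration of the "input transformation" step, here is a minimal
sketch that turns simple CSV rows into numeric feature arrays plus a label.
It makes several assumptions: the last column is the label, all other
columns are numeric, and there are no quoted fields. CsvVectors and Example
are hypothetical names, and the plain double[] stands in for a Mahout
Vector (in a real app the array would presumably be wrapped in one, for
example a DenseVector).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: parse rows of the form "f1,f2,...,label".
// A plain double[] stands in for a Mahout Vector here.
class CsvVectors {
    // One training example: numeric features plus a class label.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Assumes no quoted fields or embedded commas; last column is the label.
    static Example parseLine(String line) {
        String[] cols = line.trim().split(",");
        if (cols.length < 2) {
            throw new IllegalArgumentException("Need at least one feature and a label: " + line);
        }
        double[] features = new double[cols.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(cols[i].trim());
        }
        return new Example(features, cols[cols.length - 1].trim());
    }

    // Parse every non-blank line into an Example.
    static List<Example> parseAll(List<String> lines) {
        List<Example> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(parseLine(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("5.1,3.5,setosa", "6.2,2.9,versicolor");
        for (Example e : parseAll(rows)) {
            System.out.println(e.label + " -> " + Arrays.toString(e.features));
        }
    }
}
```

Text or categorical columns would need an encoding step before this; the
sketch only covers the purely numeric case.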

On Wed, Nov 2, 2011 at 11:17 AM, Tharindu Mathew <mc...@gmail.com> wrote:
> I want to create a java UI tool (based on a web app) that can pick and
> apply different algorithms available in Mahout to different data sets.
>
> Hence the embedding with java. Obviously, I understand that everything is
> callable from Java since it's written in Java :).
>
> For example, I want to do a apply a classification (ex: Bayesian) algorithm,
> and train on a data set stored in Cassandra. I don't expect a sample for
> Cassandra but at least a code sample that operates on a data set stored csv
> file that applies an algorithm like Bayesian.
>
> I'd appreciate if you can point me to any code sample for this or something
> similar?

Re: Embedding mahout in a java app

Posted by Tharindu Mathew <mc...@gmail.com>.
Thanks everyone for the encouraging replies.

If it's possible I will work on and contribute a clean API that will ease
the learning curve of applying Mahout.

On Wed, Nov 2, 2011 at 9:40 PM, Matteo Moci <mo...@gmail.com> wrote:

> I just found this [1] project.
> It seems a bit old, and I don't know if it just works, but could work
> as inspiration maybe.
>
> [1] http://code.google.com/p/hadoop-ui/



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

Posted by Matteo Moci <mo...@gmail.com>.
I just found this [1] project.
It seems a bit old, and I don't know if it just works, but it could serve
as inspiration.

[1] http://code.google.com/p/hadoop-ui/

On Wed, Nov 2, 2011 at 3:53 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Nov 2, 2011, at 7:17 AM, Tharindu Mathew wrote:
>
>> I want to create a java UI tool (based on a web app) that can pick and
>> apply different algorithms available in Mahout to different data sets.
>
> Very cool!  Keep us posted, as this would be immensely useful!  Any chance it will be donated back?  :-)
>
>>
>> Hence the embedding with java. Obviously, I understand that everything is
>> callable from Java since it's written in Java :).
>>
>> For example, I want to do a apply a classification (ex: Bayesian) algorithm,
>> and train on a data set stored in Cassandra. I don't expect a sample for
>> Cassandra but at least a code sample that operates on a data set stored csv
>> file that applies an algorithm like Bayesian.
>>
>> I'd appreciate if you can point me to any code sample for this or something
>> similar?
>
> As others have said, MahoutDriver is a common entry point and can run pretty much anything in Mahout that has a main().  You might also look in $MAHOUT_HOME/examples/bin at the various shell scripts we've put together that run different examples.  build-reuters, classify-20newsgroups and build-asf-email (all from trunk) demonstrate a fair amount of classification and clustering algorithms.  Finally, Unit tests are your friend.
>
> -Grant



-- 
Matteo Moci
http://it.linkedin.com/in/matteomoci
http://about.me/matteomoci/bio

Re: Embedding mahout in a java app

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 2, 2011, at 7:17 AM, Tharindu Mathew wrote:

> I want to create a java UI tool (based on a web app) that can pick and
> apply different algorithms available in Mahout to different data sets.

Very cool!  Keep us posted, as this would be immensely useful!  Any chance it will be donated back?  :-)

> 
> Hence the embedding with java. Obviously, I understand that everything is
> callable from Java since it's written in Java :).
> 
> For example, I want to do a apply a classification (ex: Bayesian) algorithm,
> and train on a data set stored in Cassandra. I don't expect a sample for
> Cassandra but at least a code sample that operates on a data set stored csv
> file that applies an algorithm like Bayesian.
> 
> I'd appreciate if you can point me to any code sample for this or something
> similar?

As others have said, MahoutDriver is a common entry point and can run pretty much anything in Mahout that has a main().  You might also look in $MAHOUT_HOME/examples/bin at the various shell scripts we've put together that run different examples.  build-reuters, classify-20newsgroups and build-asf-email (all from trunk) demonstrate a fair number of classification and clustering algorithms.  Finally, unit tests are your friend.

-Grant

Re: Embedding mahout in a java app

Posted by Tharindu Mathew <mc...@gmail.com>.
I want to create a Java UI tool (based on a web app) that can pick and
apply different algorithms available in Mahout to different data sets.

Hence the embedding with Java. Obviously, I understand that everything is
callable from Java since it's written in Java :).

For example, I want to apply a classification algorithm (e.g. Bayesian)
and train it on a data set stored in Cassandra. I don't expect a sample for
Cassandra, but at least a code sample that operates on a data set stored in
a CSV file and applies an algorithm like Bayesian.

I'd appreciate it if you could point me to a code sample for this or
something similar.

On Wed, Nov 2, 2011 at 3:32 PM, JAGANADH G <ja...@gmail.com> wrote:

> On Wed, Nov 2, 2011 at 2:51 PM, Tharindu Mathew <mc...@gmail.com>
> wrote:
>
> > Hi Sean,
> >
> > I guess with a proper API it just makes it easier. I was hoping you'd
> point
> > me to a code sample or a tutorial.
> >
>
>
>
> Hi
>
> For detailed code samples and tutorials see the book "Mahout in Action".
> You will get a clear insight on how to use Mahout (in java in your case
> !!!!)
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
> *ILUGCBE*
> http://ilugcbe.psgkriya.org
>



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

Posted by JAGANADH G <ja...@gmail.com>.
On Wed, Nov 2, 2011 at 2:51 PM, Tharindu Mathew <mc...@gmail.com> wrote:

> Hi Sean,
>
> I guess with a proper API it just makes it easier. I was hoping you'd point
> me to a code sample or a tutorial.
>



Hi

For detailed code samples and tutorials see the book "Mahout in Action".
It will give you clear insight into how to use Mahout (from Java, in your
case).
-- 
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
*ILUGCBE*
http://ilugcbe.psgkriya.org

Re: Embedding mahout in a java app

Posted by Tharindu Mathew <mc...@gmail.com>.
Hi Sean,

I guess a proper API would just make it easier. I was hoping you'd point
me to a code sample or a tutorial.

Everything I could find refers to quick starts that explain how to run a
sample, such as
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data

I was looking for something like
http://hadoop.apache.org/common/docs/r0.17.1/mapred_tutorial.html from the
Hadoop folks.

Obviously, I can dive into the code base, but I thought I was missing
something, as I have not yet found a way to hack up my own solution
(i.e. load data, run an algo, get results).

On Wed, Nov 2, 2011 at 2:24 PM, Sean Owen <sr...@gmail.com> wrote:

> Mahout is written in Java, so 'yes' you can put it in any Java program
> trivially. Why would it have anything to do with an API? I think you need
> to be clearer about what you are doing, and probably first have a basic
> look at the project.



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

Posted by Sean Owen <sr...@gmail.com>.
Mahout is written in Java, so 'yes' you can put it in any Java program
trivially. Why would it have anything to do with an API? I think you need
to be clearer about what you are doing, and probably first have a basic
look at the project.
On Nov 2, 2011 8:49 AM, "Tharindu Mathew" <mc...@gmail.com> wrote:

> Hi,
>
> Is there an API that is available to easily embed Mahout in a java app,
> feed data and get output?
>
> PS: Forgive me if this is a noob question. Still trying to figure out
> Mahout.
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>