Posted to user@hive.apache.org by qiaoresearcher <qi...@gmail.com> on 2013/01/17 22:23:46 UTC
question about machine learning on Hive
How can I run machine learning algorithms (any ML algorithm) directly in
Hive? Assume the input and output are already stored as Hive tables.
PS: I know Mahout is available, but I would prefer to run machine learning
algorithms directly in Hive.
Many thanks,
Re: question about machine learning on Hive
Posted by Robin Morris <rd...@baynote.com>.
In a similar way, ML algorithms can be put into a Hive UDAF. I'm working on this at the moment, and it's proved quite straightforward to integrate liblinear into a UDAF. As Igor notes, by setting the number of reducers, you can set the number of parallel learners.
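A minimal sketch of the UDAF idea, written in plain Python rather than against the Hive API. The method names mirror the lifecycle of Hive's legacy UDAF evaluator (iterate, terminatePartial, merge, terminate); everything else is an assumption made for illustration: the class name LogisticSgdUdaf, the helper train_in_parallel, the SGD logistic learner standing in for liblinear, and the count-weighted parameter averaging used in merge. It is not how liblinear or any particular UDAF actually combines models.

```python
import math


class LogisticSgdUdaf:
    """Plain-Python stand-in for a UDAF-style evaluator (hypothetical, no Hive API)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = [0.0] * n_features
        self.count = 0  # rows seen so far, used to weight the merge

    def iterate(self, features, label):
        """Consume one row: a single SGD step on the logistic loss (label is 0 or 1)."""
        z = sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        p = 1.0 / (1.0 + math.exp(-z))
        for i, x in enumerate(features):
            self.weights[i] += self.learning_rate * (label - p) * x
        self.count += 1

    def terminate_partial(self):
        """Return the partial state that would be shipped to another task."""
        return (list(self.weights), self.count)

    def merge(self, partial):
        """Fold in another learner's state by count-weighted averaging (an assumption)."""
        other_weights, other_count = partial
        total = self.count + other_count
        if total == 0:
            return
        self.weights = [
            (w * self.count + ow * other_count) / total
            for w, ow in zip(self.weights, other_weights)
        ]
        self.count = total

    def terminate(self):
        """Return the final aggregate: the learned weight vector."""
        return list(self.weights)


def train_in_parallel(rows, n_learners, n_features):
    """Simulate n_learners independent tasks, then merge their partial models."""
    learners = [LogisticSgdUdaf(n_features) for _ in range(n_learners)]
    for idx, (features, label) in enumerate(rows):
        learners[idx % n_learners].iterate(features, label)
    final = LogisticSgdUdaf(n_features)
    for learner in learners:
        final.merge(learner.terminate_partial())
    return final.terminate()
```

Here n_learners plays the role of the reducer count: each simulated task sees only its own share of the rows, and the merge step is the only place the models meet.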
Robin
www.baynote.com
From: Igor Tatarinov <ig...@decide.com>
Reply-To: user@hive.apache.org
Date: Thursday, January 17, 2013 1:29 PM
To: user@hive.apache.org
Subject: Re: question about machine learning on Hive
Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
We use a similar approach and I think that Pig, being somewhat lower-level with better support of nested objects, is a better tool than Hive. It should be possible to do something similar with Hive but we haven't tried. The trick is to implement the learner as a serializer. Then, the number of reducers will determine how many parallel learners (bags) you can run.
igor
decide.com
Re: question about machine learning on Hive
Posted by Igor Tatarinov <ig...@decide.com>.
Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
We use a similar approach and I think that Pig, being somewhat lower-level
with better support of nested objects, is a better tool than Hive. It
should be possible to do something similar with Hive but we haven't tried.
The trick is to implement the learner as a serializer. Then, the number of
reducers will determine how many parallel learners (bags) you can run.
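A rough sketch of the "one learner per reducer" idea, in plain Python with no Pig or Hive involved. All names here (train_centroids, predict_centroids, train_bags, predict_vote) are hypothetical, the nearest-centroid learner is only a toy stand-in for a real one, and random assignment of rows to partitions is an assumption about how rows would be spread across reducers. In Hive the reducer count would typically come from a setting such as mapred.reduce.tasks; here it is just the n_reducers argument.

```python
import random
from collections import Counter


def train_centroids(rows):
    """Toy learner: compute the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, features):
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((x - c) ** 2 for x, c in zip(features, center))
    return min(model, key=lambda label: sq_dist(model[label]))


def train_bags(rows, n_reducers, seed=0):
    """Spread rows over n_reducers partitions and train one model per partition."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(n_reducers)]
    for row in rows:
        partitions[rng.randrange(n_reducers)].append(row)
    # Skip empty partitions: a reducer with no rows produces no model.
    return [train_centroids(part) for part in partitions if part]


def predict_vote(models, features):
    """Combine the per-partition models by majority vote."""
    votes = Counter(predict_centroids(m, features) for m in models)
    return votes.most_common(1)[0][0]
```

Raising n_reducers gives more, smaller bags; each model then sees less data, so the ensemble vote is what keeps the prediction stable.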
igor
decide.com