Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2011/03/25 10:17:32 UTC

Does anyone use HBase with Mahout?

An idea came up to potentially remove the direct connection with
HBase, but, we'd keep it if it's of any use to anyone out there.
Mostly polling to see if anyone's using the HBase-related code in
Bayes classifiers?

Re: Does anyone use HBase with Mahout?

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 26, 2011, at 9:43 AM, Robin Anil wrote:

> Yes, making it optional is the right way to go. We should be using a
> connector like GORA. Seems to be in incubator stage, does that mean we have
> to wait till it graduates to use in active projects? (So that project is
> guaranteed to have active support)

No, we don't have to wait.

> 
> On Sat, Mar 26, 2011 at 2:31 AM, Ted Dunning <te...@gmail.com> wrote:
> 
>> Yes.
>> 
>> And that sounds good to me.  Much like deleting math code we don't use.
>> 
>> On Fri, Mar 25, 2011 at 1:57 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> wrote:
>> 
>>> Perhaps we should make it an optional dependency at least. That's Maven
>>> practice if we want to use the API but not force an actual version.
>>> 
>>> If we nuke it completely, we'd have to also decommission all the code
>>> that depends on it as well.
>>> 
>> 


Re: Does anyone use HBase with Mahout?

Posted by Robin Anil <ro...@gmail.com>.
Yes, making it optional is the right way to go. We should be using a
connector like GORA. It seems to be in the incubator stage; does that mean
we have to wait until it graduates before using it in active projects (so
that the project is guaranteed to have active support)?

On Sat, Mar 26, 2011 at 2:31 AM, Ted Dunning <te...@gmail.com> wrote:

> Yes.
>
> And that sounds good to me.  Much like deleting math code we don't use.
>
> On Fri, Mar 25, 2011 at 1:57 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
>
> > Perhaps we should make it an optional dependency at least. That's Maven
> > practice if we want to use the API but not force an actual version.
> >
> > If we nuke it completely, we'd have to also decommission all the code
> > that depends on it as well.
> >
>

Re: Does anyone use HBase with Mahout?

Posted by Ted Dunning <te...@gmail.com>.
Yes.

And that sounds good to me.  Much like deleting math code we don't use.

On Fri, Mar 25, 2011 at 1:57 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Perhaps we should make it an optional dependency at least. That's Maven
> practice if we want to use the API but not force an actual version.
>
> If we nuke it completely, we'd have to also decommission all the code that
> depends on it as well.
>

Re: Does anyone use HBase with Mahout?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Perhaps we should make it an optional dependency at least. That's Maven
practice if we want to use the API but not force an actual version.

If we nuke it completely, we'd have to also decommission all the code that
depends on it as well.

apologies for brevity.

Sent from my android.
-Dmitriy
On Mar 25, 2011 1:31 PM, "Ted Dunning" <te...@gmail.com> wrote:
> Good example of how our inclusion of hbase causes problems even for those
> who use hbase.
>
> On Fri, Mar 25, 2011 at 12:05 PM, Michael Kurze <mk...@mozilla.com>
> wrote:
>
>> Using hbase+Mahout myself, I still think it's not a good idea to tie
>> it in by default. I even had to exclude the library from my repackaged
>> assembly.
>>
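
A minimal sketch of the optional-dependency approach suggested above, as it
might appear in a library's Maven pom.xml. The groupId, artifactId, and
version are assumptions for illustration only and are not taken from the
Mahout build:

```xml
<!-- Hypothetical pom.xml fragment; the coordinates below are illustrative. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.20.0</version>
  <!-- optional=true: the library still compiles against HBase, but projects
       that depend on the library do not inherit HBase transitively. -->
  <optional>true</optional>
</dependency>
```

With this setting, users who want the HBase-backed code declare HBase
themselves, at whatever version they run, and everyone else pulls in no HBase
jar at all.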

Re: Does anyone use HBase with Mahout?

Posted by Ted Dunning <te...@gmail.com>.
Good example of how our inclusion of hbase causes problems even for those
who use hbase.

On Fri, Mar 25, 2011 at 12:05 PM, Michael Kurze <mk...@mozilla.com> wrote:

> Using hbase+Mahout myself, I still think it's not a good idea to tie
> it in by default. I even had to exclude the library from my repackaged
> assembly.
>

Re: Does anyone use HBase with Mahout?

Posted by Michael Kurze <mk...@mozilla.com>.
Using HBase+Mahout myself, I still think it's not a good idea to tie
it in by default. I even had to exclude the library from my repackaged
assembly.

A generic interface seems like a better idea. Gora looks promising for
future releases, as it is designed with sequential access patterns in
mind. Looking at Gora's Hadoop abstractions, a (non-CLI) programmer API
might still just be expressed in Hadoop formats.

Gora could then be used to keep Mahout-internal stuff (like that
Bayesian classifier) portable.

On Mar 25, 2011, at 11:26 AM, Sean Owen wrote:
> ...
> It seems infeasible to connect all m implementations to all n data
> sources. I imagine what we'd wish to do at best is a) leave that
> abstraction to Hadoop, to read from HBase or HDFS or a file system, or
> b) provide tools to copy data into HDFS from many sources.

> On Fri, Mar 25, 2011 at 9:45 AM, Dave Stuart
> <da...@progressivealliance.co.uk> wrote:
>> I was about to kick off a project using Hbase and Mahout so some form of connectivity JDBC or otherwise would be significantly beneficial
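
For readers in the same position as Michael, one way to keep a bundled HBase
library out of a downstream build is a Maven exclusion on the Mahout
dependency. This is only a sketch: the coordinates and version are
assumptions, the exclusion must match whatever coordinates the Mahout POM
actually declares, and his own setup (a repackaged assembly) may have used a
different mechanism.

```xml
<!-- Hypothetical fragment of a downstream project's pom.xml. -->
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.4</version>
  <exclusions>
    <!-- Drop the transitive HBase jar; the project can then declare the
         HBase version it actually runs against as its own dependency. -->
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```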


Re: Does anyone use HBase with Mahout?

Posted by Sean Owen <sr...@gmail.com>.
Is it for Bayesian clustering?

One of the issues here is that there isn't any particular connector to
anything but HDFS at the moment. Some pieces have some additional
connectivity to other data stores, like this one.

It seems infeasible to connect all m implementations to all n data
sources. I imagine what we'd wish to do at best is a) leave that
abstraction to Hadoop, to read from HBase or HDFS or a file system, or
b) provide tools to copy data into HDFS from many sources.

On Fri, Mar 25, 2011 at 9:45 AM, Dave Stuart
<da...@progressivealliance.co.uk> wrote:
> I was about to kick off a project using Hbase and Mahout so some form of connectivity JDBC or otherwise would be significantly beneficial

Re: Does anyone use HBase with Mahout?

Posted by Dave Stuart <da...@progressivealliance.co.uk>.
I was about to kick off a project using HBase and Mahout, so some form of connectivity, JDBC or otherwise, would be significantly beneficial.


On 25 Mar 2011, at 09:26, Lance Norskog wrote:

> What other connectors are available? This is a JDBC connector for HBase.
> Apache-licensed, someone's personal project.
> http://www.hbql.com/
> https://github.com/pambrose/HBql/
> 
> Lance
> 
> On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
>> An idea came up to potentially remove the direct connection with
>> HBase, but, we'd keep it if it's of any use to anyone out there.
>> Mostly polling to see if anyone's using the HBase-related code in
>> Bayes classifiers?
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com


Re: Does anyone use HBase with Mahout?

Posted by Ted Dunning <te...@gmail.com>.
The dependency in question is even tighter.

How many people are using the Naive Bayes classifier with HBase 0.20.0?


(I thought not)


On Fri, Mar 25, 2011 at 4:26 AM, Dave Stuart <
david.stuart@progressivealliance.co.uk> wrote:

> Gora would work as I use that for Nutch 2.0 so it would be consistent
> across the products
>
>
> On 25 Mar 2011, at 11:08, Sean Owen wrote:
>
> > Indeed that sounds something much more like what's needed. That is in
> > itself a nice new JIRA, and perhaps for after 0.5.
> >
> > So the immediate question in my mind is, is it useful to 'clean up' in
> > anticipation of this by removing the direct dependency on HBase? While
> > I hear that people use HBase with Mahout, I'm not yet hearing anyone
> > uses it in the one way that's supported so far.
> >
> > On Fri, Mar 25, 2011 at 11:02 AM, Julien Nioche
> > <li...@gmail.com> wrote:
> >> What about using GORA?
> >> It is under Apache license and it provides a unified API to use various
> >> backends for mapreduce. We already have drivers for Hbase, Cassandra,
> >> SQL and the first release should be imminent.
> >> See http://incubator.apache.org/gora/ for more details
> >>
> >> Julien
> >>
> >> On 25 March 2011 09:26, Lance Norskog <go...@gmail.com> wrote:
> >>
> >>> What other connectors are available? This is a JDBC connector for HBase.
> >>> Apache-licensed, someone's personal project.
> >>> http://www.hbql.com/
> >>> https://github.com/pambrose/HBql/
> >>>
> >>> Lance
> >>>
> >>> On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
> >>>> An idea came up to potentially remove the direct connection with
> >>>> HBase, but, we'd keep it if it's of any use to anyone out there.
> >>>> Mostly polling to see if anyone's using the HBase-related code in
> >>>> Bayes classifiers?
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Lance Norskog
> >>> goksron@gmail.com
> >>>
> >>
> >>
> >>
> >> --
> >> *
> >> *Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >>
>
>

Re: Does anyone use HBase with Mahout?

Posted by Dave Stuart <da...@progressivealliance.co.uk>.
Gora would work, as I use that for Nutch 2.0, so it would be consistent across the products.


On 25 Mar 2011, at 11:08, Sean Owen wrote:

> Indeed that sounds something much more like what's needed. That is in
> itself a nice new JIRA, and perhaps for after 0.5.
> 
> So the immediate question in my mind is, is it useful to 'clean up' in
> anticipation of this by removing the direct dependency on HBase? While
> I hear that people use HBase with Mahout, I'm not yet hearing anyone
> uses it in the one way that's supported so far.
> 
> On Fri, Mar 25, 2011 at 11:02 AM, Julien Nioche
> <li...@gmail.com> wrote:
>> What about using GORA?
>> It is under Apache license and it provides a unified API to use various
>> backends for mapreduce. We already have drivers for Hbase, Cassandra, SQL
>> and the first release should be imminent.
>> See http://incubator.apache.org/gora/ for more details
>> 
>> Julien
>> 
>> On 25 March 2011 09:26, Lance Norskog <go...@gmail.com> wrote:
>> 
>>> What other connectors are available? This is a JDBC connector for HBase.
>>> Apache-licensed, someone's personal project.
>>> http://www.hbql.com/
>>> https://github.com/pambrose/HBql/
>>> 
>>> Lance
>>> 
>>> On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
>>>> An idea came up to potentially remove the direct connection with
>>>> HBase, but, we'd keep it if it's of any use to anyone out there.
>>>> Mostly polling to see if anyone's using the HBase-related code in
>>>> Bayes classifiers?
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>> 
>> 
>> 
>> 
>> --
>> *
>> *Open Source Solutions for Text Engineering
>> 
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> 


Re: Does anyone use HBase with Mahout?

Posted by Sean Owen <sr...@gmail.com>.
Indeed, that sounds much more like what's needed. That is in
itself a nice new JIRA, and perhaps for after 0.5.

So the immediate question in my mind is, is it useful to 'clean up' in
anticipation of this by removing the direct dependency on HBase? While
I hear that people use HBase with Mahout, I'm not yet hearing that anyone
uses it in the one way that's supported so far.

On Fri, Mar 25, 2011 at 11:02 AM, Julien Nioche
<li...@gmail.com> wrote:
> What about using GORA?
> It is under Apache license and it provides a unified API to use various
> backends for mapreduce. We already have drivers for Hbase, Cassandra, SQL
> and the first release should be imminent.
> See http://incubator.apache.org/gora/ for more details
>
> Julien
>
> On 25 March 2011 09:26, Lance Norskog <go...@gmail.com> wrote:
>
>> What other connectors are available? This is a JDBC connector for HBase.
>> Apache-licensed, someone's personal project.
>> http://www.hbql.com/
>> https://github.com/pambrose/HBql/
>>
>> Lance
>>
>> On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
>> > An idea came up to potentially remove the direct connection with
>> > HBase, but, we'd keep it if it's of any use to anyone out there.
>> > Mostly polling to see if anyone's using the HBase-related code in
>> > Bayes classifiers?
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>

Re: Does anyone use HBase with Mahout?

Posted by Julien Nioche <li...@gmail.com>.
What about using GORA?
It is under the Apache license and provides a unified API for using various
backends with MapReduce. We already have drivers for HBase, Cassandra, and
SQL, and the first release should be imminent.
See http://incubator.apache.org/gora/ for more details

Julien

On 25 March 2011 09:26, Lance Norskog <go...@gmail.com> wrote:

> What other connectors are available? This is a JDBC connector for HBase.
> Apache-licensed, someone's personal project.
> http://www.hbql.com/
> https://github.com/pambrose/HBql/
>
> Lance
>
> On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
> > An idea came up to potentially remove the direct connection with
> > HBase, but, we'd keep it if it's of any use to anyone out there.
> > Mostly polling to see if anyone's using the HBase-related code in
> > Bayes classifiers?
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>



-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Does anyone use HBase with Mahout?

Posted by Lance Norskog <go...@gmail.com>.
What other connectors are available? This is a JDBC connector for HBase.
Apache-licensed, someone's personal project.
http://www.hbql.com/
https://github.com/pambrose/HBql/

Lance

On Fri, Mar 25, 2011 at 2:17 AM, Sean Owen <sr...@gmail.com> wrote:
> An idea came up to potentially remove the direct connection with
> HBase, but, we'd keep it if it's of any use to anyone out there.
> Mostly polling to see if anyone's using the HBase-related code in
> Bayes classifiers?
>



-- 
Lance Norskog
goksron@gmail.com