Posted to user@mahout.apache.org by ኣርዓዶም <aa...@gmail.com> on 2012/03/20 10:23:33 UTC

multiple Database-based data with Mahout

Hi,

My colleague and I are using Apache Mahout in our research. Thanks to the
Apache Mahout library, we have been able to easily try out different
collaborative filtering algorithms. The library has also saved us from
reinventing the wheel, since it already implements most of the things we
will be using in our research. So, thank you for providing us with such a
decent collaborative filtering library.

In our research, however, the input data (user-item-value) must come from
several databases. We are not sure whether Mahout provides an interface for
reading input data from multiple database servers at the same time (i.e.
concurrent connections to several DBMSs). As far as we understand, it does
not, so we are planning to develop that interface. We are not seasoned
programmers, though, so we may not come up with a clean interface. Of course,
we'll be happy to give the code to the Apache Mahout community once it is
done.

We would also like to know whether the Apache Mahout community has already
started working on similar functionality. If so, we will wait and use your
API (if it will be available soon), which in our opinion would be a cleaner
implementation than ours.

Lastly, I have read the Apache License and I think we are allowed to change
the Mahout library by adding some things and removing some of its parts. We
are from academia, and everything we do with the library is for non-profit
purposes. I would be very glad to know whether, by doing so, we would be
violating any terms of the Apache License.


P.S. I'm not sure if this is the right place to ask this question, so sorry
if I am putting it in the wrong place.


With regards

Re: multiple Database-based data with Mahout

Posted by ኣርዓዶም <aa...@gmail.com>.
Thanks for your response, Sean Owen. It was very helpful, and we'll
continue to share our progress with the community.


On Tue, Mar 20, 2012 at 6:51 PM, Sean Owen <sr...@gmail.com> wrote:

> No, there is no such support right now.
>
> The most useful piece of code would be a DataModel implementation that
> combines the data in several other DataModels. That would easily let you
> read from several databases.
>
> The hard part there is merging data sets (what if two DBs have data for one
> user-item pair?) and dealing with writes (which database gets the new
> datum?) You can probably ignore writes for now and make a read-only
> version.
>
> It is fine to modify Apache-licensed code any way you like. It just entails
> a few obligations when you distribute the code, like also distributing a
> copy of the Apache license and NOTICE file to show that the underlying code
> was licensed this way.
>
> The license is not hard to understand; it's not complex legalese:
> http://www.apache.org/licenses/LICENSE-2.0.html
>
>
>

Re: multiple Database-based data with Mahout

Posted by Sean Owen <sr...@gmail.com>.
No, there is no such support right now.

The most useful piece of code would be a DataModel implementation that
combines the data in several other DataModels. That would easily let you
read from several databases.

The hard part there is merging data sets (what if two DBs have data for one
user-item pair?) and dealing with writes (which database gets the new
datum?) You can probably ignore writes for now and make a read-only version.
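
To make the idea concrete, here is a minimal, self-contained sketch of a
read-only combining model. It deliberately does not use Mahout's real
DataModel interface: ReadOnlyModel, InMemoryModel and CombinedModel are
hypothetical names invented for this illustration, and InMemoryModel only
stands in for a model backed by one database. The conflict rule (the first
source that has a value for a user-item pair wins) is an assumption, not
something Mahout prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the read side of a data model; NOT Mahout's DataModel. */
interface ReadOnlyModel {
    /** All user IDs known to this model. */
    Set<Long> userIDs();

    /** Item ID to preference value for one user; empty if the user is unknown. */
    Map<Long, Float> preferencesOf(long userID);
}

/** In-memory model, standing in here for a model backed by a single database. */
final class InMemoryModel implements ReadOnlyModel {
    private final Map<Long, Map<Long, Float>> data = new HashMap<>();

    InMemoryModel add(long userID, long itemID, float value) {
        data.computeIfAbsent(userID, k -> new HashMap<>()).put(itemID, value);
        return this;
    }

    @Override
    public Set<Long> userIDs() {
        return new TreeSet<>(data.keySet());
    }

    @Override
    public Map<Long, Float> preferencesOf(long userID) {
        return data.getOrDefault(userID, Collections.emptyMap());
    }
}

/** Read-only view that merges several models; there are no write methods at all. */
final class CombinedModel implements ReadOnlyModel {
    private final List<ReadOnlyModel> delegates;

    CombinedModel(List<ReadOnlyModel> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public Set<Long> userIDs() {
        Set<Long> all = new TreeSet<>();
        for (ReadOnlyModel model : delegates) {
            all.addAll(model.userIDs());
        }
        return all;
    }

    @Override
    public Map<Long, Float> preferencesOf(long userID) {
        Map<Long, Float> merged = new HashMap<>();
        for (ReadOnlyModel model : delegates) {
            // Assumed conflict rule: the earliest delegate holding a value
            // for a user-item pair wins; later values are ignored.
            model.preferencesOf(userID).forEach(merged::putIfAbsent);
        }
        return merged;
    }
}
```

The order of the delegates therefore decides which database is trusted when
two of them disagree. Averaging the conflicting values, or preferring the
most recent one, would be equally valid policies; which is right depends on
the data. A real version would implement Mahout's DataModel and wrap one
database-backed model per server.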

It is fine to modify Apache-licensed code any way you like. It just entails
a few obligations when you distribute the code, like also distributing a
copy of the Apache license and NOTICE file to show that the underlying code
was licensed this way.

The license is not hard to understand; it's not complex legalese:
http://www.apache.org/licenses/LICENSE-2.0.html


