Posted to common-user@hadoop.apache.org by Fredrik Hedberg <fr...@avafan.com> on 2008/01/06 16:08:33 UTC

Hadoop MapReduce + MySQL

Hi,

In order to simplify some data crunching for a client, I threw
together some code that allows you to run MapReduce jobs over data in
a MySQL table.

The code is heavily inspired by the MapReduce layer for HBase and
works much like it. In its current form it's mainly meant for
development, but it could potentially be of use to people who must
keep their data in a relational database and cannot migrate to HBase
for some reason (without all the benefits of HBase, of course).

Needless to say, the code is a hack and has a lot of issues. It's available here [1].

If people find it useful, I can clean it up somewhat and put it in JIRA.

 - Fredrik


[1] http://www.avafan.com/~fredrik/hadoop/

Re: Hadoop MapReduce + MySQL

Posted by Fredrik Hedberg <fr...@avafan.com>.
Thanks for the input. For those who are not on hadoop-dev: the code is
now attached to HADOOP-2536 [1], along with a simple example and some
basic documentation.

The code is self-contained and should be runnable by just dropping it
into your existing jar (except for the MySQL connector, that is).

Fredrik

[1] https://issues.apache.org/jira/browse/HADOOP-2536

On 1/7/08, Arun C Murthy <ar...@yahoo-inc.com> wrote:
> Sure. Your best bet is to open a JIRA issue and let your users have a go at it; I'd think you'd get more interesting requirements that way too. Feel free to publicise the proposal on hadoop-user if you want more eyeballs than hadoop-dev provides. Oh, and some documentation would help! *smile*
> http://wiki.apache.org/lucene-hadoop/HowToContribute
>
> Doug - should we put these in mapred.lib? Come to think of it, I'd say we could move mapred.lib to contrib and let users go wild with their own mappers/reducers/{input|output}formats etc., and encourage them to contribute back. This could help build a nice ecosystem around map-reduce, while offering weaker guarantees about its feasibility/usability etc. Thoughts? If that makes sense, I'll open a jira for this.
>
> Arun

Re: Hadoop MapReduce + MySQL

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
On Sun, Jan 06, 2008 at 04:08:33PM +0100, Fredrik Hedberg wrote:
>If people find it useful, I can clean it up somewhat and put it in JIRA.

Sure. Your best bet is to open a JIRA issue and let your users have a go at it; I'd think you'd get more interesting requirements that way too. Feel free to publicise the proposal on hadoop-user if you want more eyeballs than hadoop-dev provides. Oh, and some documentation would help! *smile*
http://wiki.apache.org/lucene-hadoop/HowToContribute

Doug - should we put these in mapred.lib? Come to think of it, I'd say we could move mapred.lib to contrib and let users go wild with their own mappers/reducers/{input|output}formats etc., and encourage them to contribute back. This could help build a nice ecosystem around map-reduce, while offering weaker guarantees about its feasibility/usability etc. Thoughts? If that makes sense, I'll open a jira for this.

Arun
