Posted to user@hive.apache.org by Muthu Pandi <mu...@gmail.com> on 2014/09/03 06:45:48 UTC

Mysql - Hive Sync

Dear All

I am developing a prototype for syncing tables from MySQL to Hive using
Python and JDBC. Is JDBC a good choice for this purpose?

My use case is generating sales reports in Hive from data pulled
from MySQL by the prototype tool. My data will be around 2 GB/day.



*Regards Muthupandi.K*


Re: Mysql - Hive Sync

Posted by Stephen Sprague <sp...@gmail.com>.
Interesting. Thanks, Muthu.

A colleague of mine pointed out this one too: LinkedIn's Databus (
https://github.com/linkedin/databus/wiki). It looks extremely heavyweight,
and again I'm not sure it's worth the headache.

I like the idea of a trigger on the MySQL table that broadcasts the
data to another app via a UDP message.

cf. https://code.google.com/p/mysql-message-api/

The thing is, you'll need to batch the records over, say, 5 minutes (or
whatever) and then write the batch as one file to HDFS.

This seems infinitely simpler and more maintainable to me. :)
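
A minimal Python sketch of the batching step, assuming records arrive one
at a time as text lines. The names WindowBatcher and put_batch_to_hdfs and
the /user/hive/warehouse/sales path are illustrative assumptions, not part
of any tool mentioned in this thread. The sink shells out to the standard
"hdfs dfs -put" command.

```python
import os
import subprocess
import tempfile
import time
from typing import Callable, List


class WindowBatcher:
    """Collect records and hand them to a sink as one batch per time window."""

    def __init__(self, sink: Callable[[List[str]], None],
                 window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._window = window_seconds
        self._clock = clock
        self._records: List[str] = []
        self._window_start = clock()

    def add(self, record: str) -> None:
        """Buffer one record, flushing first if the current window has expired."""
        if self._clock() - self._window_start >= self._window:
            self.flush()
        self._records.append(record)

    def flush(self) -> None:
        """Pass buffered records (if any) to the sink and start a new window."""
        if self._records:
            self._sink(self._records)
        self._records = []
        self._window_start = self._clock()


def put_batch_to_hdfs(records: List[str],
                      hdfs_dir: str = "/user/hive/warehouse/sales") -> None:
    """Hypothetical sink: write the batch to one local file, then copy it to HDFS."""
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as f:
        f.write("\n".join(records) + "\n")
        local_path = f.name
    try:
        subprocess.run(["hdfs", "dfs", "-put", local_path, hdfs_dir], check=True)
    finally:
        os.remove(local_path)
```

Usage would look like WindowBatcher(put_batch_to_hdfs).add(line) for each
incoming record. Note that this sketch only checks the window when a record
arrives, so a real listener would also need a timer (or a final flush() on
shutdown) to write out a batch during quiet periods.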




On Fri, Sep 5, 2014 at 11:53 PM, Muthu Pandi <mu...@gmail.com> wrote:

> Yeah, installing the MySQL Hadoop Applier took a lot of time for building
> and installing GCC 4.6. It's working, but it's not serving the exact
> purpose, so now I am trying my own Python scripting.
>
> The idea is to read INSERT queries from the binlog, save the data under
> the Hive warehouse as a table, and query it from there.
>

Re: Mysql - Hive Sync

Posted by Muthu Pandi <mu...@gmail.com>.
Yeah, installing the MySQL Hadoop Applier took a lot of time for building
and installing GCC 4.6. It's working, but it's not serving the exact
purpose, so now I am trying my own Python scripting.

The idea is to read INSERT queries from the binlog, save the data under
the Hive warehouse as a table, and query it from there.
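
A rough Python sketch of that idea, assuming the INSERT statements have
already been extracted from the binlog as plain SQL text and that the Hive
table is declared with tab-delimited text rows. The function names are
made up for illustration. The parser handles only simple
INSERT ... VALUES (...), (...) statements; it trims whitespace around
values and does not treat NULL or escapes such as \n specially.

```python
import re
from typing import List, Optional, Tuple

# Captures the table name and everything after the VALUES keyword.
_INSERT_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+`?(\w+)`?.*?\bVALUES\s*(.*?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_values(values_sql: str) -> List[List[str]]:
    """Split "(a, 'b'), (c, 'd')" into rows of field strings."""
    rows: List[List[str]] = []
    fields: List[str] = []
    buf: List[str] = []
    in_quote = False
    depth = 0
    i = 0
    while i < len(values_sql):
        ch = values_sql[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(values_sql):
                buf.append(values_sql[i + 1])  # keep the escaped char literally
                i += 1
            elif ch == "'":
                if values_sql[i + 1:i + 2] == "'":  # doubled quote inside a string
                    buf.append("'")
                    i += 1
                else:
                    in_quote = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_quote = True
        elif ch == "(":
            depth += 1
            if depth > 1:
                buf.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:  # end of one row tuple
                fields.append("".join(buf).strip())
                rows.append(fields)
                fields, buf = [], []
            else:
                buf.append(ch)
        elif ch == "," and depth == 1:  # field separator inside a row tuple
            fields.append("".join(buf).strip())
            buf = []
        elif depth >= 1:
            buf.append(ch)
        i += 1
    return rows


def insert_to_rows(statement: str) -> Optional[Tuple[str, List[str]]]:
    """Return (table, tab-delimited lines) for an INSERT, or None otherwise."""
    match = _INSERT_RE.match(statement)
    if match is None:
        return None
    lines = ["\t".join(row) for row in parse_values(match.group(2))]
    return match.group(1), lines
```

The returned lines could then be appended to a file under the table's
warehouse directory. Values containing tabs or newlines would corrupt the
rows, so a real script would need to escape or reject them.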



*Regards Muthupandi.K*




On Sat, Sep 6, 2014 at 4:47 AM, Stephen Sprague <sp...@gmail.com> wrote:

> Great find, Muthu. I would be interested in hearing about any successes
> or failures using this adapter. It almost sounds too good to be true.
>
> After reading the blog (
> http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html)
> about it, I see it comes with caveats, and it looks a little rough around the
> edges to install. Not sure I'd bet the farm on this product, but YMMV.
>
> Anyway, curious to know how it works out for you.

Re: Mysql - Hive Sync

Posted by Stephen Sprague <sp...@gmail.com>.
Great find, Muthu. I would be interested in hearing about any successes
or failures using this adapter. It almost sounds too good to be true.

After reading the blog (
http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html)
about it, I see it comes with caveats, and it looks a little rough around the
edges to install. Not sure I'd bet the farm on this product, but YMMV.

Anyway, curious to know how it works out for you.



On Tue, Sep 2, 2014 at 11:03 PM, Muthu Pandi <mu...@gmail.com> wrote:

> This can't be done, since INSERT, UPDATE, and DELETE are not supported in
> Hive.
>
> The MySQL Applier for Hadoop package serves the same purpose as the
> prototype tool I intended to develop.
>
> Link for "MySQL Applier for Hadoop":
> http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html

Re: Mysql - Hive Sync

Posted by Muthu Pandi <mu...@gmail.com>.
This can't be done, since INSERT, UPDATE, and DELETE are not supported in
Hive.

The MySQL Applier for Hadoop package serves the same purpose as the
prototype tool I intended to develop.

Link for "MySQL Applier for Hadoop":
http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html



*Regards Muthupandi.K*




On Wed, Sep 3, 2014 at 10:35 AM, Muthu Pandi <mu...@gmail.com> wrote:

> Yeah, but we can't make it work in near real time. Also, my table doesn't
> have a column like 'ID' to use for --check-column, which is why I opted
> out of Sqoop.

Re: Mysql - Hive Sync

Posted by Muthu Pandi <mu...@gmail.com>.
Yeah, but we can't make it work in near real time. Also, my table doesn't
have a column like 'ID' to use for --check-column, which is why I opted
out of Sqoop.
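
For context, Sqoop's incremental import picks up new rows by comparing one
column (--check-column) against the last imported value (--last-value), so
it needs a column that only ever grows. The Python sketch below just builds
such a command line to show where that column fits. The function name, JDBC
URL, table, directory, and the updated_at column are hypothetical; the
flags themselves are standard Sqoop 1 import options.

```python
from typing import List


def sqoop_incremental_args(jdbc_url: str, table: str, target_dir: str,
                           check_column: str, last_value: str,
                           mode: str = "append") -> List[str]:
    """Build the argument list for a Sqoop incremental import.

    mode "append" expects a growing numeric key such as an auto-increment
    ID; mode "lastmodified" expects a timestamp column instead.
    """
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]
```

The list can be passed to subprocess.run(...) on a machine with Sqoop
installed. Either mode is still a periodic batch pull, so it would not
address the near-real-time requirement.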



*Regards Muthupandi.K*




On Wed, Sep 3, 2014 at 10:28 AM, Nitin Pawar <ni...@gmail.com>
wrote:

> Have you looked at Sqoop?

Re: Mysql - Hive Sync

Posted by Nitin Pawar <ni...@gmail.com>.
Have you looked at Sqoop?


On Wed, Sep 3, 2014 at 10:15 AM, Muthu Pandi <mu...@gmail.com> wrote:

> Dear All
>
> I am developing a prototype for syncing tables from MySQL to Hive using
> Python and JDBC. Is JDBC a good choice for this purpose?
>
> My use case is generating sales reports in Hive from data pulled
> from MySQL by the prototype tool. My data will be around 2 GB/day.


-- 
Nitin Pawar