Posted to user@hbase.apache.org by Brian Forney <bf...@integral7.com> on 2009/04/18 00:30:22 UTC

Replicating data into HBase

Hi all,

I'd like to replicate a large dataset from a relational database into  
HBase for better throughput of MapReduce jobs. Has anyone had success  
replicating from a relational database (in my case SQL Server) to HBase?

Thanks,
Brian

Re: Replicating data into HBase

Posted by Tim Sell <tr...@gmail.com>.

That script depends on pgq, which is a Postgres-specific event queue.
It's handy for tracking table changes. If there is something similar
for SQL Server, it might be helpful.
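
As a rough illustration of the queue-based approach, the sketch below turns row-change events into put/delete mutations. It is not the uploader script itself. The event layout (`op`, `table`, `key`, `row`), the function name `events_to_mutations`, and the default column family `d` are all assumptions made for the example, and the output is plain tuples rather than calls into any HBase client.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical event shape: one dict per row change read from a change queue.
# "op" is "I" (insert), "U" (update) or "D" (delete).
Event = Dict[str, Any]
# (kind, table, row_key, cells) where cells maps "family:qualifier" to a value.
Mutation = Tuple[str, str, str, Dict[str, str]]


def events_to_mutations(events: Iterable[Event], family: str = "d") -> List[Mutation]:
    """Translate row-change events into put/delete mutations, in queue order."""
    mutations: List[Mutation] = []
    for ev in events:
        table = str(ev["table"])
        row_key = str(ev["key"])
        if ev["op"] in ("I", "U"):
            # Inserts and updates both become a put of the current column values.
            cells = {f"{family}:{col}": str(val) for col, val in ev["row"].items()}
            mutations.append(("put", table, row_key, cells))
        elif ev["op"] == "D":
            mutations.append(("delete", table, row_key, {}))
        else:
            raise ValueError(f"unknown op: {ev['op']!r}")
    return mutations
```

Keeping the translation separate from the code that talks to the queue and to HBase makes it easy to test with hand-written events before wiring it to a real change feed.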

2009/4/18 stack <st...@duboce.net>:
> You might take a look at Tim Sells' postgres to hbase uploader scripts here
> for ideas:
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
> St.Ack

Re: Replicating data into HBase

Posted by stack <st...@duboce.net>.

You might take a look at Tim Sell's Postgres-to-HBase uploader scripts here
for ideas:
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
St.Ack


Re: Replicating data into HBase

Posted by Billy Pearson <sa...@pearsonwholesale.com>.

If your data is not too complex (multiple fields, etc.), you could try using
MySQL binary logs. Use mysqlbinlog
(http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) to process the bin
logs and generate a text version of them, then process that text with a map
and reduce it into the table.

This would not provide live data, but you could run a simple shell script to
process the bin logs and then delete or move them. If you needed to sync up,
you could call mysql to start a new bin log. The shell script could be run as
a cron job; it would pick up the latest bin log and start the job.
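
To make the map-then-reduce idea concrete, here is a minimal sketch under strong assumptions. It only understands very simple single-row `INSERT` statements, it assumes the first listed column is the primary key, and it does not handle values containing commas or quotes. Real mysqlbinlog text output contains other statements and comments, which this sketch simply skips. The names `map_statement` and `reduce_rows` are made up for the example.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Matches only very simple statements such as:
#   INSERT INTO users (id, name) VALUES (1, 'ann');
INSERT_RE = re.compile(
    r"^INSERT INTO\s+`?(\w+)`?\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)\s*;?\s*$",
    re.IGNORECASE,
)

RowKey = Tuple[str, str]  # (table, primary key value)


def map_statement(line: str) -> Iterator[Tuple[RowKey, Dict[str, str]]]:
    """Map step: emit ((table, row_key), {column: value}) for a simple INSERT."""
    m = INSERT_RE.match(line.strip())
    if not m:
        return  # skip anything this sketch does not understand
    table, cols, vals = m.groups()
    columns = [c.strip().strip("`") for c in cols.split(",")]
    values = [v.strip().strip("'") for v in vals.split(",")]
    if len(columns) != len(values):
        return
    # Assumption: the first listed column is the primary key.
    yield (table, values[0]), dict(zip(columns, values))


def reduce_rows(
    pairs: Iterable[Tuple[RowKey, Dict[str, str]]]
) -> Dict[RowKey, Dict[str, str]]:
    """Reduce step: merge all column values seen for the same row key.

    Later statements overwrite earlier ones for the same column.
    """
    merged: Dict[RowKey, Dict[str, str]] = {}
    for key, row in pairs:
        merged.setdefault(key, {}).update(row)
    return merged
```

In a real job the reduce output would be written to the HBase table; here it is just a dictionary so the logic can be checked in isolation.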

I would use the Linux command
find /binlog/location/*.bin -mmin +5
to find the logs that are ready to process.
That gives you all the bin logs that have not been modified in the last 5 minutes.
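
If the cron job is written in Python rather than shell, the same selection can be approximated as below. This is a sketch, not a drop-in replacement for find: the function name `idle_logs` and its parameters are made up for the example, and the `*.bin` pattern is taken from the command above.

```python
import time
from pathlib import Path
from typing import List, Optional


def idle_logs(
    directory: str,
    pattern: str = "*.bin",
    min_idle_minutes: float = 5.0,
    now: Optional[float] = None,
) -> List[Path]:
    """Return files matching pattern that were last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_minutes * 60
    return sorted(
        p
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Passing `now` explicitly is only there to make the function easy to test; a cron job would normally leave it unset.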

If your insert/update queries are not too complex to process, it would be
simple.

Billy

Re: Replicating data into HBase

Posted by Brian Forney <bf...@integral7.com>.

Ryan,

Thanks. Yep, I've read the Bigtable paper (now and in 2006) and  
understand that HBase and Bigtable are essentially large maps and do  
not use the relational model.
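
For readers who have not seen the model, the "large map" view can be sketched as a nested dictionary keyed by row, then column, then timestamp. This is a toy in-memory illustration only; the class name `TinyTable` and its methods are invented for the example and are not part of any HBase API.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional


class TinyTable:
    """Toy model of a sorted map: row -> column -> timestamp -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[int, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        # Each write adds a new version instead of replacing the old one.
        self._rows[row][column][ts] = value

    def get(self, row: str, column: str) -> Optional[str]:
        # Reads return the version with the highest timestamp, if any.
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, stop: str) -> Iterator[str]:
        # Rows are visited in sorted key order, from start up to (not including) stop.
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row
```

There are no joins or secondary keys in this picture: everything is looked up by row key, or by scanning a range of row keys.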

Still interested in hearing if others have successfully done this.
(I'm mostly looking for ways to speed up the implementation of a one-way
replication: from a relational DB to HBase.)

Thanks,
Brian


Re: Replicating data into HBase

Posted by Ryan Rawson <ry...@gmail.com>.

HBase is not a relational database, so many things that are in a SQL
database don't exist.

e.g.:
- sequences
- secondary declarative keys
- joins
- advanced query features such as order by, group by
- operators of any kind

Given conventions (e.g. naming of index tables), it might be possible to
semi-automatically convert data, but the result might not take efficient
advantage of HBase's unique schema-less design.
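
A convention-based conversion of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name `row_to_puts`, the single column family `d`, the `<table>_idx_<column>` naming of index tables, and the `value|row_key` format of index row keys. Columns whose value is NULL are left out, since rows do not need to carry every column.

```python
from typing import Any, Dict, Iterable, List, Tuple

# (table, row_key, {"family:qualifier": value})
Put = Tuple[str, str, Dict[str, str]]


def row_to_puts(
    table: str,
    row: Dict[str, Any],
    key_column: str,
    indexed_columns: Iterable[str] = (),
    family: str = "d",
) -> List[Put]:
    """Convert one relational row into a data put plus optional index puts."""
    row_key = str(row[key_column])
    # Every non-key, non-NULL column becomes one cell in the data table.
    cells = {
        f"{family}:{col}": str(val)
        for col, val in row.items()
        if col != key_column and val is not None
    }
    puts: List[Put] = [(table, row_key, cells)]
    for col in indexed_columns:
        if row.get(col) is None:
            continue
        # Hypothetical convention: the index table is named <table>_idx_<column>
        # and its row key is "<value>|<primary key>" so equal values stay distinct.
        index_key = f"{row[col]}|{row_key}"
        puts.append((f"{table}_idx_{col}", index_key, {f"{family}:ref": row_key}))
    return puts
```

A mechanical mapping like this keeps the relational shape of the data, which is exactly the caveat above: it works, but it may not be the layout you would choose if you designed the row keys around how the MapReduce jobs read the data.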

I suggest you have a look at Google's Bigtable paper, as it describes the
same underlying model that HBase uses.

Good luck!
