Posted to user@hbase.apache.org by Brian Forney <bf...@integral7.com> on 2009/04/18 00:30:22 UTC
Replicating data into HBase
Hi all,
I'd like to replicate a large dataset from a relational database into
HBase for better throughput of MapReduce jobs. Has anyone had success
replicating from a relational database (in my case SQL Server) to HBase?
Thanks,
Brian
Re: Replicating data into HBase
Posted by Tim Sell <tr...@gmail.com>.
That script depends on pgq, which is a Postgres-specific event queue.
It's handy for tracking table changes. If there is something similar
for SQL Server, it might be helpful.
2009/4/18 stack <st...@duboce.net>:
> You might take a look at Tim Sell's Postgres-to-HBase uploader scripts here
> for ideas:
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
> St.Ack
Re: Replicating data into HBase
Posted by stack <st...@duboce.net>.
You might take a look at Tim Sell's Postgres-to-HBase uploader scripts here
for ideas:
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
St.Ack
Re: Replicating data into HBase
Posted by Billy Pearson <sa...@pearsonwholesale.com>.
If your data is not too complex (multi-field values, etc.), you could try
using MySQL binary logs. Use mysqlbinlog
(http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) to process the
bin logs and generate a text version of them, then process that text with
a map and reduce it into the table.
This would not provide live data, but you could run a simple shell script
to process the bin logs and then delete or move them. If you needed to
sync up, you could call mysql to start a new bin log. The shell script
could be run as a cron job; it would pick up the latest bin log and start
the job.
I would use the Linux command
find /binlog/location/*.bin -mmin +5
to find the logs that are ready to process.
That will give you all the bin logs that have not been modified in the
last 5 minutes.
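The selection step above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `ready_bin_logs` and `archive` are hypothetical, the `*.bin` pattern and the 5-minute threshold are taken from the find command above, and the step that actually runs mysqlbinlog on each file is left out.

```python
import shutil
import time
from pathlib import Path


def ready_bin_logs(log_dir, min_age_minutes=5, now=None):
    """Return the *.bin files in log_dir not modified in the last N minutes."""
    now = time.time() if now is None else now
    cutoff = now - min_age_minutes * 60
    return sorted(
        p for p in Path(log_dir).glob("*.bin")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def archive(log_path, done_dir):
    """Move a processed log aside so the next cron run does not pick it up."""
    done = Path(done_dir)
    done.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(log_path), str(done / Path(log_path).name)))
```

A cron-driven script would loop over `ready_bin_logs(...)`, hand each file to the conversion job, and call `archive(...)` only after that job succeeds, so a failed run is retried on the next pass.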
If your insert/update queries are not too complex to process, it would be
simple.
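To give a feel for what "not too complex" could mean, here is a minimal sketch of a map step for plain single-row INSERT statements. It assumes the text generated from the bin logs contains statements of the form INSERT INTO t (cols) VALUES (vals), that the first listed column is the primary key, and that values contain no embedded commas or parentheses. The name `map_insert` and these conventions are illustrative assumptions, not part of any tool mentioned here.

```python
import re

# Matches only the simplest form: INSERT INTO t (a, b) VALUES (1, 'x')
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+`?(?P<table>\w+)`?\s*"
    r"\((?P<cols>[^)]*)\)\s*VALUES\s*\((?P<vals>[^)]*)\)",
    re.IGNORECASE,
)


def map_insert(statement):
    """Map one simple INSERT to (table, row_key, {column: value}), else None."""
    m = INSERT_RE.search(statement)
    if m is None:
        return None  # not a statement this sketch understands
    cols = [c.strip().strip("`") for c in m.group("cols").split(",")]
    vals = [v.strip().strip("'") for v in m.group("vals").split(",")]
    if len(cols) != len(vals):
        return None  # e.g. a quoted value containing a comma; skip it
    row = dict(zip(cols, vals))
    row_key = row.pop(cols[0])  # assumption: first column is the primary key
    return m.group("table"), row_key, row
```

Anything the pattern does not recognize returns None, so more complex statements are skipped rather than misparsed; a real job would log those for manual handling.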
Billy
Re: Replicating data into HBase
Posted by Brian Forney <bf...@integral7.com>.
Ryan,
Thanks. Yep, I've read the Bigtable paper (now and in 2006) and
understand that HBase and Bigtable are essentially large maps and do
not use the relational model.
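As a rough illustration of the "large map" view, the Bigtable paper describes the data model as a sorted map indexed by row key, column key, and timestamp. The toy class below mimics that shape in memory; `TinySortedMap` and its methods are hypothetical names for this sketch and are not an HBase API.

```python
from collections import defaultdict


class TinySortedMap:
    """Toy in-memory model: (row, column, timestamp) -> value."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the newest value for a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, {column: newest value}) in row-key order, stop exclusive."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {c: v[max(v)] for c, v in self._rows[row].items()}
```

There are no joins or secondary keys in this model: access goes through a row key lookup or a range scan over sorted row keys.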
Still interested in hearing if others have successfully done this.
(I'm mostly looking for ways to speed up the implementation of a
one-way replication: from a relational DB to HBase.)
Thanks,
Brian
Re: Replicating data into HBase
Posted by Ryan Rawson <ry...@gmail.com>.
HBase is not a relational database, so many things that are in a SQL
database don't exist.
e.g.:
- sequences
- secondary declarative keys
- joins
- advanced query features such as ORDER BY and GROUP BY
- operators of any kind
Given conventions (e.g., naming of index tables), it might be possible to
semi-automatically convert data, but the result might not take efficient
advantage of HBase's unique schema-less design.
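A convention-based conversion of the kind described above might look roughly like this sketch. Everything in it is an assumption made for illustration: the single column family "d", the index table naming scheme `<table>_by_<column>`, the "|" separator in index row keys, and the function names. It only builds plain Python structures and does not talk to HBase.

```python
def to_cells(table, row, key_column, family="d"):
    """Turn one relational row (a dict) into (table, row_key, {family:col: value}).

    NULL columns are simply left out, since absent cells need not be stored.
    """
    key = str(row[key_column])
    cells = {
        f"{family}:{col}": str(val)
        for col, val in row.items()
        if col != key_column and val is not None
    }
    return table, key, cells


def index_entries(table, row, key_column, indexed_columns):
    """Yield puts for hypothetical index tables named <table>_by_<column>."""
    for col in indexed_columns:
        if row.get(col) is None:
            continue  # nothing to index for a NULL value
        index_key = f"{row[col]}|{row[key_column]}"
        yield f"{table}_by_{col}", index_key, {"d:ref": str(row[key_column])}
```

A mechanical mapping like this keeps one HBase row per relational row, which is easy to automate but, as noted above, may not be the layout a schema designed for HBase from scratch would choose.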
I suggest you have a look at Google's Bigtable paper, as it has the same
underlying model that HBase does.
Good luck!