Posted to user@hbase.apache.org by Vidhyashankar Venkataraman <vi...@yahoo-inc.com> on 2011/01/13 01:40:26 UTC

Ruby Bulk Load tool in 0.90

Is load_table.rb deprecated in 0.90?

I was trying to use load_table.rb to create a new table and bulk load files into it. It worked only partly: the META table got populated and the files were moved to the appropriate location, but the regions were not assigned to servers until I restarted HBase. Is this a consequence of the master rewrite?

V

Re: Ruby Bulk Load tool in 0.90

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
>> I was figuring you can use the existing HTable create API that takes a list
>> of boundaries.
Oh great! Then this is pretty straightforward. I will send a patch soon on this for a review. I need this for my system asap anyways.

V

On 1/13/11 11:05 AM, "Todd Lipcon" <to...@cloudera.com> wrote:

On Thu, Jan 13, 2011 at 11:00 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> >> Nicolas actually did the multi-column-family patch for trunk a few weeks
> >> ago, so no need to upload that patch.
> That's great!
>
> >> If you want to have a go that would be great!
> Yes, I can take a shot at it. By "create these file boundaries", did you
> mean creating ZK state and a meta table entry for these boundaries? Because,
> after that both inc load and bulk load become the same.
>

I was figuring you can use the existing HTable create API that takes a list
of boundaries. Then you don't need to deal with ZK or META manually in any
way, and if any of that stuff changes you'll be using a supported public
API.

-Todd


> On 1/13/11 10:30 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
>
> Hey Vidhya,
>
> Nicolas actually did the multi-column-family patch for trunk a few weeks
> ago, so no need to upload that patch. Hopefully his will work for you.
>
> I'd love to see the ruby script gotten rid of, and improve the
> LoadIncrementalHFiles/completebulkload tool to also support loading into
> new
> tables. It should be pretty simple - if the table doesn't exist, scan the
> hfiles to find their boundaries, create a table with those boundaries, and
> then treat it like the incremental case. If you want to have a go that
> would
> be great!
>
> -Todd
>
> On Thu, Jan 13, 2011 at 8:53 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > So I was using 0.89 till last week and the tables created/bulk loaded
> just
> > fine with the script (one can only use the ruby script while creating the
> > table) and yesterday was when I ran our system creating a table from
> > scratch.
> >
> > I have to fix this thing anyways very soon so do let me know if you want me
> > to take a stab at it. And thanks for creating the jira.
> >
> > Oh, and I have my custom patch that I use for our current system for
> > supporting multiple column families in bulk loads which is a little too
> > non-generic to be submitted as a patch for the open source. Let me clean
> > that up as well and upload it soon.
> >
> > Cheers,
> > V
> >
> > On 1/12/11 10:03 PM, "Stack" <st...@duboce.net> wrote:
> >
> > I think you are right Vidhya.  The new master has a different
> > mechanism for assigning regions, so load_table.rb won't work with the new
> > master (Let me clean out the load_table.rb in 0.90 -- I filed
> > HBASE-3440 to fix).  It looks like the completebulkload from
> > http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
> > work.  It loads into pre-existing regions (I wonder why you've not
> > been using this script anyways?)
> >
> > Good on you Vidhya,
> > St.Ack
> >
> > On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
> > <vi...@yahoo-inc.com> wrote:
> > > I guess the master doesn't scan META periodically hence skips doing
> > anything with the updated META table.
> > >
> > > The ruby bulk load tool then needs some repair (the tool should write
> ZK
> > state for the new regions?).
> > >
> > >
> > > On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <
> vidhyash@yahoo-inc.com>
> > wrote:
> > >
> > > Is load_table.rb deprecated in 0.90?
> > >
> > > I was trying to use load_table.rb to create a new table and bulk load
> > files into it. It worked partly in the sense that the META table got
> > populated, the files were moved to the appropriate location, but the
> server
> > assignment did not happen until I restarted HBase. Is this a consequence
> of
> > the master rewrite?
> > >
> > > V
> > >
> > >
> >
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>


--
Todd Lipcon
Software Engineer, Cloudera


Re: Ruby Bulk Load tool in 0.90

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Jan 13, 2011 at 11:00 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> >> Nicolas actually did the multi-column-family patch for trunk a few weeks
> >> ago, so no need to upload that patch.
> That's great!
>
> >> If you want to have a go that would be great!
> Yes, I can take a shot at it. By "create these file boundaries", did you
> mean creating ZK state and a meta table entry for these boundaries? Because,
> after that both inc load and bulk load become the same.
>

I was figuring you can use the existing HTable create API that takes a list
of boundaries. Then you don't need to deal with ZK or META manually in any
way, and if any of that stuff changes you'll be using a supported public
API.

-Todd


> On 1/13/11 10:30 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
>
> Hey Vidhya,
>
> Nicolas actually did the multi-column-family patch for trunk a few weeks
> ago, so no need to upload that patch. Hopefully his will work for you.
>
> I'd love to see the ruby script gotten rid of, and improve the
> LoadIncrementalHFiles/completebulkload tool to also support loading into
> new
> tables. It should be pretty simple - if the table doesn't exist, scan the
> hfiles to find their boundaries, create a table with those boundaries, and
> then treat it like the incremental case. If you want to have a go that
> would
> be great!
>
> -Todd
>
> On Thu, Jan 13, 2011 at 8:53 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > So I was using 0.89 till last week and the tables created/bulk loaded
> just
> > fine with the script (one can only use the ruby script while creating the
> > table) and yesterday was when I ran our system creating a table from
> > scratch.
> >
> > I have to fix this thing anyways very soon so do let me know if you want me
> > to take a stab at it. And thanks for creating the jira.
> >
> > Oh, and I have my custom patch that I use for our current system for
> > supporting multiple column families in bulk loads which is a little too
> > non-generic to be submitted as a patch for the open source. Let me clean
> > that up as well and upload it soon.
> >
> > Cheers,
> > V
> >
> > On 1/12/11 10:03 PM, "Stack" <st...@duboce.net> wrote:
> >
> > I think you are right Vidhya.  The new master has a different
> > mechanism for assigning regions, so load_table.rb won't work with the new
> > master (Let me clean out the load_table.rb in 0.90 -- I filed
> > HBASE-3440 to fix).  It looks like the completebulkload from
> > http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
> > work.  It loads into pre-existing regions (I wonder why you've not
> > been using this script anyways?)
> >
> > Good on you Vidhya,
> > St.Ack
> >
> > On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
> > <vi...@yahoo-inc.com> wrote:
> > > I guess the master doesn't scan META periodically hence skips doing
> > anything with the updated META table.
> > >
> > > The ruby bulk load tool then needs some repair (the tool should write
> ZK
> > state for the new regions?).
> > >
> > >
> > > On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <
> vidhyash@yahoo-inc.com>
> > wrote:
> > >
> > > Is load_table.rb deprecated in 0.90?
> > >
> > > I was trying to use load_table.rb to create a new table and bulk load
> > files into it. It worked partly in the sense that the META table got
> > populated, the files were moved to the appropriate location, but the
> server
> > assignment did not happen until I restarted HBase. Is this a consequence
> of
> > the master rewrite?
> > >
> > > V
> > >
> > >
> >
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ruby Bulk Load tool in 0.90

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
>> Nicolas actually did the multi-column-family patch for trunk a few weeks
>> ago, so no need to upload that patch.
That's great!

>> If you want to have a go that would be great!
Yes, I can take a shot at it. By "create these file boundaries", did you mean creating ZK state and a META table entry for these boundaries? Because after that, the incremental load and the bulk load become the same.

Thank you
Vidhya

On 1/13/11 10:30 AM, "Todd Lipcon" <to...@cloudera.com> wrote:

Hey Vidhya,

Nicolas actually did the multi-column-family patch for trunk a few weeks
ago, so no need to upload that patch. Hopefully his will work for you.

I'd love to see the ruby script gotten rid of, and improve the
LoadIncrementalHFiles/completebulkload tool to also support loading into new
tables. It should be pretty simple - if the table doesn't exist, scan the
hfiles to find their boundaries, create a table with those boundaries, and
then treat it like the incremental case. If you want to have a go that would
be great!

-Todd

On Thu, Jan 13, 2011 at 8:53 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> So I was using 0.89 till last week and the tables created/bulk loaded just
> fine with the script (one can only use the ruby script while creating the
> table) and yesterday was when I ran our system creating a table from
> scratch.
>
> I have to fix this thing anyways very soon so do let me know if you want me
> to take a stab at it. And thanks for creating the jira.
>
> Oh, and I have my custom patch that I use for our current system for
> supporting multiple column families in bulk loads which is a little too
> non-generic to be submitted as a patch for the open source. Let me clean
> that up as well and upload it soon.
>
> Cheers,
> V
>
> On 1/12/11 10:03 PM, "Stack" <st...@duboce.net> wrote:
>
> I think you are right Vidhya.  The new master has a different
> mechanism for assigning regions, so load_table.rb won't work with the new
> master (Let me clean out the load_table.rb in 0.90 -- I filed
> HBASE-3440 to fix).  It looks like the completebulkload from
> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
> work.  It loads into pre-existing regions (I wonder why you've not
> been using this script anyways?)
>
> Good on you Vidhya,
> St.Ack
>
> On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
> > I guess the master doesn't scan META periodically hence skips doing
> anything with the updated META table.
> >
> > The ruby bulk load tool then needs some repair (the tool should write ZK
> state for the new regions?).
> >
> >
> > On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com>
> wrote:
> >
> > Is load_table.rb deprecated in 0.90?
> >
> > I was trying to use load_table.rb to create a new table and bulk load
> files into it. It worked partly in the sense that the META table got
> populated, the files were moved to the appropriate location, but the server
> assignment did not happen until I restarted HBase. Is this a consequence of
> the master rewrite?
> >
> > V
> >
> >
>
>


--
Todd Lipcon
Software Engineer, Cloudera


Re: Ruby Bulk Load tool in 0.90

Posted by Todd Lipcon <to...@cloudera.com>.
Hey Vidhya,

Nicolas actually did the multi-column-family patch for trunk a few weeks
ago, so no need to upload that patch. Hopefully his will work for you.

I'd love to see the ruby script gotten rid of, and the
LoadIncrementalHFiles/completebulkload tool improved to also support loading
into new tables. It should be pretty simple - if the table doesn't exist, scan the
hfiles to find their boundaries, create a table with those boundaries, and
then treat it like the incremental case. If you want to have a go that would
be great!

-Todd
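
[Editor's sketch, not part of Todd's message.] The "find the boundaries, then create a table with those boundaries" step above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the class and method names (BulkLoadSplitKeys, splitKeysFromFirstKeys) are hypothetical, and it assumes the first row key of each HFile has already been read by some other means. It only derives the split keys; the resulting byte[][] would then be handed to whatever create-table call accepts split keys (in the 0.90-era client this is believed to be HBaseAdmin.createTable(HTableDescriptor, byte[][]), but check the API for your version).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper, not part of HBase: derives region split keys from HFile first row keys. */
final class BulkLoadSplitKeys {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Sorts and de-duplicates the first row keys, then drops the smallest one:
     * the first region always starts at the empty key, so only the remaining
     * keys act as split points.
     */
    static byte[][] splitKeysFromFirstKeys(List<byte[]> firstRowKeys) {
        TreeSet<byte[]> sorted = new TreeSet<>(BulkLoadSplitKeys::compareRows);
        sorted.addAll(firstRowKeys);
        if (!sorted.isEmpty()) {
            sorted.pollFirst();
        }
        return sorted.toArray(new byte[0][]);
    }

    public static void main(String[] args) {
        // Made-up first row keys for three HFiles, in arbitrary order.
        List<byte[]> firstKeys = new ArrayList<>();
        for (String s : new String[] {"row-m", "row-a", "row-t"}) {
            firstKeys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        // Prints the split keys, one per line, in sorted order.
        for (byte[] split : splitKeysFromFirstKeys(firstKeys)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With three files this yields two split keys and therefore three regions, one per file, after which the load can proceed as in the incremental case.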

On Thu, Jan 13, 2011 at 8:53 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> So I was using 0.89 till last week and the tables created/bulk loaded just
> fine with the script (one can only use the ruby script while creating the
> table) and yesterday was when I ran our system creating a table from
> scratch.
>
> I have to fix this thing anyways very soon so do let me know if you want me
> to take a stab at it. And thanks for creating the jira.
>
> Oh, and I have my custom patch that I use for our current system for
> supporting multiple column families in bulk loads which is a little too
> non-generic to be submitted as a patch for the open source. Let me clean
> that up as well and upload it soon.
>
> Cheers,
> V
>
> On 1/12/11 10:03 PM, "Stack" <st...@duboce.net> wrote:
>
> I think you are right Vidhya.  The new master has a different
> mechanism for assigning regions, so load_table.rb won't work with the new
> master (Let me clean out the load_table.rb in 0.90 -- I filed
> HBASE-3440 to fix).  It looks like the completebulkload from
> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
> work.  It loads into pre-existing regions (I wonder why you've not
> been using this script anyways?)
>
> Good on you Vidhya,
> St.Ack
>
> On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
> > I guess the master doesn't scan META periodically hence skips doing
> anything with the updated META table.
> >
> > The ruby bulk load tool then needs some repair (the tool should write ZK
> state for the new regions?).
> >
> >
> > On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com>
> wrote:
> >
> > Is load_table.rb deprecated in 0.90?
> >
> > I was trying to use load_table.rb to create a new table and bulk load
> files into it. It worked partly in the sense that the META table got
> populated, the files were moved to the appropriate location, but the server
> assignment did not happen until I restarted HBase. Is this a consequence of
> the master rewrite?
> >
> > V
> >
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ruby Bulk Load tool in 0.90

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
I was using 0.89 until last week, and tables were created and bulk loaded just fine with the script (one can only use the ruby script while creating the table). Yesterday was when I ran our system to create a table from scratch.

I have to fix this thing anyways very soon so do let me know if you want me to take a stab at it. And thanks for creating the jira.

Oh, and I have my custom patch that I use for our current system for supporting multiple column families in bulk loads which is a little too non-generic to be submitted as a patch for the open source. Let me clean that up as well and upload it soon.

Cheers,
V

On 1/12/11 10:03 PM, "Stack" <st...@duboce.net> wrote:

I think you are right Vidhya.  The new master has a different
mechanism for assigning regions, so load_table.rb won't work with the new
master (Let me clean out the load_table.rb in 0.90 -- I filed
HBASE-3440 to fix).  It looks like the completebulkload from
http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
work.  It loads into pre-existing regions (I wonder why you've not
been using this script anyways?)

Good on you Vidhya,
St.Ack

On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I guess the master doesn't scan META periodically hence skips doing anything with the updated META table.
>
> The ruby bulk load tool then needs some repair (the tool should write ZK state for the new regions?).
>
>
> On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com> wrote:
>
> Is load_table.rb deprecated in 0.90?
>
> I was trying to use load_table.rb to create a new table and bulk load files into it. It worked partly in the sense that the META table got populated, the files were moved to the appropriate location, but the server assignment did not happen until I restarted HBase. Is this a consequence of the master rewrite?
>
> V
>
>


Re: Ruby Bulk Load tool in 0.90

Posted by Stack <st...@duboce.net>.
I think you are right Vidhya.  The new master has a different
mechanism for assigning regions, so load_table.rb won't work with the new
master (Let me clean out the load_table.rb in 0.90 -- I filed
HBASE-3440 to fix).  It looks like the completebulkload from
http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html should
work.  It loads into pre-existing regions (I wonder why you've not
been using this script anyways?)

Good on you Vidhya,
St.Ack

On Wed, Jan 12, 2011 at 5:25 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I guess the master doesn't scan META periodically hence skips doing anything with the updated META table.
>
> The ruby bulk load tool then needs some repair (the tool should write ZK state for the new regions?).
>
>
> On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com> wrote:
>
> Is load_table.rb deprecated in 0.90?
>
> I was trying to use load_table.rb to create a new table and bulk load files into it. It worked partly in the sense that the META table got populated, the files were moved to the appropriate location, but the server assignment did not happen until I restarted HBase. Is this a consequence of the master rewrite?
>
> V
>
>

Re: Ruby Bulk Load tool in 0.90

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
I guess the master doesn't scan META periodically, and hence doesn't act on the updated META table.

The ruby bulk load tool then needs some repair (the tool should write ZK state for the new regions?).


On 1/12/11 4:40 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com> wrote:

Is load_table.rb deprecated in 0.90?

I was trying to use load_table.rb to create a new table and bulk load files into it. It worked partly in the sense that the META table got populated, the files were moved to the appropriate location, but the server assignment did not happen until I restarted HBase. Is this a consequence of the master rewrite?

V