Posted to dev@hbase.apache.org by "Elsif (JIRA)" <ji...@apache.org> on 2009/09/25 01:50:15 UTC

[jira] Created: (HBASE-1867) Tool to regenerate an hbase table from the data files

Tool to regenerate an hbase table from the data files
-----------------------------------------------------

                 Key: HBASE-1867
                 URL: https://issues.apache.org/jira/browse/HBASE-1867
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: util
    Affects Versions: 0.20.0
            Reporter: Elsif 
            Priority: Minor


The purpose of this JIRA is to provide a place to coordinate the development of a utility that will regenerate an hbase table from the data files.

Here are some comments from stack on this subject from the hbase-user mailing list:

Well, in the bin directory, there are scripts that do various things with
the .META. (copy a table, move a table, load a table whose source is hfiles
written by a mapreduce job; i.e. hbase-48).

So, to 'regenerate an hbase table from the data files', you'd need to do
something like the following:

+ delete all existing table references from .META.
+ move the backed-up table into position under hbase.rootdir
+ per region under hbase.rootdir, add an entry to .META.  Do this by opening
the .regioninfo file.  Its content is needed to generate the rowid for
.META. and its value becomes the info:regioninfo cell value.
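
The three steps above can be modeled in a few lines of plain Ruby. This is a minimal sketch, not add_table.rb: the RegionInfo struct is a hypothetical stand-in for the contents of a region's .regioninfo file, a Hash stands in for the .META. table, and the "table,startkey,regionid" row-key layout is an assumption about 0.20-era region names. Step 2 is a pure filesystem move and is left out here.

```ruby
# Hypothetical stand-in for the HRegionInfo deserialized from a region's
# .regioninfo file.
RegionInfo = Struct.new(:table_name, :start_key, :region_id)

# Assumed 0.20-era .META. row key layout: "<table>,<startkey>,<regionid>".
def meta_row_key(info)
  [info.table_name, info.start_key, info.region_id].join(",")
end

# meta models the .META. table as { row_key => { column => value } }.
def regenerate_table(meta, table_name, regions)
  # Step 1: delete every existing row that belongs to this table.
  meta.delete_if { |row, _cols| row.start_with?("#{table_name},") }
  # Step 3: add one row per region; info:regioninfo holds the region
  # description. No info:server or info:startcode cells are written.
  regions.each do |info|
    meta[meta_row_key(info)] = { "info:regioninfo" => info }
  end
  meta
end
```

For example, regenerate_table(meta, "t1", [RegionInfo.new("t1", "", 200)]) drops any stale "t1,..." rows and adds a single "t1,,200" row that carries only info:regioninfo.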

HBase does not need to be down.  On next .META. scan, the newly added
regions will be noticed. They won't have associated info:server and
info:startcode entries, so the master will go ahead and assign them and you
should be up and running.
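
The assignment behaviour described above comes down to a simple check on each .META. row. A minimal sketch, again modeling .META. as a Hash of row key to column cells; the helper name unassigned_regions is hypothetical, and treating a row that lacks either info:server or info:startcode as unassigned is an assumption, not the master's actual code.

```ruby
# meta models the .META. table as { row_key => { column => value } }.
# Returns the row keys of regions the master would still need to assign:
# rows that describe a region but have no server location recorded
# (assumption: missing either info:server or info:startcode counts).
def unassigned_regions(meta)
  rows = meta.select do |_row, cols|
    cols.key?("info:regioninfo") &&
      !(cols.key?("info:server") && cols.key?("info:startcode"))
  end
  rows.keys
end
```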

Code-wise, a study of copy_table.rb (this uses the old API and needs updating,
but the concepts are the same) and loadtable.rb would probably be fruitful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Attachment:     (was: addtable.rb)

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>         Attachments: add_table.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Attachment: addtable.rb

Here's a start.  Reads in args, sets up the filesystem, moves aside the extant table directory, and moves the pointed-to directory into place.  Untested.  Still a bunch to do.  (If someone wants to take over, be my guest.)
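
The directory shuffle described above (move the existing table directory aside, then move the given directory into place) might look like this. It is a sketch under stated assumptions: it uses FileUtils on a local filesystem as a stand-in for HDFS and Hadoop's FileSystem API, root_dir plays the role of hbase.rootdir, and the timestamp suffix for the moved-aside copy is a hypothetical naming choice. It moves the old directory aside and reports that it did so, rather than prompting the user.

```ruby
require "fileutils"

# Local-filesystem stand-in for the HDFS moves. root_dir plays the role of
# hbase.rootdir; source_dir is the backed-up table directory to restore.
# Returns the path the existing table was moved to, or nil if there was none.
def swap_in_table(root_dir, table_name, source_dir)
  target = File.join(root_dir, table_name)
  # Nothing to do if the given directory is already in position.
  return nil if File.expand_path(source_dir) == File.expand_path(target)

  moved_aside = nil
  if File.exist?(target)
    # Move the existing table aside (never delete it) and tell the user.
    moved_aside = "#{target}.#{Time.now.to_i}" # hypothetical naming scheme
    FileUtils.mv(target, moved_aside)
    puts "Moved existing #{target} aside to #{moved_aside}"
  end
  FileUtils.mv(source_dir, target)
  moved_aside
end
```

Returning the moved-aside path lets a caller log it, or move it back if loading the regions fails.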

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>         Attachments: addtable.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Attachment: add_table.rb

Here's a first cut.  Needs a bit of testing.  Seems to work fine on a small table of ten regions IFF the passed directory is a copied-aside table.  Does not yet work when passed an arbitrary table name (needs a bit of messy reconstructing of the HTableDescriptor with the new name, then renaming of the region directories with recalculation of the region encoded names).
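
Why an arbitrary table name is harder: each region name embeds the table name, so renaming the table changes every region's name and therefore its encoded directory name (and the HTableDescriptor stored in each .regioninfo would also need the new name). The sketch below only plans the directory renames. The "table,startkey,regionid" layout is an assumption, and encoded_name uses Zlib.crc32 purely as a hypothetical stand-in for HBase's own encoding hash.

```ruby
require "zlib"

# Assumed 0.20-era region name layout: "<table>,<startkey>,<regionid>".
def region_name(table, start_key, region_id)
  "#{table},#{start_key},#{region_id}"
end

# HYPOTHETICAL encoding: HBase uses its own hash to derive the encoded
# name; CRC32 stands in here only to show that it depends on the table name.
def encoded_name(name)
  Zlib.crc32(name).to_s
end

# Plans the region directory renames needed to load regions under a new
# table name. regions is a list of [start_key, region_id] pairs; start keys
# and region ids are kept, only the table part of the name changes.
def rename_plan(old_table, new_table, regions)
  pairs = regions.map do |start_key, region_id|
    [encoded_name(region_name(old_table, start_key, region_id)),
     encoded_name(region_name(new_table, start_key, region_id))]
  end
  pairs.to_h
end
```

For example, rename_plan("t1", "t2", [["", 200]]) yields one old-to-new pair whose two directory names differ, so the copied regions cannot simply keep their old directories.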

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>         Attachments: add_table.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.2
           Status: Resolved  (was: Patch Available)

Added this script.  It's of general utility.

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.20.2, 0.21.0
>
>         Attachments: add_table.rb


[jira] Assigned: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-1867:
----------------------------

    Assignee: stack

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: add_table.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Attachment: add_table.rb

This should fix the issue you were seeing... 

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.20.1
>
>         Attachments: add_table.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Attachment:     (was: add_table.rb)

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.20.1
>
>         Attachments: add_table.rb


[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "Woosuk Suh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761567#action_12761567 ] 

Woosuk Suh commented on HBASE-1867:
-----------------------------------

@stack
Yes, I'm using HBase 0.20.0 with Hadoop 0.20.0.
I'm going to enable DEBUG-level logging, since I have seen the .META. table problem several times.
If the problem happens again, I'll report it to the mailing list with the log attached.
Thanks!

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.20.1
>
>         Attachments: add_table.rb


[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "Woosuk Suh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762670#action_12762670 ] 

Woosuk Suh commented on HBASE-1867:
-----------------------------------

Cool! You are definitely thrilling me! :)
I will test your fixed version when I can and give feedback here.

I'm also trying to catch the .META. table problem, but the error hasn't happened again since the last time.
Typical bug behavior: when you need it to happen, it never does; when you don't, it does :(

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: add_table.rb


[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759376#action_12759376 ] 

stack commented on HBASE-1867:
------------------------------

Above is fine except the bit about prompting users for instructions on moving the table aside. How about we just move it aside and tell the user we did it, rather than messing with user interaction in the script?

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Status: Patch Available  (was: Open)

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.20.1
>
>         Attachments: add_table.rb


[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "Woosuk Suh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761478#action_12761478 ] 

Woosuk Suh commented on HBASE-1867:
-----------------------------------

Your script worked perfectly on our HBase cluster of 4 machines with 1011 regions.
I used your script because our .META. region evaporated for some unknown reason.
FYI, each server has 5GB of memory and an Intel quad-core CPU.

But sometimes I was not able to run the script because of errors.
So here is the whole process I followed to make the script work.

1. Our table had the following layout on HDFS:
/hbase/TABLENAME

2. I moved /hbase to /hbase_backup:
/hbase_backup/TABLENAME

3. Then I started HBase so that the broken .META. table would be regenerated cleanly. After starting HBase:
/hbase/
/hbase_backup/TABLENAME

4. Then I ran the script like this:
bin/hbase org.jruby.Main add_table.rb hdfs://our.server.addr:port/hbase_backup/TABLENAME
Then I got an error at line 105: statuses was nil, and a nil object cannot be iterated.

5. I'm not familiar with Ruby, only Python, but I assume that, as in Python, a nil (None) object cannot be iterated.
I printed tableDir with LOG.info(tableDir.toString()) and got the following:
our.server.addr:port/hbase_backup/TABLENAME

6. So I copied hbase_backup/TABLENAME to hbase/TABLENAME as follows:
bin/hadoop dfs -cp hbase_backup/TABLENAME hbase/TABLENAME

7. After a long time, the copy finished, and I ran the script again with the following command:
bin/hbase org.jruby.Main add_table.rb hdfs://our.server.addr:port/hbase/TABLENAME
And it worked without any error or problem and all the regions were restored!
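
In step 4 the script failed because the directory listing came back nil. A defensive check can turn that into a clear message instead of an iteration error. This is a minimal sketch: region_dirs is a hypothetical helper, and lister is a hypothetical callable standing in for Hadoop's FileSystem listStatus call, which in Hadoop releases of this era could return null for a path that does not exist instead of raising.

```ruby
# Lists the entries under a table directory, failing with a clear message
# instead of a nil iteration error. lister is a hypothetical callable that
# returns an Array of entry names, or nil when the path cannot be listed
# (mirroring how listStatus could return null for a missing path).
def region_dirs(table_dir, lister)
  statuses = lister.call(table_dir)
  if statuses.nil?
    raise ArgumentError,
          "Nothing to list at #{table_dir}; check that the directory exists " \
          "on the filesystem HBase is configured to use"
  end
  statuses
end
```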

I hope this usage information helps you improve the code.
Thanks for the fabulous script!

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>         Attachments: add_table.rb


[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Fix Version/s:     (was: 0.20.1)
                   0.21.0

This doesn't have to be in 0.20.1.  We can point anyone who needs this script to this issue.  Moving it out.

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: add_table.rb
>
>



[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761549#action_12761549 ] 

stack commented on HBASE-1867:
------------------------------

@wooksuh Thanks for the report.  Let me try with the table in a different location.  It must be something to do with qualified names in HDFS; let me figure it out.  Also, can we figure out what happened to your .META. table?  Is this HBase 0.20.0?  Enable DEBUG-level logging in case it happens again.

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>         Attachments: add_table.rb
>
>



[jira] Updated: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1867:
-------------------------

    Fix Version/s: 0.20.1

Fix the issue wooksuh reported, targeting 0.20.1.

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>             Fix For: 0.20.1
>
>         Attachments: add_table.rb
>
>



[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "Jon Graham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759329#action_12759329 ] 

Jon Graham commented on HBASE-1867:
-----------------------------------

Thanks, Elsif, for creating this JIRA.


> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Elsif 
>            Priority: Minor
>



[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "elsif (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759354#action_12759354 ] 

elsif commented on HBASE-1867:
-------------------------------

The input arguments would be the HDFS path to the table and, optionally, a new name for the table:

    regenerate_table.rb HDFS_URL [TABLE_NAME]

If the table already exists, the user would be prompted to move the existing table aside, remove it, or cancel the operation.
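A minimal Ruby sketch of the interface proposed above: parse `HDFS_URL [TABLE_NAME]` and ask what to do when the table already exists. `parse_args` and `ask_on_conflict` are hypothetical names, not part of any HBase script. Defaulting TABLE_NAME to the last path component, the one-letter answers, and treating end of input as cancel are all assumptions made for illustration.

```ruby
# Hypothetical helpers for the proposed command line:
#   regenerate_table.rb HDFS_URL [TABLE_NAME]
# Nothing here talks to HDFS or HBase.

Options = Struct.new(:source, :table_name)

# Validate argv and default TABLE_NAME to the last component of HDFS_URL
# (an assumption; the proposal only says the new name is optional).
def parse_args(argv)
  unless (1..2).cover?(argv.length)
    raise ArgumentError, "usage: regenerate_table.rb HDFS_URL [TABLE_NAME]"
  end
  source = argv[0].chomp("/")
  Options.new(source, argv[1] || File.basename(source))
end

# Prompt until the user picks an action for a table that already exists.
# Returns :move_aside, :remove or :cancel; end of input counts as :cancel.
# input/output are parameters so the prompt can be driven from a StringIO.
def ask_on_conflict(table_name, input: $stdin, output: $stdout)
  choices = { "m" => :move_aside, "r" => :remove, "c" => :cancel }
  loop do
    output.print "Table #{table_name} already exists: [m]ove aside, [r]emove or [c]ancel? "
    answer = input.gets
    return :cancel if answer.nil?
    choice = choices[answer.strip.downcase]
    return choice if choice
  end
end
```

For example, `parse_args(["hdfs://namenode:9000/backup/users"])` yields the source `hdfs://namenode:9000/backup/users` and the table name `users`. Unrecognized answers simply re-prompt.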

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: elsif 
>            Priority: Minor
>



[jira] Commented: (HBASE-1867) Tool to regenerate an hbase table from the data files

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759338#action_12759338 ] 

stack commented on HBASE-1867:
------------------------------

So, for this script... what will we pass it?  The location of a table in HDFS?  This table could be under hbase.rootdir or elsewhere.  If it is elsewhere, would the script also move it into place?

If a table of the same name already exists under /hbase, the script could move it aside.
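One way to answer these questions, sketched in Ruby under stated assumptions, is a hypothetical `plan_moves` helper that only computes the directory moves. It moves an existing same-name table to an assumed `.aside.<timestamp>` name before moving the source into place, and it does nothing when the source is already under hbase.rootdir. Paths are compared as plain strings, so both must be spelled the same way (for example, both fully qualified). Applying the moves through HDFS is left out.

```ruby
# Hypothetical planner: returns the [from, to] directory moves a
# regenerate script might make, without touching HDFS.
# Paths are compared as plain strings, so rootdir and source must be
# written the same way (e.g. both fully qualified).
def plan_moves(rootdir:, source:, table_name:, existing_tables:, now: Time.now.to_i)
  target = "#{rootdir.chomp('/')}/#{table_name}"
  source = source.chomp("/")
  # Source already sits under hbase.rootdir: only .META. needs rebuilding.
  return [] if source == target

  moves = []
  if existing_tables.include?(table_name)
    # Move the current table out of the way under an assumed ".aside.<timestamp>" name.
    moves << [target, "#{target}.aside.#{now}"]
  end
  moves << [source, target]
  moves
end

# Example: restore /backup/users over an existing "users" table.
# plan_moves(rootdir: "/hbase", source: "/backup/users", table_name: "users",
#            existing_tables: ["users"], now: 42)
# => [["/hbase/users", "/hbase/users.aside.42"], ["/backup/users", "/hbase/users"]]
```

Keeping the planning separate from the moves makes it easy to print the plan and ask the user to confirm before anything under hbase.rootdir changes.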

> Tool to regenerate an hbase table from the data files
> -----------------------------------------------------
>
>                 Key: HBASE-1867
>                 URL: https://issues.apache.org/jira/browse/HBASE-1867
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Elsif 
>            Priority: Minor
>
