Posted to dev@hbase.apache.org by "Samuel Guo (JIRA)" <ji...@apache.org> on 2009/01/09 03:56:59 UTC

[jira] Created: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

we will lose data if the table name happens to be the logs' dir name
--------------------------------------------------------------------

                 Key: HBASE-1112
                 URL: https://issues.apache.org/jira/browse/HBASE-1112
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Samuel Guo
            Priority: Minor


If a table's name happens to equal the log directory name of some region server, the table's data will be stored in the same HDFS directory as that region server's logs. If that region server fails, the directory may be removed after its logs are replayed, and the table's data is lost with it.

I suggest adding a special character such as '_' to the front of the log directory's name, just as the root and meta regions do. That would prevent a user table's data from ever being stored in a log directory. For example, 'log_10.132.15.1_1231465024534_60020' would become '_log_10.132.15.1_1231465024534_60020'.
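A minimal sketch of the failure described above, assuming the pre-fix layout in which table directories and per-server log directories sit side by side directly under the HBase root dir. The class and method names (`OldLayout`, `tableDir`, `logDir`) are hypothetical, not HBase APIs, and plain strings stand in for HDFS paths.

```java
/**
 * Hypothetical sketch of the pre-fix layout: table dirs and per-server
 * log dirs are siblings directly under the HBase root dir.
 * Not HBase code; strings stand in for HDFS paths.
 */
class OldLayout {
    // Table data lives at <root>/<tableName>.
    static String tableDir(String root, String tableName) {
        return root + "/" + tableName;
    }

    // Each region server's logs live at <root>/<logDirName>.
    static String logDir(String root, String logDirName) {
        return root + "/" + logDirName;
    }

    public static void main(String[] args) {
        String root = "/hbase";
        String serverLogs = "log_10.132.15.1_1231465024534_60020";

        // A user table with the same name as the log dir resolves to the
        // very same path, so removing the log dir removes the table data.
        boolean clash = tableDir(root, serverLogs).equals(logDir(root, serverLogs));
        System.out.println("clash = " + clash);

        // The proposed fix: prefix the log dir name with '_'.
        System.out.println(logDir(root, "_" + serverLogs));
    }
}
```

Running it prints `clash = true` followed by `/hbase/_log_10.132.15.1_1231465024534_60020`.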

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "Samuel Guo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Guo updated HBASE-1112:
------------------------------

    Attachment: HBASE-1112.patch

Attached a simple patch.
HRegionServer (HRS) logs are now stored in the dir '@LOG@'.
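A sketch of what the patch changes, assuming the log directories simply move one level down into an '@LOG@' subdirectory while table directories stay at the root. `NewLayout` and its methods are hypothetical illustrations, not code from the patch.

```java
/**
 * Hypothetical sketch of the post-patch layout: every region server's
 * log dir moves under one "@LOG@" subdirectory of the root.
 * Not code from HBASE-1112.patch.
 */
class NewLayout {
    // Subdirectory name taken from the patch comment.
    static final String LOG_SUBDIR = "@LOG@";

    // Table dirs are unchanged: <root>/<tableName>.
    static String tableDir(String root, String tableName) {
        return root + "/" + tableName;
    }

    // Log dirs are now one level deeper: <root>/@LOG@/<logDirName>.
    static String logDir(String root, String logDirName) {
        return root + "/" + LOG_SUBDIR + "/" + logDirName;
    }

    public static void main(String[] args) {
        String root = "/hbase";
        String name = "log_10.132.15.1_1231465024534_60020";
        // Same name, different paths: the table no longer lands in the log dir.
        System.out.println(tableDir(root, name));
        System.out.println(logDir(root, name));
    }
}
```

Under this layout the only top-level name the logs occupy is '@LOG@', presumably chosen because '@' cannot appear in a table name.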



[jira] Commented: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667436#action_12667436 ] 

stack commented on HBASE-1112:
------------------------------

I like this change also because it cleans up our root dir by moving the logs into a subdir: a cluster of 100 region servers could have up to 64 logs each, which is 6400 logs at the top level. (We'll need a migration that screams if logs exist at the rootdir.)
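One way the migration check could look, as a sketch: given the names of the entries directly under the root dir, report any that look like old-style log directories. `LogMigrationCheck` is hypothetical; matching on the `log_` prefix is an assumption based on the naming scheme discussed in this issue, and a real check would list the directory through Hadoop's `FileSystem` rather than take a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical migration check: flag root-dir entries that look like
 * old-style region server log dirs. The "log_" prefix test is assumed
 * from the naming scheme discussed in HBASE-1112.
 */
class LogMigrationCheck {
    static List<String> leftoverLogDirs(List<String> rootEntries) {
        List<String> found = new ArrayList<>();
        for (String entry : rootEntries) {
            if (entry.startsWith("log_")) {
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Entry names borrowed from the directory listings later in this thread.
        List<String> entries = List.of("-ROOT-", ".META.", "TestTable",
                "log_208.76.44.139_1233004558648_8020");
        List<String> leftovers = leftoverLogDirs(entries);
        if (!leftovers.isEmpty()) {
            // "Scream": report what was found so the operator can act before migrating.
            System.err.println("Old-style log dirs found at rootdir: " + leftovers);
        }
    }
}
```

A user table whose name starts with `log_` would also be flagged, which is the same ambiguity this issue is about.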



[jira] Updated: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1112:
-------------------------

    Fix Version/s: 0.20.0

Let's fix in 0.20.



[jira] Resolved: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1112.
--------------------------

    Resolution: Fixed

Committed. Thanks for the patch, Samuel.



[jira] Commented: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667444#action_12667444 ] 

stack commented on HBASE-1112:
------------------------------

Oh... I didn't realize. That's good: one dir per server. No harm in moving all of these into a subdir though, I'd say, named @LOG@ or @LOGS@ as Samuel suggests.

Also, yes, it's highly unlikely that there'd be a table with a clashing name, but I suppose there's no harm in doing the little work to ensure it doesn't happen.



[jira] Commented: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667432#action_12667432 ] 

Jim Kellerman commented on HBASE-1112:
--------------------------------------

How likely is this really?

The HLog directory name is:

"log_" + ipaddress-of-server + "_" + server-start-code + "_" + server-port-number

Whereas region directories are just the region name. For example, region directory names:

{code}
drwxr-xr-x   - jim supergroup          0 2009-01-26 19:26 /hbase/-ROOT-
drwxr-xr-x   - jim supergroup          0 2009-01-26 19:26 /hbase/.META.
drwxr-xr-x   - jim supergroup          0 2009-01-26 19:53 /hbase/TestTable
{code}

Examples of log directory names:

{code}
drwxr-xr-x   - jim supergroup          0 2009-01-26 21:15 /hbase/log_208.76.44.139_1233004558648_8020
drwxr-xr-x   - jim supergroup          0 2009-01-26 21:15 /hbase/log_208.76.44.140_1233004558283_8020
drwxr-xr-x   - jim supergroup          0 2009-01-26 21:15 /hbase/log_208.76.44.141_1233004558140_8020
{code}

Seems to me the chance for collision is pretty small.
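To make the HLog directory name format quoted above concrete, a hypothetical helper that assembles the name from its parts. `HLogDirName.logDirName` is illustrative only. The start code is taken as a `long` because it appears to be a millisecond timestamp (it matches the 2009-01-26 21:15 times in the listing) and does not fit in an `int`.

```java
/**
 * Hypothetical helper following the quoted format:
 *   "log_" + ipaddress-of-server + "_" + server-start-code + "_" + server-port-number
 * Not HBase's API.
 */
class HLogDirName {
    static String logDirName(String ipAddress, long startCode, int port) {
        return "log_" + ipAddress + "_" + startCode + "_" + port;
    }

    public static void main(String[] args) {
        // Values from the first log directory in the listing.
        System.out.println(logDirName("208.76.44.139", 1233004558648L, 8020));
        // prints log_208.76.44.139_1233004558648_8020
    }
}
```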



[jira] Commented: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662253#action_12662253 ] 

stack commented on HBASE-1112:
------------------------------

Agreed.

Or keep logs in a subdir whose name uses characters that are disallowed in table names (see HTableDescriptor#isLegalTableName).
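A rough sketch of this idea, assuming a legality rule along these lines: the name is non-empty, uses only letters, digits, '_', '-' and '.', and a user table may not start with '.' or '-'. This only approximates HTableDescriptor#isLegalTableName and is not that method; check the real rule in your HBase version. `TableNameRules` is hypothetical.

```java
/**
 * Assumed approximation of a table-name rule, for illustration only:
 * non-empty, characters limited to letters, digits, '_', '-', '.',
 * and no leading '.' or '-' for user tables.
 */
class TableNameRules {
    static boolean isLegalUserTableName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (first == '.' || first == '-') {
            // Assumed: keeps user tables apart from -ROOT- and .META.
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An existing log dir name is itself a legal table name.
        System.out.println(isLegalUserTableName("log_208.76.44.139_1233004558648_8020")); // true
        // A '_' prefix alone does not help under this rule.
        System.out.println(isLegalUserTableName("_log_208.76.44.139_1233004558648_8020")); // true
        // '@' is outside the allowed set, so no user table can be named "@LOG@".
        System.out.println(isLegalUserTableName("@LOG@")); // false
    }
}
```

Under this assumed rule, a subdir named with a disallowed character such as '@' can never collide with a user table's directory.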



[jira] Commented: (HBASE-1112) we will lose data if the table name happens to be the logs' dir name

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667441#action_12667441 ] 

Jim Kellerman commented on HBASE-1112:
--------------------------------------

@Stack

The logs are in subdirectories. Note in the above that /hbase/log_208.76.44.139_1233004558648_8020, etc., are all directories (one per region server), and the individual log files are contained in those directories.
