Posted to dev@hbase.apache.org by "Kirill Balyasnikov (JIRA)" <ji...@apache.org> on 2009/07/30 14:13:14 UTC

[jira] Created: (HBASE-1724) Data loss after `kill -9` region server

Data loss after `kill -9` region server
---------------------------------------

                 Key: HBASE-1724
                 URL: https://issues.apache.org/jira/browse/HBASE-1724
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.19.3
         Environment: OpenSUSE 11.1, HBase 0.19.3, Hadoop HDFS 0.19.1
            Reporter: Kirill Balyasnikov


I have a 3-node cluster, with each node running Hadoop and HBase. I created an 'accounts' table and loaded some data into it (about 3000 rows).
Some days later one of the region servers died, and after I restarted it there were no records in the table at all. I saw an HLog file with my records
in HDFS, but as far as I understand, HBase did not use that file to recover the table.

I tried to reproduce the situation: I uploaded another 2000 records into my table and killed the region server holding the 'accounts' table region.
In HDFS I found a file with my records, but some time later it was replaced by an empty directory. As I suspected, the data was not recovered
after the killed region server started up again.

Everything is lost again, and there are no exceptions in the logs...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1724) Data loss after `kill -9` region server

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737078#action_12737078 ] 

Jean-Daniel Cryans commented on HBASE-1724:
-------------------------------------------

This is already fixed in branch 0.19 and in trunk. Please either fetch the latest code for your branch from SVN or try out the new release candidate for 0.20: http://people.apache.org/~stack/hbase-0.20.0-candidate-1/



[jira] Commented: (HBASE-1724) Data loss after `kill -9` region server

Posted by "Kirill Balyasnikov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737079#action_12737079 ] 

Kirill Balyasnikov commented on HBASE-1724:
-------------------------------------------

Thanks, I will give it a try!



[jira] Updated: (HBASE-1724) Data loss after `kill -9` region server

Posted by "Kirill Balyasnikov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Balyasnikov updated HBASE-1724:
--------------------------------------

    Attachment: log_fragment.txt

I have attached a log fragment from after the region server was killed. 192.168.0.220 is the killed server; 192.168.0.222 is the new server holding the 'accounts' table region.



[jira] Commented: (HBASE-1724) Data loss after `kill -9` region server

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737081#action_12737081 ] 

Jean-Daniel Cryans commented on HBASE-1724:
-------------------------------------------

FYI, there will always be some level of data loss in HBase as long as appends in HDFS are not fully functional (they are supposed to be ready for 0.21). Until then there are some knobs you can play with, but the situation you described will at least not happen with the latest fixes. For example, we were not rolling the logs, so you could easily lose data (as happened to you) even after days. Now the logs are rolled every hour, and the period is configurable. There is also a maximum size a log can reach before it is rolled.
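
As a rough illustration of the rolling rule described above (roll the log after a fixed period or once it reaches a maximum size, whichever comes first), here is a minimal Java sketch. The class name LogRollPolicy, its fields, and the 64 MB size limit are hypothetical and chosen only for illustration; this is not HBase's actual implementation, and it does not use HBase's real configuration keys.

```java
/** Hypothetical sketch of a "roll by age or size" policy; not HBase code. */
public class LogRollPolicy {
    private final long rollPeriodMillis; // maximum age of a log before it is rolled
    private final long maxLogBytes;      // maximum size of a log before it is rolled

    public LogRollPolicy(long rollPeriodMillis, long maxLogBytes) {
        this.rollPeriodMillis = rollPeriodMillis;
        this.maxLogBytes = maxLogBytes;
    }

    /** Returns true when the current log is too old or too large. */
    public boolean shouldRoll(long logOpenedAtMillis, long logSizeBytes, long nowMillis) {
        boolean tooOld = nowMillis - logOpenedAtMillis >= rollPeriodMillis;
        boolean tooBig = logSizeBytes >= maxLogBytes;
        return tooOld || tooBig;
    }

    public static void main(String[] args) {
        // One hour matches the period mentioned above; 64 MB is an assumed limit.
        LogRollPolicy policy = new LogRollPolicy(60L * 60 * 1000, 64L * 1024 * 1024);
        long openedAt = 0L;
        // A 10-minute-old, 1 KB log: neither condition is met.
        System.out.println(policy.shouldRoll(openedAt, 1024L, 10L * 60 * 1000));
        // A 61-minute-old log: the period has elapsed.
        System.out.println(policy.shouldRoll(openedAt, 1024L, 61L * 60 * 1000));
        // A 65 MB log: the size limit is exceeded.
        System.out.println(policy.shouldRoll(openedAt, 65L * 1024 * 1024, 1000L));
    }
}
```

Under this reading, rolling presumably bounds how much data is at risk: only edits in the log that is still open when the server dies depend on working HDFS appends.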
