You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2010/02/20 03:09:28 UTC

[jira] Created: (HBASE-2244) META get inconsistent in a number of crash scenarios

META get inconsistent in a number of crash scenarios
----------------------------------------------------

                 Key: HBASE-2244
                 URL: https://issues.apache.org/jira/browse/HBASE-2244
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan
            Priority: Critical
             Fix For: 0.20.4


(Forking this issue off from HBASE-2235).

During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 

But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.

For example, for a particular startkey, I see all these entries:

{code}
test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
                             1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
                             19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
                              [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
                             , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
 test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
 test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
                             1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
                             05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
                             00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
                             \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
                             ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
                             L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
                             \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
                             rueh\x0FQ\xCF
 test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
                             1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
                             73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
                              [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
                             , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
 test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
 test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
                             1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
                             x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
                             x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
                             0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
                             x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
                             00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
                             0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
                             \xB9\xBD\xFEO
 test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
                             1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
                             x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
                             x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
                             0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
                             x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
                             00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
                             0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
                             \xE1\xDF\xF8p
 test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
                             1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
                             34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
                             > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
                             Y => 'false', BLOCKCACHE => 'true'}]}}
{code}



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

    Attachment: 2244.patch

Patch that skips opening of new daughter regions post creation.  Its not necessary.  Should make splits run a little faster and take a little load off hdfs/nn.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836428#action_12836428 ] 

Jean-Daniel Cryans commented on HBASE-2244:
-------------------------------------------

bq. It would still be good to understand why parent row deletion takes time and fix any known inconsistencies in this area.

It's by design, we only clear the parent when the daughters no longer have references since we don't split the HFiles during the split process but after during compactions.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840485#action_12840485 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Looks good to me. Thanks. 

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836322#action_12836322 ] 

stack commented on HBASE-2244:
------------------------------

Within the scope of this issue we should a couple of things for the 0.20 branch specifically.

First, add fixup to the metascanner for case where offlined parent but no daughters present because HRS crashed and didn't add daughters... or HRS carrying .META. crashed and we only recovered the parent offlining edit, but not the daughter additions.   This inconsistency is clean so easy to recognize.  Also, data needed to do the repair is in the offlined parent as columns splitA and splitB.

Here are a few notes.

+ On split, the HRS does three updates: 1. offline parent and add splitA and splitB columns that hold the HRegionInfo of daughter split regions, 2. add daughter A, and 3., add daughter B.  The updates are not done atomically.  Before we send the messages, the HRS has offlined (closed) the parent and created two new daughters.  The parent is already unavailable.
++ Reading code, there are issues to address in here.  If we crash after parent close, thats ok.  The parent will be assigned to a new HRS.  But subsequently, as the split goes forward, we do an open of the new daughter regions BEFORE we add them to the .META..  This seems like it could be avoided (speeding the split); only open once assigned in new location (Moving the location of where we do the split work should be all that is needed).  Also, if already a daughter region of same name in the FS, we'll fail the split rather than overwrite as it seems we should do (only reason for a pre-existing daughter is a split failed mid-way).  I can add a check of .META.  If daughter not there, its for sure a failed split.  Let me see if I can improve stuff in here in general for 0.20 as part of this patch.  I need to study  some more.
+ The HRS, after making updates in the .META., then sends a message to the master telling it about the split.  The master adds the new daughters to his assignment list and they are assigned out on next report-in by a cluster-member.  If this message is missed, the daughters are assigned the next time the metascanner runs.

In the .META. listing posted above, there are some interesting issues.  We still have a reference to a daughter, splitB, in the first offlined (row) region, yet the next row is a daughter that has been offlined itself.  There may be a race in here if we're splitting fast.  Let me check it out and see if a fix.

The other inconsistency is that there seems to be a row missing of the end, the splitB from test1,1204765,1266581233447.  Is that possible?

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836422#action_12836422 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Stack wrote: <<<< In the .META. listing posted above, there are some interesting issues. We still have a reference to a daughter, splitB, in the first offlined (row) region, yet the next row is a daughter that has been offlined itself. There may be a race in here if we're splitting fast. Let me check it out and see if a fix.>>>

Yes, I see several times that nested splits are happening, but the offlined parent row hasn't been reaped. But perhaps that in itself isn't an issue.  For example, corresponding to my first .META. snippet in this JIRA:

The split of test1,1204765,1266569946560 was announced @4:08:

{code}
2010-02-19 04:08:23,764 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: test1,1204765,1266569946560: Daughters; test1,1204765,1266581233447, test1,1290703,1266581233447 from test013.abcxyz.com,60020,1266562597546; 1 of 3
{code}

But reclaiming the offlined parent row from .META. took time. First we  detected one of the daughters no longer reference it @ about 11:53:
{code}
2010-02-19 11:53:46,673 DEBUG org.apache.hadoop.hbase.master.BaseScanner: test1,1204765,1266581233447/1373493090 no longer has references to test1,1204765,1266569946560
{code}

And the second daughter at about 14:01. It is only at this point we delete the offlined parent row:
{code}
2010-02-19 14:01:48,283 DEBUG org.apache.hadoop.hbase.master.BaseScanner: test1,1290703,1266581233447/580635726 no longer has references to test1,1204765,1266569946560
2010-02-19 14:01:48,299 INFO org.apache.hadoop.hbase.master.BaseScanner: Deleting region test1,1204765,1266569946560 (encoded=1819368969) because daughter splits no longer hold references
{code}

Naturally, given this wide window it is not uncommon to see rows corresponding to nested splits in .META. In most of these cases, eventually the .META. seems to fix itself. But it still seems odd to me that it takes so much time. 

During one of these situations, I saw the client get errors of the form:

10/02/19 09:09:37 INFO tests.MultiThreadedWriter: [22] Users = 1052116, mails = 1M, time = 10:10:53
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.129.68.212:60020 for region\
 test1,1204765,1266581233447, row '1232785', but failed after 10 attempts.

and assumed that this was related to the .META. being in a wierd state (i.e. offlined parent not being deleted). But looking at the logs, these client errors happened during a smaller period (8:49 to 9:09). And were likely due to other load issues on the particular region server. I will post any findings from that RS'es logs shortly.






> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837674#action_12837674 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

@Todd/@Stack: So my previous run didn't have HDFS-927. But we have now applied the fix for that. We are also in the process of applying/testing a fix for HBASE-2234 (which in turn depends on HDFS-836). Will look into your other recommened HDFS patches as well.

Stack wrote: <<< Do you think hdfs-826 would have helped wih "Error Recovery for block blk_9144926768183088527_186431 bad datanode[0] nodes == null"? Looks like we ran out of replicas? Is that your take?>>>

Analyzed the various logs with Dhruba today.  Turns out this was happening because the region server started getting NotReplicatedYetException from HDFS for a particular block. When the client (region server) requests the HDFS namenode for a new block, the namenode first checks to see if datanodes for the penultimate block have sent in their "blocks received" confirmation. If not, the NameNode rejects the new block request with a NotReplicatedYetException.  The client retries a configurable number of times... (I think the default is 4), and in our case  eventually gave up after about 10 seconds.  The data nodes in question took about 30 seconds to send in their block received confirmation for the penultimate block. We don't yet have a good theory on why the data nodes were running slow. We have put in some more instrumentation on the HDFS side that might give us some clues if this happens again.




> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836906#action_12836906 ] 

Todd Lipcon commented on HBASE-2244:
------------------------------------

Kannan: have you applied HDFS-630 on your hadoop test cluster? There's a patch on that JIRA that's compatible with 0.20.
I also recommend HDFS-793, HDFS-101, and HDFS-927 which are in 0.20.2's release candidate but perhaps not in your version.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META get inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836100#action_12836100 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Stack wrote <<<The report of split is not atomic; we are sending 3 separate puts. We don't have means of making a cross-row transaction out of this operation>>>.

Would keeping all region info for a table as columns under a single row key in .META. be such a crazy idea? With that you can do atomic mutations to a given table's region info. The cons would be that entire "rowkey" would be hosted on a single server... but for most tables that'd probably already be the case unless it has an exorbitant number of regions.

To which Stack responded with: <<<< For sure, lets do something in 0.20. I think metascanner when it runs can check an offlined region. If its not followed by two daughters - it has the necessary info as columns splitA and splitB - it can add them. Let me take a closer look. Will report back.

Having all in the one row is a bit radical. Gets and puts would each take out a row lock. I think this might slow down all .META. lookups. We also have a mechanism for getting the row closest to the asked for one. Its used for figuring out which region a row sits in. This would have to be recast if all was in the one row.>>>>



> META get inconsistent in a number of crash scenarios
> ----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840458#action_12840458 ] 

stack commented on HBASE-2244:
------------------------------

@Kannan I committed below to address your last comment.  Let me know if insufficient.

{code}
Index: core/src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
===================================================================
--- core/src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java  (revision 918324)
+++ core/src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java  (working copy)
@@ -342,9 +342,8 @@
     if (!references) return references;
     if (!verifyDaughterRowPresent(rowContent, qualifier, srvr, metaRegionName,
         hri, parent)) {
-      // If we got here, we added a daughter region to metatable. Update
-      // parent row that daughter has been verified present so we don't check
-      // for it by doing a get each time through here.
+      // If we got here, then the parent row does not yet have the
+      // "daughter row verified present" marker present. Add it.
       addDaughterRowChecked(metaRegionName, srvr, parent.getRegionName(), hri,
         qualifier);
     }
@@ -360,8 +359,8 @@
    * @param metaRegionName
    * @param daughterHRI
    * @throws IOException
-   * @return True, if the daughter row is present in meta.  If false, this
-   * method just added it to meta.
+   * @return True, if parent row has marker for "daughter row verified present"
+   * else, false (and will do fixup adding daughter if daughter not present).
    */
   private boolean verifyDaughterRowPresent(final Result rowContent,
       final byte [] daughter, final HRegionInterface srvr,
{code}

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836280#action_12836280 ] 

stack commented on HBASE-2244:
------------------------------

I'm on it.  Patch soon.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838326#action_12838326 ] 

stack commented on HBASE-2244:
------------------------------

@Kannan I've seen NotReplicatedYetException.  Adding instrumentation to figure why datanode is slow reporting sounds excellent.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Applied branch and trunk.  Resolving.  Please reopen if you think this issue not done (Kannan suggested it a non-issue a good while back)

That said, there is still good stuff/discussion up in this issue.  Lets open new issues for the still unresolve for stuff like figuring why notreplicatedyet is happening, etc.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836390#action_12836390 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Stack: The splitB from test1,1204765,1266581233447, namely  "test1,1226169,1266609171581" was probably there. Not entirely sure. I should have cut-pasted more from the scan of .META. 

I just wanted to add that during this state, the client was receiving errors of the form:

{code}
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.129.68.212:60020 for region\
 test1,1204765,1266581233447, row '1232762', but failed after 10 attempts. 
{code}

Note: key "1232762" would fall in the "splitB" range for the original test1,1204765,1266581233447.

(BTW, are the JIRA time zones always in GMT? Is there a way to configure it to local timezones?)

When I look at the same cluster's .META. now, the state for the related META regions is as follows:

{code}

 test1,1204765,1266616432091 column=info:regioninfo, timestamp=1266616432230, value=REGION => {NAME => 'test1,
                             1204765,1266616432091', STARTKEY => '1204765', ENDKEY => '1215466', ENCODED => 41
                             6976676, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =>
                              '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY
                              => 'false', BLOCKCACHE => 'true'}]}}
 test1,1204765,1266616432091 column=info:server, timestamp=1266616433943, value=10.129.68.213:60020
 test1,1204765,1266616432091 column=info:serverstartcode, timestamp=1266616433943, value=1266562597511
 test1,1215466,1266616432091 column=info:regioninfo, timestamp=1266616432232, value=REGION => {NAME => 'test1,
                             1215466,1266616432091', STARTKEY => '1215466', ENDKEY => '1226169', ENCODED => 40
                             3995950, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =>
                              '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY
                              => 'false', BLOCKCACHE => 'true'}]}}
 test1,1215466,1266616432091 column=info:server, timestamp=1266616434963, value=10.129.68.213:60020
 test1,1215466,1266616432091 column=info:serverstartcode, timestamp=1266616434963, value=1266562597511
 test1,1226169,1266609171581 column=info:regioninfo, timestamp=1266621116341, value=REGION => {NAME => 'test1,
                             1226169,1266609171581', STARTKEY => '1226169', ENDKEY => '1290703', ENCODED => 45
                             9318323, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
                             [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647',
                              BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 test1,1226169,1266609171581 column=info:server, timestamp=1266613546335, value=10.129.68.214:60020
 test1,1226169,1266609171581 column=info:serverstartcode, timestamp=1266613546335, value=1266562596451
 test1,1226169,1266609171581 column=info:splitA, timestamp=1266621116341, value=\x00\x0512790\x00\x00\x00\x01\
                             x26\xE8\x80l\xCD\x1Btest1,1226169,1266621115597\x00\x071226169\x00\x00\x00\x05\x0
                             5test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\x0
                             0\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x00\
                             x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\x0
                             0\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x00
                             \x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x00\
                             x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04trueb\
                             xD0\x21\xBC
 test1,1226169,1266609171581 column=info:splitB, timestamp=1266621116341, value=\x00\x071290703\x00\x00\x00\x0
                             1\x26\xE8\x80l\xCD\x19test1,12790,1266621115597\x00\x0512790\x00\x00\x00\x05\x05t
                             est1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\x00\
                             x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x00\x0
                             7\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\x00\
                             x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x00\x
                             00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x00\x0
                             9IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true\x5B
                             \x7D\x1A1
 test1,1226169,1266621115597 column=info:regioninfo, timestamp=1266621116358, value=REGION => {NAME => 'test1,
                             1226169,1266621115597', STARTKEY => '1226169', ENDKEY => '12790', ENCODED => 1988
                             459280, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =>
                             '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY
                             => 'false', BLOCKCACHE => 'true'}]}}
 test1,1226169,1266621115597 column=info:server, timestamp=1266768596714, value=10.129.68.212:60020
 test1,1226169,1266621115597 column=info:serverstartcode, timestamp=1266768596714, value=1266768408023
 test1,12790,1266621115597   column=info:regioninfo, timestamp=1266621116361, value=REGION => {NAME => 'test1,
                             12790,1266621115597', STARTKEY => '12790', ENDKEY => '1290703', ENCODED => 179081
                             1553, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS => '3
                             ', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
                              'false', BLOCKCACHE => 'true'}]}}
 test1,12790,1266621115597   column=info:server, timestamp=1266768555610, value=10.129.68.214:60020
 test1,12790,1266621115597   column=info:serverstartcode, timestamp=1266768555610, value=1266768408015
{code}


Note: that both the daughters of  the parent region test1,1226169,1266609171581 have been installed. But the offlined parent row itself (for test1,1226169,1266609171581) is still present. Not sure if it is in these situations that the client starts seeing errors... but curious why the offline parent row hasn't been reaped yet from .META.

I repro'ed a similar problem yesterday. Might have more detailed logs and errors. Will share them shortly.




> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

    Attachment: 2244-v3.patch

This version adds to the above fixup in BaseScanner.  Currently BaseScanner, checks offlined parents.  If no references by daughters, then we let the parent go.   This patch adds a bit of code in here to check -- if there are references stil -- that the daughters of the split are in the metatable.  If not, it adds them and marks the parent row so we don't do expensive check each time through metascan.

Need to add a test and run on cluster yet.  Just posting what I have so far.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836907#action_12836907 ] 

stack commented on HBASE-2244:
------------------------------

I took a look at our split code.  Its not as bad as I thought in that the fail I mention above should never happen (on open of a region, we clean up any old split working dirs so there should never be a clash creating daughter dirs -- if there is then something very seriously wrong is going on and we should fail).

@Kannan Do you think hdfs-826 would have helped wih "Error Recovery for block blk_9144926768183088527_186431 bad datanode[0] nodes == null"?  Looks like we ran out of replicas?  Is that your take?

I'd like to keep this issue open for the fixup I'd like to add to .META. that will defend against case where we get the parent offlining edit but somehow the daughter additions don't go in.  Let me post a fuller patch soon.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-2244:
----------------------------

    Assignee: stack

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836478#action_12836478 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

@stack, jdcryans:  The exact implication of the "references" -- i.e. the 
daughter regions still sharing HFile with the parent for a period of 
time -- hadn't sunk in.  But it makes sense now. I think we can close 
this issue as not a bug.

@stack: If you found other opportunities (as part of your recent code walk 
through) for making the general handling of splits more bullet proof,
perhaps we should file that as a separate issue so that it doesn't get
muddled with this non-issue.

@ What was happening during the time the client failed? 

In this particular case the RS was shutting down. The DNs were getting overwhelmed in my small cluster test case, and not able to keep up. It started with a lot of:

{code}
2010-02-19 08:48:57,462 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /hbase-kannan1/test1/580635726/actions/133921297096924993\
7 retries left 1
{code}

Followed by:
{code}
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9144926768183088527_186431 bad datanode[0] nodes == null
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase-kannan1/test1/580635726/actions/133921297\
0969249937" - Aborting...
2010-02-19 08:49:07,117 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown
{code}





> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838536#action_12838536 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

@Stack, JD: I haven't had a chance to review the patch. I'll do that as well as try it out on my test setup. So perhaps just hold off on committing it for a few days...

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840428#action_12840428 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Also,

{code}
    if (!verifyDaughterRowPresent(rowContent, qualifier, srvr, metaRegionName,
        hri, parent)) {
      // If we got here, we added a daughter region to metatable. Update
      // parent row that daughter has been verified present so we don't check
      // for it by doing a get each time through here.
      addDaughterRowChecked(metaRegionName, srvr, parent.getRegionName(), hri,
        qualifier);
    }
{code}

the comment above isn't strictly correct. We'll do a addDaughterRowChecked() even if the daughter region was already present in the meta table, correct?

Similarly, the description for the "return" param of verifyDaughterRowPresent() needs to be tweaked accordingly. The function returns TRUE if we have already checked this daughter and FALSE otherwise.




> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840427#action_12840427 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

There is one minor issue still with the patch that:

in hasReferences(), this condition:

      if (ps != null && ps.length > 0) {
        result = true;
        break;
      }

should be changed to:

      if (ps != null && ps.length > 0) {
        return true;
      }

We don't want to fall into removeDaughterFromParent() under the above. The older version of the code had a:

   if (result) {
      return result;
    }
    
to handle the same.



> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

    Attachment: 2244-v5.patch

Can't add unit test for this explicit case.  The actors are the BaseScanner up in Master and a failing split over in a RegionServer.  The split region needs to be completely coherent; that is, after split it has to have references in place else fixup won't cut in (that there are references and there is no entry for a daughter is how we identify this pathology).  Then we need to have regionserver fail at an explicit location in a cluster test, after offlining parent but but somewhere in the
middle of adding the daughters to meta.  Our current testing vocabulary doesn't allow us do this kind of detailed fail (Needs to be fixed but not as part of this patch).

So, I can see the repair cutting in if I doctor the RegionServer with the following patch to CompactSplitThread:

{code}
Index: src/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java       (revision 915869)
+++ src/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java       (working copy)
@@ -199,6 +199,13 @@
         Writables.getBytes(newRegions[1].getRegionInfo()));
     t.put(put);
     
+    // If we crash here, then the daughters will not be added and we'll have
+    // and offlined parent but no daughters to take up the slack.  hbase-2244
+    // adds fixup to the metascanners.
+
+    // REMOVE
+    if (true) throw new IOException("Fail to add other regions and to send split to master");
+    
     // Add new regions to META
     for (int i = 0; i < newRegions.length; i++) {
       put = new Put(newRegions[i].getRegionName());

{code}

Notice the throw IOE after parent has been offlined.  This makes it so meta does not get daughter entries and the master does not get split message.

In TestForceSplit, you'll see the following:

{code}
2010-02-24 19:45:16,685 ERROR [RegionServer:0.compactor] regionserver.CompactSplitThread(104): Compaction/Split failed for region test,,1267069500020
java.io.IOException: Fail to add other regions and to send split to master
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:207)
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:95)
2010-02-24 19:45:17,101 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:18,105 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:19,111 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:20,117 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:21,122 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:22,127 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:23,133 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:24,140 DEBUG [WaitOnSplit] client.TestForceSplit$WaitOnSplit(72): Cycle waiting on split
2010-02-24 19:45:24,655 INFO  [Thread-103] master.ServerManager$ServerMonitor(129): 1 region servers, 0 dead, average load 2.0
2010-02-24 19:45:24,920 INFO  [RegionManager.metaScanner] master.BaseScanner(163): RegionManager.metaScanner scanning meta region {server: 127.0.0.1:59090, regionname: .META.,,1, startKey: <>}
2010-02-24 19:45:24,921 INFO  [RegionManager.rootScanner] master.BaseScanner(163): RegionManager.rootScanner scanning meta region {server: 127.0.0.1:59090, regionname: -ROOT-,,0, startKey: <>}
2010-02-24 19:45:24,941 INFO  [RegionManager.rootScanner] master.BaseScanner(242): RegionManager.rootScanner scan of 1 row(s) of meta region {server: 127.0.0.1:59090, regionname: -ROOT-,,0, startKey: <>} complete
2010-02-24 19:45:24,942 WARN  [RegionManager.metaScanner] master.BaseScanner(379): Fixup broke split: Add missing split daughter to meta, daughter=REGION => {NAME => 'test,,1267069516539', STARTKEY => '', ENDKEY => 'lqg', ENCODED => 904299994, TABLE => {{NAME => 'test', FAMILIES => [{NAME => 'a', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, parent=REGION => {NAME => 'test,,1267069500020', STARTKEY => '', ENDKEY => '', ENCODED => 970247455, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test', FAMILIES => [{NAME => 'a', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
2010-02-24 19:45:24,947 WARN  [RegionManager.metaScanner] master.BaseScanner(379): Fixup broke split: Add missing split daughter to meta, daughter=REGION => {NAME => 'test,lqg,1267069516539', STARTKEY => 'lqg', ENDKEY => '', ENCODED => 1813617937, TABLE => {{NAME => 'test', FAMILIES => [{NAME => 'a', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, parent=REGION => {NAME => 'test,,1267069500020', STARTKEY => '', ENDKEY => '', ENCODED => 970247455, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test', FAMILIES => [{NAME => 'a', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
2010-02-24 19:45:24,948 INFO  [RegionManager.metaScanner] master.BaseScanner(242): RegionManager.metaScanner scan of 1 row(s) of meta region {server: 127.0.0.1:59090, regionname: .META.,,1, startKey: <>} complete
2010-02-24 19:45:24,949 INFO  [RegionManager.metaScanner] master.MetaScanner(132): All 1 .META. region(s) scanned
2010-02-24 19:45:25,145 DEBUG [WaitOnSplit] client.HConnectionManager$TableServers(780): Cache hit for row <> in tableName .META.: location server 127.0.0.1:59090, location region name .META.,,1
2010-02-24 19:45:25,149 DEBUG [WaitOnSplit] client.HTable$ClientScanner(1832): Creating scanner over .META. starting at key ''
2010-02-24 19:45:25,151 DEBUG [WaitOnSplit] client.HTable$ClientScanner(1928): Advancing internal scanner to startKey at ''
2010-02-24 19:45:25,151 DEBUG [WaitOnSplit] client.HConnectionManager$TableServers(780): Cache hit for row <> in tableName .META.: location server 127.0.0.1:59090, location region name .META.,,1
2010-02-24 19:45:25,157 INFO  [WaitOnSplit] client.TestForceSplit$WaitOnSplit(79): keyvalues={test,,1267069500020/info:regioninfo/1267069516682/Put/vlen=254, test,,1267069500020/info:server/1267069501165/Put/vlen=15, test,,1267069500020/info:serverstartcode/1267069501165/Put/vlen=8, test,,1267069500020/info:splitA/1267069516682/Put/vlen=257, test,,1267069500020/info:splitA_checked/1267069524944/Put/vlen=1, test,,1267069500020/info:splitB/1267069516682/Put/vlen=260, test,,1267069500020/info:splitB_checked/1267069524948/Put/vlen=1}
2010-02-24 19:45:25,158 INFO  [WaitOnSplit] client.TestForceSplit$WaitOnSplit(79): keyvalues={test,,1267069516539/info:regioninfo/1267069524943/Put/vlen=257}
2010-02-24 19:45:25,158 INFO  [WaitOnSplit] client.TestForceSplit$WaitOnSplit(79): keyvalues={test,lqg,1267069516539/info:regioninfo/1267069524948/Put/vlen=260}
2010-02-24 19:45:25,159 DEBUG [WaitOnSplit] client.HTable$ClientScanner(1915): Finished with scanning at REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2010-02-24 19:45:25,160 ERROR [IPC Server handler 4 on 59090] regionserver.HRegionServer(862): 
org.apache.hadoop.hbase.NotServingRegionException: test,,1267069500020 is closing=true or closed=true
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1787)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1907)
	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-02-24 19:45:25,175 DEBUG [main] client.HTable$ClientScanner(1928): Advancing internal scanner to startKey at 'aaa'
2010-02-24 19:45:25,177 ERROR [IPC Server handler 2 on 59090] regionserver.HRegionServer(864): Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: test,,1267069500020
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2279)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1858)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-02-24 19:45:30,186 DEBUG [main] client.HConnectionManager$TableServers(857): Removed test,,1267069500020 for tableName=test from cache because of aaa
2010-02-24 19:45:30,188 DEBUG [main] client.HConnectionManager$TableServers(734): locateRegionInMeta attempt 0 of 3 failed; retrying after sleep of 5000 because: No server address listed in .META. for region test,,1267069516539
2010-02-24 19:45:34,656 INFO  [Thread-103] master.ServerManager$ServerMonitor(129): 1 region servers, 0 dead, average load 2.0
2010-02-24 19:45:34,920 INFO  [RegionManager.rootScanner] master.BaseScanner(163): RegionManager.rootScanner scanning meta region {server: 127.0.0.1:59090, regionname: -ROOT-,,0, startKey: <>}
2010-02-24 19:45:34,920 INFO  [RegionManager.metaScanner] master.BaseScanner(163): RegionManager.metaScanner scanning meta region {server: 127.0.0.1:59090, regionname: .META.,,1, startKey: <>}
2010-02-24 19:45:34,932 DEBUG [RegionManager.metaScanner] master.BaseScanner(572): Current assignment of test,,1267069516539 is not valid;  serverAddress=, startCode=0 unknown.
2010-02-24 19:45:34,934 DEBUG [RegionManager.metaScanner] master.BaseScanner(572): Current assignment of test,lqg,1267069516539 is not valid;  serverAddress=, startCode=0 unknown.
2010-02-24 19:45:34,940 INFO  [RegionManager.rootScanner] master.BaseScanner(242): RegionManager.rootScanner scan of 1 row(s) of meta region {server: 127.0.0.1:59090, regionname: -ROOT-,,0, startKey: <>} complete
2010-02-24 19:45:34,941 INFO  [RegionManager.metaScanner] master.BaseScanner(242): RegionManager.metaScanner scan of 3 row(s) of meta region {server: 127.0.0.1:59090, regionname: .META.,,1, startKey: <>} complete
2010-02-24 19:45:34,941 INFO  [RegionManager.metaScanner] master.MetaScanner(132): All 1 .META. region(s) scanned
2010-02-24 19:45:35,189 DEBUG [main] client.HConnectionManager$TableServers(734): locateRegionInMeta attempt 1 of 3 failed; retrying after sleep of 5000 because: No server address listed in .META. for region test,,1267069516539
2010-02-24 19:45:35,315 INFO  [IPC Server handler 4 on 60000] master.RegionManager(337): Assigning region test,,1267069516539 to localhost,59090,1267069494948
2010-02-24 19:45:35,315 INFO  [IPC Server handler 4 on 60000] master.RegionManager(337): Assigning region test,lqg,1267069516539 to localhost,59090,1267069494948
2010-02-24 19:45:35,316 INFO  [RegionServer:0] regionserver.HRegionServer(499): MSG_REGION_OPEN: test,,1267069516539
2010-02-24 19:45:35,317 INFO  [RegionServer:0] regionserver.HRegionServer(499): MSG_REGION_OPEN: test,lqg,1267069516539
{code}

2010-02-24 19:26:50,203 and 2010-02-24 19:26:50,209 are the new stuff.  See how subsequently the daughter rows added by the fixup are assigned and the verification scan -- the test is to prove scans can ride across splits -- completes and the unit tests fails.

I'm going to commit the patch unless objection.

This patch adds some load on .META.: two new Gets after every split.  Shouldn't be too bad.  Splits are rare enough.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840488#action_12840488 ] 

Andrew Purtell commented on HBASE-2244:
---------------------------------------

This has been committed, and I updated CHANGES.txt to reflect that.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838472#action_12838472 ] 

Jean-Daniel Cryans commented on HBASE-2244:
-------------------------------------------

Patch itself looks good. Could Kannan try to destroy it before you commit?

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

    Attachment: 2244-v6.patch

@Kannan Sweet! Thanks for great review.  Attached patch has all your changes.  Your first observation is particularly egregious.  Thanks for pointing it out.  I'm running tests to make sure it still works as its supposed.... will commit if all pass.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840302#action_12840302 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

Stack: The changes look very good. Like all the helpful comments as well.

Some small comments on the patch:

Previously, the first time hasReferences() finds that there are no references
from a daughter region, it would delete the corresponding "info:splitA or B" row
from the parent. On subsequent calls, hasReferences would bail early because "split"
would be null.

Now, the deletion of the daughter column (info:splitA or B) in the parent row
happens in checkDaughter() (when it calls removeDaughterFromParent()). hasReferences()
will keep returning false for a daughter that no longer has references, and it seems
we'll unnecessarily be calling removeDaughterFromParent() on more than one
occasion. Delete of a non-existent column should be a no-op-- so this is probably
not a major issue, but it would be good to avoid introducing these wasteful deletes.

Minor/Cosmetic:
~~~~~~~~~~~~~~~

* HRegion.java:createDaughterDirInSplitDir()

Doesn't actually create the daughter dir, but only constructs a path
to the daughter dir, correct? Perhaps the function could be name
getSplitDirForDaughter() or something.

* typo: BaseScanner.java
 * the filesystem, then a daughters was not added __o__ .META. -- must have been

* typo: HRegion.java
      // __Crate__ a region instance and then move the splits into place under

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836426#action_12836426 ] 

stack commented on HBASE-2244:
------------------------------

The offlined parent is deleted only after daughters have let go of all references to the parent.  If you want a quick lesson on how references work just say.   So yes the offlined parent can stick around for a while.  It's the metascanner that does the parent removal. On the retried issue grep the region in the master logs to an idea of it's history.  What was happening during the time the client failed?

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836425#action_12836425 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

(continued)

Looking at the logs for the concerned RS, yes, at 8:49 due to some DFS errors the region server shut itself down. Basically this confirms that META's state wasn't the cause for the client errors. It would still be good to understand why parent row deletion takes time and fix any known inconsistencies in this area. But the original snippet I posted of .META. in and of itself doesn't seem to indicate a problem.

But the comment Stack made about: <<<Also, if already a daughter region of same name in the FS, we'll fail the split rather than overwrite as it seems we should do (only reason for a pre-existing daughter is a split failed mid-way). I can add a check of .META. If daughter not there, its for sure a failed split. Let me see if I can improve stuff in here in general for 0.20 as part of this patch.>>>> seems would be worth addressing.


> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2244:
-------------------------

    Status: Patch Available  (was: Open)

Marking patch available; review would be ok.  It ain't too big.  Mostly its comments.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2244:
-----------------------------------------

    Summary: META gets inconsistent in a number of crash scenarios  (was: META get inconsistent in a number of crash scenarios)

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2244) META gets inconsistent in a number of crash scenarios

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840443#action_12840443 ] 

stack commented on HBASE-2244:
------------------------------

@Kannan Really appreciate the review.  What I committed actually does as you suggest up in "03/Mar/10 02:23 AM" (My extra testing unveiled the condition you found by reading code).  I did what old code did to avoid return in middle of a method.  On the incorrect comment.  Fixing now.  Thanks.

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>         Attachments: 2244-v3.patch, 2244-v5.patch, 2244-v6.patch, 2244.patch
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region server deaths) etc., we notice that META can get inconsistent. This primarily happens for regions which are in the process of being split. Manually running add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
>                              1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
>                              1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.