Posted to dev@bookkeeper.apache.org by "Ivan Kelly (JIRA)" <ji...@apache.org> on 2012/05/09 17:13:49 UTC

[jira] [Created] (BOOKKEEPER-247) Recording of under replication

Ivan Kelly created BOOKKEEPER-247:
-------------------------------------

             Summary: Recording of under replication
                 Key: BOOKKEEPER-247
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
             Project: Bookkeeper
          Issue Type: Sub-task
            Reporter: Ivan Kelly




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406641#comment-13406641 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Hi Ivan,

Here is one boundary case I came across. When a client has written a single entry and is waiting, and one BK goes down at that point, the LedgerChecker is not able to report that fragment as underReplicated.

I think it should detect it as under-replicated; then I can wait in PendingReplicationWorker for the grace period and fence the ledger. If it is not detected as underReplicated, we cannot know whether there really are no underReplicated fragments or whether someone else has already replicated them.

Here is a test to reproduce:

{code}
    /**
     * Tests that LedgerChecker should report one fragment as underReplicated
     * if there is an open ledger with a single entry written.
     */
    @Test(timeout = 3000)
    public void testShouldGetOneFragmentWithSingleEntryOpenedLedger() throws Exception {
        LedgerHandle lh = bkc.createLedger(3, 3, BookKeeper.DigestType.CRC32,
                TEST_LEDGER_PASSWORD);
        lh.addEntry(TEST_LEDGER_ENTRY_DATA);
        ArrayList<InetSocketAddress> firstEnsemble = lh.getLedgerMetadata()
                .getEnsembles().get(0L);
        // Index 0 is the first bookie of the ensemble.
        InetSocketAddress firstBookieFromEnsemble = firstEnsemble.get(0);
        LOG.info("Killing " + firstBookieFromEnsemble + " from ensemble="
                + firstEnsemble);
        killBookie(firstBookieFromEnsemble);

        startNewBookie();
        
        // Open the ledger separately for the LedgerChecker.
        LedgerHandle lh1 = bkc.openLedgerNoRecovery(lh.getId(), BookKeeper.DigestType.CRC32,
                TEST_LEDGER_PASSWORD);
        
        Set<LedgerFragment> result = getUnderReplicatedFragments(lh1);
        assertNotNull("Result shouldn't be null", result);
        assertEquals("There should be 1 fragment. But returned fragments are "
                + result, 1, result.size());
    }

   private Set<LedgerFragment> getUnderReplicatedFragments(LedgerHandle lh)
            throws InterruptedException {
        LedgerChecker checker = new LedgerChecker(bkc);
        CheckerCallback cb = new CheckerCallback();
        checker.checkLedger(lh, cb);
        Set<LedgerFragment> result = cb.waitAndGetResult();
        return result;
    }
{code}

I think the problem is that when the ledger is not closed, getLastConfirmed may not give the real last entry; we get one less than the real last confirmed entry. Only if the ledger is closed can we get the real last entry. In this case only one entry has been written and the ledger is still open, so the last confirmed entry comes back as nothing. As a result, no fragments of the ledger are detected as underReplicated.

If I write one more entry, then it is detected as underReplicated.
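
The lag described above can be illustrated with a toy model. This is only a sketch, assuming that each stored entry carries the last-add-confirmed value the writer knew when it sent that entry, so that a reader opened without recovery sees a value one behind the writer. ToyLedger and its methods are hypothetical names, not the BookKeeper API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyLedger is a hypothetical class, not part of BookKeeper.
// Assumption: every entry carries the last-add-confirmed (LAC) value the
// writer knew when it sent that entry, i.e. the id of the previous entry.
class ToyLedger {
    private final List<Long> piggybackedLac = new ArrayList<>();
    private boolean closed = false;

    // Writer appends an entry; the entry records the LAC known before it.
    long addEntry() {
        long entryId = piggybackedLac.size();
        piggybackedLac.add(entryId - 1);
        return entryId;
    }

    void close() {
        closed = true;
    }

    // What a separate reader (opened without recovery) can learn.
    long readerLastAddConfirmed() {
        if (piggybackedLac.isEmpty()) {
            return -1L;
        }
        if (closed) {
            // A closed ledger records its real last entry id.
            return piggybackedLac.size() - 1L;
        }
        // Open ledger: only the LAC stored with the newest entry is visible.
        return piggybackedLac.get(piggybackedLac.size() - 1);
    }
}
```

In this model, a reader of an open ledger with one entry sees -1 and therefore has no entry range to check; once a second entry is written it sees 0, which matches the behaviour reported here.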

Thanks
Uma
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287552#comment-13287552 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

I've put a first pass at the detection algorithm up onto github[1][2].

It doesn't make any assumption about where it'll be run from, as it runs through the standard bookkeeper client code. The entry point is LedgerChecker#checkLedger, to which you pass a LedgerHandle and a callback. On completion, the callback is given a set of LedgerFragmentReplicas, which are the fragments that are underreplicated.
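
For readers unfamiliar with this callback style, here is a minimal sketch of how a caller might block until such an asynchronous check completes. The interface and class names (FragmentCallback, WaitingCallback) and the method signature are assumptions for illustration; only the method name waitAndGetResult is borrowed from the test posted elsewhere in this thread. The real callback type in the patch may differ.

```java
import java.util.Set;
import java.util.concurrent.CountDownLatch;

// Hypothetical callback shape: the thread only states that the callback is
// given a set of under-replicated fragments when the check completes.
interface FragmentCallback<T> {
    void operationComplete(Set<T> underReplicated);
}

// Blocks the calling thread until the asynchronous check reports its result.
class WaitingCallback<T> implements FragmentCallback<T> {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Set<T> result;

    @Override
    public void operationComplete(Set<T> underReplicated) {
        this.result = underReplicated;
        done.countDown(); // wake up the waiting caller
    }

    Set<T> waitAndGetResult() throws InterruptedException {
        done.await();
        return result;
    }
}
```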

[1] https://github.com/ivankelly/bookkeeper/tree/BOOKKEEPER-247
[2] https://github.com/ivankelly/bookkeeper/commit/5b2d079b8792f7bdb63f4f9ae7d78cede85c58b7
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402960#comment-13402960 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------


I think we have to handle one special case in LedgerChecker.

Take the case of a ledger created with ensemble size 3 and quorum size 2.

Add a first entry:
 Now the ensemble should look like '0 A B C'
The entry should have been added to A and B. Now kill bookie C.

Add one more entry. The writer will now get an exception when writing to C, which leads to an ensemble change.
The new ensemble should look like '1 A B D'


The writer can continue with this ensemble until another failure occurs.

Now if you run the ledger checker on this ledger, it will consider '0 A B C' an underReplicated fragment. But here the first entry has already met the quorum, so we need not re-replicate any entries.

I think we should skip such cases here.
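
To make the scenario concrete, here is a small sketch of which ensemble slots receive a given entry. It assumes round-robin striping, where entry e is written to quorum-size consecutive slots starting at e modulo the ensemble size; WriteSetSketch and writeSet are hypothetical names, not the actual DistributionSchedule code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: assumes round-robin striping, where entry e is written to
// quorumSize consecutive ensemble slots starting at (e % ensembleSize).
class WriteSetSketch {
    static List<Integer> writeSet(long entryId, int ensembleSize, int quorumSize) {
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < quorumSize; i++) {
            slots.add((int) ((entryId + i) % ensembleSize));
        }
        return slots;
    }
}
```

Under that assumption, entry 0 lands on slots 0 and 1 (A and B), and entry 1 is the first entry that needs slot 2 (C). The fragment '0 A B C' holds only entry 0, so C stores nothing from it.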

Some grepped logs related to this issue:

{noformat}

First entry write:

2012-06-28 14:23:46,797 - INFO  - [main:BookKeeperClusterTestCase@336] - New bookie on port 5002 has been created.
2012-06-28 14:23:46,970 - INFO  - [New I/O client worker #1-1:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5000
2012-06-28 14:23:46,970 - INFO  - [New I/O client worker #1-2:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5001
2012-06-28 14:23:47,064 - INFO  - [main:TestLedgerChecker@137] - Killing /XX.XX.XX.127:5002 from ensemble=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002]
Ensembles after first entry : {0=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002]}
.......................
.......................


2012-06-28 14:23:47,549 - INFO  - [main:BookKeeperClusterTestCase@336] - New bookie on port 5003 has been created.


Second entry write:

2012-06-28 14:23:48,537 - ERROR - [New I/O client boss #1:PerChannelBookieClient$1@151] - Could not connect to bookie: /XX.XX.XX.127:5002
2012-06-28 14:23:48,537 - WARN  - [New I/O client boss #1:PendingAddOp@146] - Write did not succeed: 3, 1
2012-06-28 14:23:48,584 - INFO  - [New I/O client worker #1-4:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5003
Ensembles after second entry : {0=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002], 1=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5003]}
2012-06-28 14:23:48,631 - ERROR - [pool-4-thread-1:PerChannelBookieClient@618] - Unexpected read response received from bookie: /XX.XX.XX.127:5000 for ledger: 3, entry: 0 , ignoring
2012-06-28 14:23:49,633 - ERROR - [New I/O client boss #1:PerChannelBookieClient$1@151] - Could not connect to bookie: /XX.XX.XX.127:5002
2012-06-28 14:23:49,633 - INFO  - [main:TestLedgerChecker@160] - unreplicated fragment: Fragment(LedgerID: 3, FirstEntryID: 1[2], LastEntryID: 1[0], Host: /XX.XX.XX.127:5000)
2012-06-28 14:23:49,633 - INFO  - [main:TestLedgerChecker@160] - unreplicated fragment: Fragment(LedgerID: 3, FirstEntryID: 0[1], LastEntryID: 0[-1], Host: /XX.XX.XX.127:5002)
2012-06-28 14:23:49,633 - INFO  - [main:BookKeeperClusterTestCase@92] - TearDown{noformat}



                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-247:
----------------------------------

    Attachment: BOOKKEEPER-247.diff
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404894#comment-13404894 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Hi Ivan, is it possible for you to mark the last fragment of the ledger in LedgerChecker?
That way I can use this information in ReplicationWorker for fencing the current writer.

This is the point we discussed on the mailing list recently.
We have also seen this case, though rarely, 2 or 3 times in our testing so far.
So I will try to fence if the last fragment is an underReplicated fragment.

I am still thinking about introducing some delay (say 30 sec, configurable) for this kind of ledger; if the ledger is still in that situation afterwards, I will fence it.
The hope is that within that delay period the client may write more entries and re-form the ensemble with good bookies, so that the fragment would no longer be the last fragment in the ledger.
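
The delay-then-recheck idea could look roughly like the following. This is a sketch only: GracePeriodSketch and shouldFenceAfterGrace are hypothetical names, the check itself is passed in as a supplier, and a real worker would probably schedule the recheck rather than sleep.

```java
import java.util.function.BooleanSupplier;

// Sketch only: GracePeriodSketch is a hypothetical helper, not BookKeeper code.
class GracePeriodSketch {
    // Returns true if the ledger should be fenced: its last fragment was
    // under-replicated before the grace period and still is afterwards.
    static boolean shouldFenceAfterGrace(BooleanSupplier lastFragmentUnderReplicated,
                                         long graceMillis) throws InterruptedException {
        if (!lastFragmentUnderReplicated.getAsBoolean()) {
            return false; // nothing to wait for
        }
        // Give the writer time to add entries and change the ensemble.
        Thread.sleep(graceMillis);
        return lastFragmentUnderReplicated.getAsBoolean();
    }
}
```

The grace period is a parameter here so that it can be made configurable, as suggested above.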

                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated BOOKKEEPER-247:
-------------------------------------------

    Attachment: BOOKKEEPER-247.patch

Ivan, I have generated a patch from your GitHub code and attached it together with a test that fails in the scenario explained above.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated BOOKKEEPER-247:
-------------------------------------------

    Attachment: BOOKKEEPER-247.patch
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-247:
----------------------------------

    Attachment: BOOKKEEPER-247.diff
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286451#comment-13286451 ] 

Rakesh R commented on BOOKKEEPER-247:
-------------------------------------

@Ivan, presently I'm working on BOOKKEEPER-272, as it is the starting point. I have done an initial draft version and will likely upload the patch to BOOKKEEPER-272. I think we would be able to do the regression testing on this.

Please take a look at the comments and attached docs on the umbrella JIRA; I would like to know your suggestions.

Thanks,
Rakesh
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396613#comment-13396613 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Hi Ivan, are you planning to post LedgerChecker as a patch?
BOOKKEEPER-299 uses LedgerChecker to find the missing fragments.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407287#comment-13407287 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

Fixed the empty ledger case.

https://github.com/ivankelly/bookkeeper/commit/b85c94930a325b1cc275fee26d7d62e9e0cdc778

Regarding the namenode, even without this fix, you don't need to worry about that, as the namenode always writes a START_SEGMENT entry after starting a new log segment.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Assigned] (BOOKKEEPER-247) Detection of under replication

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reassigned BOOKKEEPER-247:
-----------------------------------

    Assignee: Rakesh R
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406918#comment-13406918 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Only the current writer can get the lastConfirmed entry correctly.
The LedgerChecker will never be the writer, so I feel that counting one extra entry is the way to go.

{code}
long lastAddConfirmed = lh.getLastAddConfirmed();
// A reader's view of an open ledger lags the writer by one entry,
// so include one extra entry in the last fragment.
if (!lh.metadata.isClosed()) {
    lastAddConfirmed++;
}
fragments.add(new LedgerFragment(lh.getId(), curEntryId, lastAddConfirmed, i,
        curEnsemble, lh.getDistributionSchedule()));
{code}

With this, the test given above passes. In fact, all the other tests should also be modified to use a separate LedgerHandle; otherwise this change will cause test failures, because the writer's own handle already gets lastConfirmed correctly.


                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436658#comment-13436658 ] 

Hudson commented on BOOKKEEPER-247:
-----------------------------------

Integrated in bookkeeper-trunk #651 (See [https://builds.apache.org/job/bookkeeper-trunk/651/])
    BOOKKEEPER-247: Detection of under replication (ivank) (Revision 1374195)

     Result = ABORTED
ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerFragment.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestLedgerChecker.java

                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-247:
----------------------------------

    Attachment: BOOKKEEPER-247.diff

New patch is simply the old patch, rebased onto trunk, with one FindBugs error fixed.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406472#comment-13406472 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Thanks Ivan, makes sense to me. I have filed a JIRA, BK-325, for a grace-period delay in replication for open ledgers whose last fragment is underReplicated. I will handle this along with that JIRA.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Assigned] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly reassigned BOOKKEEPER-247:
-------------------------------------

    Assignee: Ivan Kelly
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403152#comment-13403152 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

How about skipping this kind of fragment, as below?
 While searching for the failed bookie's index starting from the ensemble's start entry ID, there should also be a check that the search does not go past the end entry ID. If the failed BK index cannot be found within this range, then we can skip this fragment, right?
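
The bounded search described above could be sketched roughly as follows. This is only an illustration, not the code from the patch: the class and method names are hypothetical, and it assumes round-robin striping in which entry e is written to the bookies at indices (e + i) % ensembleSize for i in 0..quorumSize-1.

```java
// Hypothetical helper, not taken from the BOOKKEEPER-247 patch.
final class FragmentBounds {
    static final long INVALID_ENTRY_ID = -1L;

    /** True if the bookie at bookieIndex is in the write set of entryId (assumed round-robin striping). */
    static boolean hasEntry(long entryId, int bookieIndex, int ensembleSize, int quorumSize) {
        for (int i = 0; i < quorumSize; i++) {
            if ((entryId + i) % ensembleSize == bookieIndex) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the first entry in [firstEntryId, lastEntryId] stored on the given bookie,
     * or INVALID_ENTRY_ID if the bookie holds no entry of this fragment.
     */
    static long firstEntryOnBookie(long firstEntryId, long lastEntryId,
                                   int bookieIndex, int ensembleSize, int quorumSize) {
        // The striping pattern repeats every ensembleSize entries, and the
        // search must never go past the last entry of the fragment.
        long limit = Math.min(lastEntryId, firstEntryId + ensembleSize - 1);
        for (long e = firstEntryId; e <= limit; e++) {
            if (hasEntry(e, bookieIndex, ensembleSize, quorumSize)) {
                return e;
            }
        }
        return INVALID_ENTRY_ID;
    }
}
```

When the result is INVALID_ENTRY_ID, the failed bookie holds nothing from this fragment, so the fragment can be skipped for that bookie.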
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated BOOKKEEPER-247:
-------------------------------------------

    Attachment: BOOKKEEPER-247-1.patch

Attached the patch with the above change and test corrections. Added one more test for this special case.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated BOOKKEEPER-247:
-------------------------------------------

    Component/s:     (was: bookkeeper-server)
                     (was: bookkeeper-client)
                 bookkeeper-auto-recovery
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-auto-recovery
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407395#comment-13407395 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Yep, you are right. It should not be a problem. openEditLogForWrite() should do that on startingActiveServices.

On a quick look, the change looks great. I like the idea to confirm the empty ledger.  :-)

Thanks a lot for the update.

small nit: testShouldGet3FragmentWithEmptyLedgerButBookiesDead  --> testShouldGet2FragmentsWithEmptyLedgerButBookiesDead ?
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406940#comment-13406940 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

One more case here: the above proposal may break it. For example, if the ledger is really empty (it has just been created), there will still be one ensemble created, I guess. The above incrementing will then give the unnecessary assumption that the ledger fragment is underReplicated.
Any alternatives?
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404723#comment-13404723 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Great :-)

Thanks a lot for addressing the issue.

I have some tests that verify a few scenarios. I should have included them in my previous patch, but I didn't want to mess up the patch that explains the problem.
Now I have updated the patch with the minor modifications below and the newly added tests.

- added javadoc for hasEntry methods.
- updated correct header comments in DistributedSchedule, RRDSchedule algorithm.
- Included some more tests which I have added recently.
- Made the inner classes of LedgerChecker as private static. 
- also updated the class level javadocs for LedgerChecker and LedgerFragment.

If you like the changes, you can push this to your git and make it ready for review with your added tests. Many of the scenarios work well for me.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293352#comment-13293352 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Thanks a lot Ivan,
 Currently we are planning to detect under-replication based on bookie failures. If one bookie shuts down or fails, we will be able to detect it with the help of the auditor. 
This code checks each ledger for under-replication. If one bookie fails, then all ledgers on that bookie should be re-replicated.

This ledger checking would be useful only when some disks have failed in a bookie and the bookie continues running. In that case, can the same bookie (the one with partial disk failures) use this class to find the underreplicated ledgers?

If that is the case, is it worth editing the JIRA title to "Detect under replication of ledgers on bookie disk failures"?
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396704#comment-13396704 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Yep, I missed BK-2. It was already mentioned in BK-2's description.
BTW, I have just taken a look at BK-292. It seems Sijie has +1'd it now, and the tests look great.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405001#comment-13405001 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

This may not be the clearest place to check this though. I think it would be clearer to check this at the point at which the replication worker is selecting the ledger to rereplicate. If it sees that the ledger is still open, it can wait for a grace period before running the checker and rereplicating any missing fragments.
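
As a rough illustration of that idea, the decision the replication worker would make might look like the sketch below. The class, enum, and parameter names are hypothetical and do not come from the patch; the only assumption is that the worker knows whether the ledger is open and when it was first marked under-replicated.

```java
// Hypothetical sketch of the grace-period decision; not the actual worker code.
final class OpenLedgerGracePeriod {
    enum Action { CHECK_AND_REPLICATE, DEFER }

    /**
     * Closed ledgers are checked immediately. Open ledgers are deferred until
     * gracePeriodMillis has elapsed since they were marked under-replicated.
     */
    static Action decide(boolean ledgerOpen, long markedAtMillis,
                         long nowMillis, long gracePeriodMillis) {
        if (!ledgerOpen) {
            return Action.CHECK_AND_REPLICATE;
        }
        long waited = nowMillis - markedAtMillis;
        return waited >= gracePeriodMillis ? Action.CHECK_AND_REPLICATE : Action.DEFER;
    }
}
```

Keeping the decision in one pure function makes the grace period easy to test without starting bookies or a real worker.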
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396650#comment-13396650 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

@Uma, yes, but there's a chain of JIRAs which need to be submitted first though. BOOKKEEPER-292 needs to go in. Once that is in, I can start working on tests for BOOKKEEPER-2 (I already have the changes to the code itself). Once BOOKKEEPER-2 is in, this can go in, and then BOOKKEEPER-299. 

Perhaps you could review BOOKKEEPER-292 to get the ball rolling. Sijie had looked at it, but hadn't +1'd.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407223#comment-13407223 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Hi Ivan,
{quote}
Btw, if you guys want to add changes to the patch, could you clone the github branch, and make the mods there. It makes it easier to track what the changes between the patches are. 
{quote}
Yeah, I should have done that. I will do so from now on if there are any more changes.

{quote}
The empty ledger problem I'm not so sure of how to fix. It's not obvious how to tell the difference between. There's already a test case for this #testShouldNotGetAnyFragmentWithEmptyLedger
{quote}
The current fix should solve the single-entry problem. But with an empty ledger we may also get the entry as currentEntryId. This will make the ledger checker mark that fragment as underReplicated. 
Anyway, this fragment will be postponed by the worker because the lastFragment is in underReplication state. Once that pendingReplication has timed out, the ledger will get force fenced.

So, with the PendingReplications logic, it turns out that there won't be any ledger idle for more than the pendingReplication timeout period.

My only worry is the Namenode case:

Just start the namenode and don't write any data. In the meantime, suppose a bookie from the selected quorum goes down. If the user starts writing after 30 sec, the write will fail with a fenced exception, because the replication worker will already have fenced the ledger. This will cause one unnecessary switch.

On startup, keeping the system idle for some time may be a normal scenario, as OM may start the processes one after another. Starting all processes may take some time. Because of this idleness, it will cause one switch. Another argument is that, since this is startup, one switch should be OK.

                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293639#comment-13293639 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

This also works. 
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Assigned] (BOOKKEEPER-247) Detection of under replication

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reassigned BOOKKEEPER-247:
-----------------------------------

    Assignee:     (was: Rakesh R)

@Ivan @Uma
Just to summarize: as per my understanding, the Auditor will identify the suspected ledgers and publish them to everyone. The re-replicator will take these under-replicated ledgers and do the re-replication.

Then I feel we can use this JIRA for handling the self-check on disk failures and the re-replication. Also, it would be good to modify the JIRA title accordingly. How does that sound?
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293689#comment-13293689 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

I think what is in the patch now works as a standalone patch. I'd prefer to keep the patches small and modular like this and do the work over many JIRAs. It makes it easier to get code through the review and testing process. I'll open new JIRAs for the other bits.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285791#comment-13285791 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

@rakesh, are you working on this checking now? If not, I'd like to take a run at it.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Updated] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-247:
----------------------------------

    Description: This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.
        Summary: Detection of under replication  (was: Recording of under replication)
    
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406914#comment-13406914 ] 

Rakesh R commented on BOOKKEEPER-247:
-------------------------------------

@Uma @Ivan

bq.Here is one boundary case came across. When client written single entry and waiting, at this time if one BK goes down, then Ledger checker is not able to find that as underReplicated fragment.

I've seen that LedgerRecoveryOp.java uses the following logic to identify the lastAddConfirmed entry. Can we have something similar here in the replication logic when the ledger is in the open state?

{code}
    /**
     * Try to read past the last confirmed.
     */
    private void doRecoveryRead() {
        lh.lastAddConfirmed++;
        lh.asyncReadEntries(lh.lastAddConfirmed, lh.lastAddConfirmed, this, null);
    }
{code}
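
For the replication side, the same "read past the last confirmed" idea could be sketched as a standalone probe. This is only an illustration: the class name and the entryReadable predicate are hypothetical stand-ins for an actual read against the bookies, and the sketch advances only when a read succeeds, so an empty ledger is left at its initial value.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch; entryReadable stands in for a real read of the entry.
final class LastEntryProbe {
    /**
     * Starting from lastAddConfirmed, keeps probing the next entry id until a
     * read fails, and returns the id of the last entry that could be read.
     */
    static long findLastEntry(long lastAddConfirmed, LongPredicate entryReadable) {
        long last = lastAddConfirmed;
        while (entryReadable.test(last + 1)) {
            last++;
        }
        return last;
    }
}
```

With lastAddConfirmed at -1, a ledger holding a single entry yields 0, while a truly empty ledger stays at -1, at the cost of one extra read per probe.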


                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407182#comment-13407182 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

The first case (single entry in ledger) is actually straightforward to fix. The problem is that the fragment to check has the last entry set wrong, so it never actually checks. 
https://github.com/ivankelly/bookkeeper/commit/73b55018efdc451b356781fbe9b25a148878b308

The empty ledger problem I'm not so sure of how to fix. It's not obvious how to tell the difference between. There's already a test case for this #testShouldNotGetAnyFragmentWithEmptyLedger

Btw, if you guys want to add changes to the patch, could you clone the github branch, and make the mods there. It makes it easier to track what the changes between the patches are. 
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436250#comment-13436250 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

The latest patch looks great to me. We already found and fixed the majority of the issues above. I don't see any issues in the patch now.

I am +1 to push this in.

Others, could you please add your comments if we missed anything here.


Thanks,
Uma
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435832#comment-13435832 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

@Uma,

I'll get to this today. Yesterday was a bank holiday here, which is why there was no movement.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403946#comment-13403946 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

The problem was quite simple in the end. The code that ensures I only check the correct replicas only worked if the number of entries in the ledger was greater than the number of bookies in the ensemble. Adding two checks fixed it. I also moved the code around a bit. I've pushed it to github and attached a new patch. I still need to add tests for this, so the patch shouldn't be considered ready for submission yet.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293634#comment-13293634 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

So far we have been thinking about the sequence as follows:

1. Bookie fails
2. Auditor puts the list of affected ledgers in the suspected/underreplicated ledgers znode
3. Replication worker takes ledgers one by one from the suspected ledgers znode and re-replicates them.
  If we are able to reuse the BookKeeperAdmin code to re-replicate, then BookKeeperAdmin#recoverLedger already finds the fragments and replicates them then and there. Am I missing something here?

Otherwise the Recovery worker/Replication worker may need to watch two levels of data: 1. suspected ledgers znode 2. underreplicated znode.


{quote}
 Also, i think bookies should run this detection on all their ledgers, every few hours, to detect disk issues
{quote}
I agree. I think the work can be triggered on disk failures and will run on an hourly basis by default.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293592#comment-13293592 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

A bookie failure is really the failure of a lot of ledger fragments. I think the direction of BOOKKEEPER-272 matches that. The sequence of events for a bookie failure is:

# Bookie fails
# Auditor puts list of affected ledgers in suspected ledgers znode
# Recovery worker takes a ledger from the list, and runs this detection on it. Puts underreplicated ledger fragments in underreplicated znode.
# Recovery worker takes an underreplicated ledger fragment, and rereplicates it.

Each bookie runs a recovery worker, so the work of detection and rereplication will be distributed, while the auditor that checks the bookies will be centralized. Also, I think bookies should run this detection on all their ledgers, every few hours, to detect disk issues.
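
The sequence above can be sketched as follows. This is only an illustration under stated assumptions, not code from the patch: the class RecoveryFlowSketch, its methods, and the Fragment type are hypothetical names, the two znodes are replaced by in-memory queues, each ledger has a single ensemble, and "detection" is reduced to checking whether the failed bookie is in that ensemble.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecoveryFlowSketch {
    /** Hypothetical record of one under-replicated slot in a ledger's ensemble. */
    static final class Fragment {
        final long ledgerId;
        final int bookieIndex;

        Fragment(long ledgerId, int bookieIndex) {
            this.ledgerId = ledgerId;
            this.bookieIndex = bookieIndex;
        }
    }

    // Ledger id -> bookies of its single ensemble (real ledgers may have several).
    final Map<Long, List<String>> ledgers = new LinkedHashMap<>();
    // In-memory stand-ins for the "suspected ledgers" and "underreplicated" znodes.
    final Deque<Long> suspectedLedgers = new ArrayDeque<>();
    final Deque<Fragment> underreplicated = new ArrayDeque<>();

    /** Step 2: the auditor lists every ledger that used the failed bookie. */
    void auditorOnBookieFailure(String failedBookie) {
        for (Map.Entry<Long, List<String>> e : ledgers.entrySet()) {
            if (e.getValue().contains(failedBookie)) {
                suspectedLedgers.add(e.getKey());
            }
        }
    }

    /** Step 3: a worker takes one suspected ledger and runs detection on it. */
    boolean detectOne(String failedBookie) {
        Long ledgerId = suspectedLedgers.poll();
        if (ledgerId == null) {
            return false; // nothing left to check
        }
        int index = ledgers.get(ledgerId).indexOf(failedBookie);
        if (index >= 0) {
            underreplicated.add(new Fragment(ledgerId, index));
        }
        return true;
    }

    /** Step 4: a worker takes one fragment and re-replicates it to a new bookie. */
    boolean rereplicateOne(String replacementBookie) {
        Fragment f = underreplicated.poll();
        if (f == null) {
            return false; // nothing left to re-replicate
        }
        ledgers.get(f.ledgerId).set(f.bookieIndex, replacementBookie);
        return true;
    }

    public static void main(String[] args) {
        RecoveryFlowSketch s = new RecoveryFlowSketch();
        s.ledgers.put(1L, new ArrayList<>(Arrays.asList("A", "B", "C")));
        s.ledgers.put(2L, new ArrayList<>(Arrays.asList("A", "B", "D")));

        s.auditorOnBookieFailure("C");      // step 1 happened: bookie C failed
        while (s.detectOne("C")) { }        // drain the suspected list
        while (s.rereplicateOne("E")) { }   // drain the underreplicated list
        System.out.println(s.ledgers);
    }
}
```

Keeping the two queues separate is what lets detection and rereplication be picked up by different workers, which is the distribution described above.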
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Rakesh R
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435557#comment-13435557 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Ivan, as per the order in which the JIRAs should go in, I think it is now this JIRA's turn.
Could you please generate a patch with all the latest changes we discussed? It should be simple, as you have already added the changed code on github.
Then we can continue with further reviews.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402964#comment-13402964 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

I think we have to handle one special case in LedgerChecker.

Take the case of creating a ledger with an ensemble of 3 and a quorum of 2.

Add a first entry:
Now the ensemble should look like '0 A B C'
The entry should have been added to A and B. Now kill bookie C.

Add one more entry. The writer will now get an exception when writing to C, which will lead to an ensemble update.
The new ensemble should look like '1 A B D'

The writer can continue with this ensemble until there is another failure.

Now if you run the ledger checker on this ledger, it will consider '0 A B C' an underreplicated fragment. But the first entry has already met the quorum, so we need not replicate any entries.

I think we should skip such cases here.

Some grepped logs related to this issue:

{noformat}

First entry write:

2012-06-28 14:23:46,797 - INFO  - [main:BookKeeperClusterTestCase@336] - New bookie on port 5002 has been created.
2012-06-28 14:23:46,970 - INFO  - [New I/O client worker #1-1:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5000
2012-06-28 14:23:46,970 - INFO  - [New I/O client worker #1-2:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5001
2012-06-28 14:23:47,064 - INFO  - [main:TestLedgerChecker@137] - Killing /XX.XX.XX.127:5002 from ensemble=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002]
Ensembles after first entry : {0=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002]}
.......................
.......................


2012-06-28 14:23:47,549 - INFO  - [main:BookKeeperClusterTestCase@336] - New bookie on port 5003 has been created.


Second entry write:

2012-06-28 14:23:48,537 - ERROR - [New I/O client boss #1:PerChannelBookieClient$1@151] - Could not connect to bookie: /XX.XX.XX.127:5002
2012-06-28 14:23:48,537 - WARN  - [New I/O client boss #1:PendingAddOp@146] - Write did not succeed: 3, 1
2012-06-28 14:23:48,584 - INFO  - [New I/O client worker #1-4:PerChannelBookieClient$1@146] - Successfully connected to bookie: /XX.XX.XX.127:5003
Ensembles after second entry : {0=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5002], 1=[/XX.XX.XX.127:5000, /XX.XX.XX.127:5001, /XX.XX.XX.127:5003]}
2012-06-28 14:23:48,631 - ERROR - [pool-4-thread-1:PerChannelBookieClient@618] - Unexpected read response received from bookie: /XX.XX.XX.127:5000 for ledger: 3, entry: 0 , ignoring
2012-06-28 14:23:49,633 - ERROR - [New I/O client boss #1:PerChannelBookieClient$1@151] - Could not connect to bookie: /XX.XX.XX.127:5002
2012-06-28 14:23:49,633 - INFO  - [main:TestLedgerChecker@160] - unreplicated fragment: Fragment(LedgerID: 3, FirstEntryID: 1[2], LastEntryID: 1[0], Host: /XX.XX.XX.127:5000)
2012-06-28 14:23:49,633 - INFO  - [main:TestLedgerChecker@160] - unreplicated fragment: Fragment(LedgerID: 3, FirstEntryID: 0[1], LastEntryID: 0[-1], Host: /XX.XX.XX.127:5002)
2012-06-28 14:23:49,633 - INFO  - [main:BookKeeperClusterTestCase@92] - TearDown
{noformat}
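
To make the case above concrete, here is a minimal sketch of which bookies are expected to hold a given entry, assuming round-robin striping where entry e goes to ensemble positions (e + i) % ensembleSize for i in [0, quorumSize). The class and method names are hypothetical and are not taken from LedgerChecker.

```java
import java.util.ArrayList;
import java.util.List;

class StripingSketch {
    /** Ensemble positions expected to store entryId under round-robin striping. */
    static List<Integer> replicaIndices(long entryId, int ensembleSize, int quorumSize) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < quorumSize; i++) {
            indices.add((int) ((entryId + i) % ensembleSize));
        }
        return indices;
    }

    /** True if the bookie at bookieIndex should store any entry in [firstEntry, lastEntry]. */
    static boolean bookieHoldsAnyEntry(int bookieIndex, long firstEntry, long lastEntry,
                                       int ensembleSize, int quorumSize) {
        // The striping pattern repeats every ensembleSize entries, so looking
        // at more than ensembleSize consecutive entries adds nothing.
        long end = Math.min(lastEntry, firstEntry + ensembleSize - 1);
        for (long e = firstEntry; e <= end; e++) {
            if (replicaIndices(e, ensembleSize, quorumSize).contains(bookieIndex)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] ensemble = {"A", "B", "C"};
        // Fragment '0 A B C' holds only entry 0, with ensemble 3 and quorum 2.
        for (int i = 0; i < ensemble.length; i++) {
            System.out.println(ensemble[i] + " holds entry 0: "
                    + bookieHoldsAnyEntry(i, 0, 0, 3, 2));
        }
        // prints:
        // A holds entry 0: true
        // B holds entry 0: true
        // C holds entry 0: false
    }
}
```

Under that assumption, C is not expected to hold anything from the fragment '0 A B C', so losing C should not mark that fragment as under-replicated.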
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436043#comment-13436043 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Great, Thanks Ivan.
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403793#comment-13403793 ] 

Ivan Kelly commented on BOOKKEEPER-247:
---------------------------------------

This shouldn't happen, as I only check the replicas which should have the entry, for each entry. I'll check your test now. I probably missed something. Well spotted :)
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406950#comment-13406950 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-247:
------------------------------------------------

Even though we treat this empty ledger as underReplicated due to the above lastAddConfirmed++, we will postpone this ledger, as the last fragment is underReplicated and it is in the open state. After the pendingReplication timeout, this ledger will be force fenced.

That means that, after adding the PendingReplicationWorker logic and ReplicationWorker logic, there won't be any ledger in the open state for longer than the PendingReplication grace period. Of course, this is configurable. Is this behaviour fine with you all?
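
As a rough sketch of the behaviour being proposed (hypothetical names throughout, not the actual PendingReplicationWorker code), the decision for one ledger could look like this, assuming we record when the ledger was first seen as under-replicated:

```java
class GracePeriodSketch {
    enum Action { REPLICATE, POSTPONE, FENCE_AND_REPLICATE }

    /**
     * Decide what to do with a ledger that the checker reported.
     * firstSeenMillis is when the under-replication was first noticed (assumed to be tracked).
     */
    static Action decide(boolean ledgerOpen, boolean lastFragmentUnderReplicated,
                         long firstSeenMillis, long nowMillis, long gracePeriodMillis) {
        if (!ledgerOpen || !lastFragmentUnderReplicated) {
            // Closed ledger, or only older fragments affected: safe to replicate now.
            return Action.REPLICATE;
        }
        if (nowMillis - firstSeenMillis < gracePeriodMillis) {
            // The writer may still be active on the last fragment: wait.
            return Action.POSTPONE;
        }
        // Grace period expired: force fence the ledger, then replicate.
        return Action.FENCE_AND_REPLICATE;
    }

    public static void main(String[] args) {
        long grace = 30_000L; // the configurable grace period, value chosen arbitrarily
        System.out.println(decide(true, true, 0L, 10_000L, grace));
        System.out.println(decide(true, true, 0L, 30_000L, grace));
        System.out.println(decide(false, true, 0L, 10_000L, grace));
    }
}
```

With this shape, an open ledger whose last fragment is under-replicated is never left open past the grace period, which is the consequence raised in the question above.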
                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.


        

[jira] [Commented] (BOOKKEEPER-247) Detection of under replication

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436611#comment-13436611 ] 

Rakesh R commented on BOOKKEEPER-247:
-------------------------------------

Thanks Ivan, the new patch looks good.
+1 from me. As we have discussed and covered all the known scenarios, it should be fine to go in.

                
> Detection of under replication
> ------------------------------
>
>                 Key: BOOKKEEPER-247
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-247
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-client, bookkeeper-server
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-247-1.patch, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.diff, BOOKKEEPER-247.patch, BOOKKEEPER-247.patch
>
>
> This JIRA discusses how the bookkeeper system will detect underreplication of ledger entries.
