Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/03/30 09:58:43 UTC

[jira] [Created] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

replaying entries of deleted ledgers would exhaust ledger cache.
----------------------------------------------------------------

                 Key: BOOKKEEPER-198
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-198
             Project: Bookkeeper
          Issue Type: Bug
            Reporter: Sijie Guo
            Assignee: Sijie Guo
             Fix For: 4.1.0


We found that replaying entries of deleted ledgers can exhaust the ledger cache. Once the ledger cache has no clean page left to grab, it throws the following exception:

{code}
java.util.NoSuchElementException
        at java.util.LinkedList.getFirst(LinkedList.java:109)
        at org.apache.bookkeeper.bookie.LedgerCacheImpl.grabCleanPage(LedgerCacheImpl.java:454)
        at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:165)
{code}

This happens because the bookie grabs a clean page but fails to update it due to a NoLedgerException, and then never returns that clean page to the ledger cache. Eventually the ledger cache is exhausted, and when a new ledger wants to grab a clean page, no page is available.
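To make the failure mode concrete, here is a minimal, self-contained Java model of the leak. It is a sketch, not the actual LedgerCacheImpl code: the class LeakyPagePool, its methods, and the use of an IOException to stand in for NoLedgerException are hypothetical simplifications, and a successfully used page is treated as if it were flushed and clean again immediately. The model allocates at most pageLimit pages; once that limit is reached, grabCleanPage can only reuse pages from the clean list, so a page that is grabbed but never put back is lost for good.

```java
import java.io.IOException;
import java.util.LinkedList;

// Hypothetical, simplified model of the pre-fix behaviour; not BookKeeper code.
class LeakyPagePool {
    private final int pageLimit;
    private int pageCount = 0;                          // pages allocated so far
    private final LinkedList<long[]> cleanPages = new LinkedList<>();

    LeakyPagePool(int pageLimit) {
        this.pageLimit = pageLimit;
    }

    // Allocate a new page while under the limit; otherwise reuse a clean one.
    synchronized long[] grabCleanPage() {
        if (pageCount < pageLimit) {
            pageCount++;
            return new long[16];
        }
        // Throws NoSuchElementException when no clean page is left,
        // like the LinkedList.getFirst frame in the reported stack trace.
        long[] page = cleanPages.getFirst();
        cleanPages.removeFirst();
        return page;
    }

    // Stands in for updatePage failing because the ledger was deleted.
    void updatePage(long[] page, boolean ledgerDeleted) throws IOException {
        if (ledgerDeleted) {
            throw new IOException("simulated NoLedgerException");
        }
    }

    // Buggy path: on failure the grabbed page is neither returned nor uncounted.
    void putEntryOffset(boolean ledgerDeleted) throws IOException {
        long[] page = grabCleanPage();
        updatePage(page, ledgerDeleted);
        synchronized (this) {
            // Simplification: a used page is treated as flushed and clean again.
            cleanPages.addLast(page);
        }
    }
}
```

With pageLimit = 2, two failed replays for deleted ledgers leave the model with no page it can allocate or reuse, after which every further putEntryOffset call fails with NoSuchElementException.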

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Flavio Junqueira (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242192#comment-13242192 ] 

Flavio Junqueira commented on BOOKKEEPER-198:
---------------------------------------------

Thanks, Sijie. The patch looks good, but I'm now confused by a different thing. I couldn't find code to decrement pageCount in trunk. Shouldn't we decrement pageCount as we flush pages? Consequently, there should be a line somewhere decrementing it, no?

I'm mentioning this because it might be worth having a test, not necessarily for this jira, that confirms that our logic increments and decrements correctly. In particular, we should test that it grows and eventually becomes zero again, never going negative.
                
> replaying entries of deleted ledgers would exhaust ledger cache.
> ----------------------------------------------------------------
>
>                 Key: BOOKKEEPER-198
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-198
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-198.patch, BK-198.patch_v2
>
>
> We found that replaying entries of deleted ledgers can exhaust the ledger cache. Once the ledger cache has no clean page left to grab, it throws the following exception:
> {code}
> java.util.NoSuchElementException
>         at java.util.LinkedList.getFirst(LinkedList.java:109)
>         at org.apache.bookkeeper.bookie.LedgerCacheImpl.grabCleanPage(LedgerCacheImpl.java:454)
>         at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:165)
> {code}
> This happens because the bookie grabs a clean page but fails to update it due to a NoLedgerException, and then never returns that clean page to the ledger cache. Eventually the ledger cache is exhausted, and when a new ledger wants to grab a clean page, no page is available.


[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243068#comment-13243068 ] 

Sijie Guo commented on BOOKKEEPER-198:
--------------------------------------

Committed as r1307732. Thanks, Flavio, for reviewing.
                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242255#comment-13242255 ] 

Sijie Guo commented on BOOKKEEPER-198:
--------------------------------------

> If the pool size is supposed to be constant, then why do we have a page count?

Actually, the ledger cache doesn't preallocate the page pool; it allocates pages incrementally, so we need a page count. Once the number of pages reaches pageLimit, the pool size stays constant.

From then on, a request for a new page simply borrows an existing clean page from the pool.
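The allocation policy described here (grow lazily up to pageLimit, then borrow clean pages from other ledgers) can be sketched as follows. This is an illustrative model, not BookKeeper code: PageTable, Page, and their fields are hypothetical names, and the nested map only echoes the ledger-id-to-pages layout Sijie describes for the page pool. The point to notice is that borrowing changes a page's owner without touching pageCount.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of lazy allocation up to pageLimit, then reuse.
class PageTable {
    static final class Page {
        long ledgerId;
        long firstEntry;
        boolean clean = true;                    // true once flushed to disk
    }

    private final int pageLimit;
    private int pageCount = 0;
    // ledgerId -> (firstEntry -> page)
    private final Map<Long, Map<Long, Page>> pages = new HashMap<>();

    PageTable(int pageLimit) { this.pageLimit = pageLimit; }

    synchronized int pageCount() { return pageCount; }

    // Grab a page for (ledgerId, firstEntry); it is NOT yet in the table.
    synchronized Page grabCleanPage(long ledgerId, long firstEntry) {
        Page page;
        if (pageCount < pageLimit) {
            pageCount++;                         // pool still growing
            page = new Page();
        } else {
            page = stealCleanPage();             // pool full: borrow, count unchanged
        }
        page.ledgerId = ledgerId;
        page.firstEntry = firstEntry;
        page.clean = true;
        return page;
    }

    // Remove some clean page from whichever ledger currently owns it.
    private Page stealCleanPage() {
        for (Map<Long, Page> ledgerPages : pages.values()) {
            Iterator<Page> it = ledgerPages.values().iterator();
            while (it.hasNext()) {
                Page p = it.next();
                if (p.clean) {
                    it.remove();                 // previous owner loses the page
                    return p;
                }
            }
        }
        throw new NoSuchElementException("no clean page available");
    }

    synchronized void putIntoTable(Page page) {
        pages.computeIfAbsent(page.ledgerId, k -> new HashMap<>())
             .put(page.firstEntry, page);
    }

    synchronized int pagesInTable() {
        int n = 0;
        for (Map<Long, Page> m : pages.values()) {
            n += m.size();
        }
        return n;
    }
}
```

Once three pages exist, grabbing a page for ledger 2 removes a clean page from ledger 1's map instead of allocating, so pageCount stays at pageLimit while the number of pages in the table drops by one until putIntoTable is called.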

> Why don't we return the page to the pool in the case of a failure?

Actually, the pool is a mapping from ledger id to pages. The failure happens after we have borrowed an existing clean page from another ledger and before we put it back into the table. The page's original information was cleared when we grabbed it, so we no longer know where to return it. The page becomes an orphan, and the only option is to drop it and decrement pageCount, so that a new page will be allocated to replace the orphan on a future request.
{quote}
LedgerEntryPage lep = grabCleanPage(ledger, pageEntry);
try {
    // should update page before we put it into table
    // otherwise we would put an empty page in it
    updatePage(lep);
    synchronized(this) {
        putIntoTable(pages, lep);
    }
{quote}
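The fixed flow can be modeled like this. FixedPagePool and its methods are hypothetical: an IOException stands in for NoLedgerException, the page is a plain long[] (so there is no equivalent of lep.releasePage()), and a successfully used page is treated as immediately flushed and clean. On failure the orphaned page is dropped and pageCount is decremented under the same monitor that grabCleanPage uses, so a fresh page can be allocated on a later request.

```java
import java.io.IOException;
import java.util.LinkedList;

// Hypothetical model of the fixed putEntryOffset flow; not BookKeeper code.
class FixedPagePool {
    private final int pageLimit;
    private int pageCount = 0;
    private final LinkedList<long[]> cleanPages = new LinkedList<>();

    FixedPagePool(int pageLimit) { this.pageLimit = pageLimit; }

    synchronized int pageCount() { return pageCount; }

    synchronized long[] grabCleanPage() {
        if (pageCount < pageLimit) {
            pageCount++;
            return new long[16];
        }
        return cleanPages.removeFirst();         // NoSuchElementException if empty
    }

    void putEntryOffset(boolean ledgerDeleted) throws IOException {
        long[] page = grabCleanPage();
        try {
            // Update the page before it becomes visible in the table.
            if (ledgerDeleted) {
                throw new IOException("simulated NoLedgerException");
            }
            synchronized (this) {
                cleanPages.addLast(page);        // simplified: used page is clean again
            }
        } catch (IOException ie) {
            // The page's previous owner was already cleared, so it cannot be
            // returned; drop the orphan and free its slot for a new allocation.
            synchronized (this) {
                --pageCount;
            }
            throw ie;
        }
    }
}
```

Because every failed update gives its slot back, any number of replays for deleted ledgers leaves pageCount where it started, and a later successful write can still obtain a page.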
                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Flavio Junqueira (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242260#comment-13242260 ] 

Flavio Junqueira commented on BOOKKEEPER-198:
---------------------------------------------

+1, thanks for the clarifications, Sijie.
                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243075#comment-13243075 ] 

Hudson commented on BOOKKEEPER-198:
-----------------------------------

Integrated in bookkeeper-trunk #437 (See [https://builds.apache.org/job/bookkeeper-trunk/437/])
    BOOKKEEPER-198: replaying entries of deleted ledgers would exhaust ledger cache. (sijie) (Revision 1307732)

     Result = SUCCESS
sijie : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/LedgerCacheImpl.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/LedgerCacheTest.java

                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Flavio Junqueira (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242168#comment-13242168 ] 

Flavio Junqueira commented on BOOKKEEPER-198:
---------------------------------------------

It is mostly good, Sijie. I just think that pageCount has to be in a synchronized block when decremented here:

{noformat}
catch (IOException ie) {
+            // if we grabbed a clean page but failed to update it,
+            // we are exhausting the count of ledger entry pages.
+            // since this page will never be used, we need to decrement
+            // the page count of the ledger cache.
+            lep.releasePage();
+            --pageCount;
+            throw ie; 
{noformat}

no?
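The decrement needs a lock because --pageCount is a read, a subtract, and a write, not a single atomic step. The hypothetical PageCounter below (not BookKeeper code) guards both directions with the object's monitor, in the same spirit as the synchronized(this) block around putIntoTable, and runs increments and decrements on two threads at once.

```java
// Hypothetical counter illustrating why the decrement must share the lock
// used for the increment: "--pageCount" is read-modify-write, so two
// unsynchronized threads can lose an update.
class PageCounter {
    private int pageCount = 0;

    synchronized void increment() { pageCount++; }   // e.g. allocating a page
    synchronized void decrement() { --pageCount; }   // e.g. dropping an orphan
    synchronized int get() { return pageCount; }

    // Runs perThread increments on one thread and as many decrements on another.
    static int race(PageCounter c, int perThread) throws InterruptedException {
        Thread inc = new Thread(() -> {
            for (int i = 0; i < perThread; i++) c.increment();
        });
        Thread dec = new Thread(() -> {
            for (int i = 0; i < perThread; i++) c.decrement();
        });
        inc.start();
        dec.start();
        inc.join();
        dec.join();
        return c.get();
    }
}
```

With both methods synchronized the result is always 0. Removing synchronized from decrement() would allow lost updates, so the final value could drift. java.util.concurrent.atomic.AtomicInteger would also make the counter itself safe, but taking the cache's existing lock keeps pageCount consistent with the other state updated under that lock.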
                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Flavio Junqueira (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242249#comment-13242249 ] 

Flavio Junqueira commented on BOOKKEEPER-198:
---------------------------------------------

bq. The only case in which we need to decrement pageCount is when a page has been removed from the pool but the new request fails to use it. Such a page is an orphan, so we need to decrement the count.

Why don't we return the page to the pool in the case of a failure? If the pool size is supposed to be constant, then why do we have a page count?
                

[jira] [Updated] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-198:
---------------------------------

    Attachment: BK-198.patch_v2

Ah, you're right. pageCount should be decremented inside a synchronized block.

Attaching a new patch to address this.
                

[jira] [Commented] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242202#comment-13242202 ] 

Sijie Guo commented on BOOKKEEPER-198:
--------------------------------------

Actually, we don't need to decrement the page count in the successful case. When we need to grab a clean page, we look for an available page among the existing pages in the ledger cache. If there is a clean page in the page pool (HashMap<Long, HashMap<Long,LedgerEntryPage>>), we remove it from the pool, zero it, and make it available for the new request. The ledger entry page just changes ownership, so we don't need to decrement pageCount.

The only case in which we need to decrement pageCount is when a page has been removed from the pool but the new request fails to use it. Such a page is an orphan, so we need to decrement the count.

From this perspective, pageCount cannot go negative, since the number of decrements can never exceed the number of pages that exist in the page pool, and the pool holds at most pageLimit pages.
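Flavio's suggested test and the bound argued here can be checked with a small randomized model. This sketch tracks counts only; PageCountInvariant and its parameters are hypothetical, and the operation mix (successful write, failed write, flush) is an assumption about how the cache is exercised. With dropOrphans set to true it applies the fix (a failed update decrements pageCount); with false it models the original leak, which breaks the invariant that every counted page is either clean or dirty in the table.

```java
import java.util.Random;

// Hypothetical randomized check of 0 <= pageCount <= pageLimit and
// clean + dirty == pageCount. Models counts only; not BookKeeper code.
class PageCountInvariant {
    static void check(int pageLimit, int steps, long seed, boolean dropOrphans) {
        Random rnd = new Random(seed);
        int pageCount = 0, clean = 0, dirty = 0;
        for (int i = 0; i < steps; i++) {
            int op = rnd.nextInt(3);             // 0 = write ok, 1 = write fails, 2 = flush
            if (op == 2) {
                clean += dirty;                  // flushed pages become clean
                dirty = 0;
            } else {
                // Grab: allocate while under the limit, else borrow a clean page.
                if (pageCount < pageLimit) {
                    pageCount++;
                } else if (clean > 0) {
                    clean--;
                } else {
                    continue;                    // nothing grabbable until a flush
                }
                if (op == 0) {
                    dirty++;                     // update succeeded: page enters the table
                } else if (dropOrphans) {
                    pageCount--;                 // the fix: orphan dropped and uncounted
                }
                // else: pre-fix behaviour, the orphan stays counted but owned by nobody
            }
            if (pageCount < 0 || pageCount > pageLimit || clean + dirty != pageCount) {
                throw new IllegalStateException("invariant broken at step " + i);
            }
        }
    }
}
```

With the fix applied the check cannot fail: a decrement only follows a successful grab, and a grab either increments pageCount or takes a clean page that was already counted, which is the same argument made in the comment above.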
                

[jira] [Updated] (BOOKKEEPER-198) replaying entries of deleted ledgers would exhaust ledger cache.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-198:
---------------------------------

    Attachment: BK-198.patch

Quite a simple patch: it just returns the clean page that failed to be used back to the ledger cache.
                