Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 06:12:09 UTC

[jira] [Created] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

bookie server doesn't quit when running out of disk space
---------------------------------------------------------

                 Key: BOOKKEEPER-180
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
             Project: Bookkeeper
          Issue Type: Bug
          Components: bookkeeper-server
            Reporter: Sijie Guo
             Fix For: 4.1.0


We found that publish throughput dropped when one bookie server ran out of disk space (we were not rotating logs, which exhausted the disk).

On investigation, we found that the bookie server doesn't quit when it runs out of disk space, so the hub server still treats it as available. Add requests continue to be sent to this bookie server, and some of them are put in the journal queue to be flushed, but the journal flush thread has already quit because there is no disk space. These add requests therefore get no response until the bookie client hits its read timeout and chooses other bookie servers.

As an experiment, we shut down the bookie that had run out of disk space, and publish throughput quickly went back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-180:
---------------------------------

    Attachment: BK-180.diff_v4

A new patch is attached.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225382#comment-13225382 ] 

Hudson commented on BOOKKEEPER-180:
-----------------------------------

Integrated in bookkeeper-trunk #394 (See [https://builds.apache.org/job/bookkeeper-trunk/394/])
    BOOKKEEPER-180: bookie server doesn't quit when running out of disk space (sijie via ivank) (Revision 1298492)

     Result = UNSTABLE
ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ExitCode.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieServer.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/LedgerCacheTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/test/ConcurrentLedgerTest.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/persistence/BookKeeperTestBase.java

                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222273#comment-13222273 ] 

Ivan Kelly commented on BOOKKEEPER-180:
---------------------------------------

Patch looks good. You need to run "mvn clean install" in the top-level directory though; there are a couple of compile errors due to InterruptedException no longer being thrown.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292739#comment-13292739 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-180:
------------------------------------------------

@Sijie, I remember we were discussing a strategy at that time.

{quote}
So how about making it a strategy, so the user can decide between shutting down and turning read-only when encountering a faulty bookie?

Sounds good to me.
Just adding a config parameter for this option should also be OK. If we enable it, the bookie will automatically turn to read-only mode. If we don't enable it, it will shut down by default. Admins can also start the bookie in read-only mode explicitly.
{quote}

I think that when the user doesn't want read-only mode enabled, shutting down is the other option.
Do you mean we will handle these conditions as part of BK-199?
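
The config-driven choice discussed above could be sketched roughly as follows. This is only an illustration under assumptions: the enum DiskFailureAction and the property name "readOnlyModeEnabled" are hypothetical here and are not taken from any patch on this issue.

```java
import java.util.Properties;

/** Hypothetical sketch: what a bookie does when it hits a disk failure. */
enum DiskFailureAction {
    SHUTDOWN,
    READ_ONLY;

    /**
     * Reads the (assumed) boolean property "readOnlyModeEnabled".
     * When the property is missing or false, shutting down is the default.
     */
    static DiskFailureAction fromConfig(Properties conf) {
        boolean readOnly =
            Boolean.parseBoolean(conf.getProperty("readOnlyModeEnabled", "false"));
        return readOnly ? READ_ONLY : SHUTDOWN;
    }
}
```

With an empty Properties object, fromConfig returns SHUTDOWN, which matches the "shut down by default" behaviour proposed in the quote.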
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292722#comment-13292722 ] 

Sijie Guo commented on BOOKKEEPER-180:
--------------------------------------

@Vinay

Actually, we are discussing turning the server into read-only (r-o) mode when it encounters an IOException while flushing ledgers. Please refer to BOOKKEEPER-199.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292745#comment-13292745 ] 

Sijie Guo commented on BOOKKEEPER-180:
--------------------------------------

@Uma

Yes. Whether to turn to r-o mode or shut down is decided by the user, as discussed in BOOKKEEPER-199.

Ledger flushing runs not only in SyncThread but also when evicting ledger index files. BOOKKEEPER-199 tries to cover all of those IOException cases, so I prefer handling them in that JIRA.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Updated] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-180:
---------------------------------

    Attachment: BK-180.diff_v3

Thanks to Ivan for the reminder. Attached a new patch that addresses the InterruptedException issue.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292836#comment-13292836 ] 

Rakesh R commented on BOOKKEEPER-180:
-------------------------------------

Yeah, Sijie. Since it is an IOException, I agree to add this scenario as part of BOOKKEEPER-199.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Updated] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-180:
---------------------------------

    Attachment: conn3.png

Attached a throughput graph. Throughput went down when a bookie ran out of disk space and went up again after the out-of-disk-space bookie server was shut down.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Vinay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292704#comment-13292704 ] 

Vinay commented on BOOKKEEPER-180:
----------------------------------

Hi,
One more scenario needs to be handled here.
Adding a new ledger and flushing failed in SyncThread because the disk was full, but the server did not shut down.

{noformat}2012-06-11 14:00:14,696 - ERROR [SyncThread:InterleavedLedgerStorage@156] - Exception flushing Ledger
java.io.IOException: No space left on device
	at sun.nio.ch.FileDispatcher.write0(Native Method)
	at sun.nio.ch.FileDispatcher.write(FileDispatcher.java:39)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
	at sun.nio.ch.IOUtil.write(IOUtil.java:26)
	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:198)
	at org.apache.bookkeeper.bookie.BufferedChannel.flush(BufferedChannel.java:109)
	at org.apache.bookkeeper.bookie.EntryLogger.flush(EntryLogger.java:280)
	at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.flush(InterleavedLedgerStorage.java:154)
	at org.apache.bookkeeper.bookie.Bookie$SyncThread.run(Bookie.java:200){noformat}
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, BK-180.diff_v4, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225147#comment-13225147 ] 

Ivan Kelly commented on BOOKKEEPER-180:
---------------------------------------

The new patch conflicts with BOOKKEEPER-160. The merge looks simple, but I'd prefer if you did it, as you know these changes better.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, BK-180.diff_v3, conn3.png

        

[jira] [Updated] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-180:
---------------------------------

    Attachment: BK-180.diff_v2

Attached a new patch that tries to address Ivan's comment.
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, BK-180.diff_v2, conn3.png

        

[jira] [Updated] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-180:
---------------------------------

    Attachment: BK-180.diff

Attached a patch to shut down the bookie server if the bookie thread quits.

I also changed BookieServer to call System.exit when it encounters problems (such as ZkExpire or the bookie thread quitting), since some monitoring tools use the exit code to decide whether to restart the bookie server process. If the bookie server doesn't exit with a non-zero code, the exit is treated as a normal quit and the monitoring tool will not restart it.
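
As a rough illustration of the exit-code idea (a sketch only: the enum ExitReason, its constant names, and its numeric values are hypothetical and are not the contents of ExitCode.java), each failure can be mapped to a distinct non-zero code so that a supervisor can tell a crash from a clean stop:

```java
/** Hypothetical exit reasons; only a clean stop maps to 0. */
enum ExitReason {
    OK(0),
    ZK_SESSION_EXPIRED(1),
    BOOKIE_THREAD_QUIT(2);

    private final int code;

    ExitReason(int code) {
        this.code = code;
    }

    /** Numeric process exit status for this reason. */
    int code() {
        return code;
    }

    /** Picks the exit reason from the conditions observed at shutdown. */
    static ExitReason classify(boolean zkExpired, boolean bookieThreadQuit) {
        if (zkExpired) {
            return ZK_SESSION_EXPIRED;
        }
        if (bookieThreadQuit) {
            return BOOKIE_THREAD_QUIT;
        }
        return OK;
    }
}
```

A server main method could then finish with System.exit(ExitReason.classify(zkExpired, bookieThreadQuit).code()), where the two booleans are whatever the server recorded before shutting down.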
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, conn3.png

        

[jira] [Commented] (BOOKKEEPER-180) bookie server doesn't quit when running out of disk space

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220826#comment-13220826 ] 

Sijie Guo commented on BOOKKEEPER-180:
--------------------------------------

> I don't understand the new code in Bookie#run. Shouldn't the Deathwatcher catch this problem?

Currently the Deathwatcher watches the running flag to know whether the bookie is alive. If the bookie thread encounters an exception such as an IOException (due to no disk space left), the bookie thread quits, but the other threads stay alive and the running flag is never set to false. So new code was added to shut down the other threads.
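
The pattern can be sketched as follows. This is a simplified illustration, not the code in the patch: the class WatchedWorker and its onExit callback are hypothetical names. The point is that the flag is cleared and the shutdown callback runs in a finally block, so both happen whether the thread body returns normally or dies with an exception.

```java
/** Hypothetical sketch of a worker whose exit is always visible to a watcher. */
final class WatchedWorker implements Runnable {
    // Read by a watcher thread, so it must be volatile.
    private volatile boolean running = true;
    private final Runnable body;
    private final Runnable onExit;

    WatchedWorker(Runnable body, Runnable onExit) {
        this.body = body;
        this.onExit = onExit;
    }

    @Override
    public void run() {
        try {
            body.run();
        } finally {
            // Executed on normal return and on any uncaught exception,
            // so a dead worker can no longer look alive.
            running = false;
            onExit.run();
        }
    }

    boolean isRunning() {
        return running;
    }
}
```

Here onExit stands in for whatever stops the remaining threads; a watcher that polls isRunning() then sees false as soon as the worker has died.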
                
> bookie server doesn't quit when running out of disk space
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-180
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-180
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-180.diff, conn3.png