Posted to dev@bookkeeper.apache.org by "Flavio Junqueira (JIRA)" <ji...@apache.org> on 2012/10/26 11:51:12 UTC

[jira] [Commented] (BOOKKEEPER-443) Revisit Ledger Deletion in BookKeeper

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484820#comment-13484820 ] 

Flavio Junqueira commented on BOOKKEEPER-443:
---------------------------------------------

This is a comment to reflect an offline discussion with Ivan, Sijie, and Jiannan. 

If a client deletes a ledger while the ledger writer is still writing to it, then the bookies in the ledger ensemble should accept write requests until the ledger is garbage-collected on each bookie. I haven't checked, but I assume that if a ledger is garbage-collected and the writer tries to write to a bookie that doesn't have it any longer, then the add request will fail. If the ledger is not garbage-collected before the writer closes it, the writer will get an error when it tries to change the metadata to mark the ledger closed. In both cases, at least one operation returns an error.

One concern is that this could lead to data loss, since the writer adds entries that are confirmed but never really accessible. Although true, we could see it as the responsibility of the application to make sure that, when it asks BK to delete a ledger, the application logic is doing the right thing.

If we introduce a mechanism to return an error when trying to delete a ledger that is not closed, then we will be asking the application to introduce some mechanism to coordinate clients when deleting ledgers. My perception is that most applications will have such a coordination mechanism naturally, like Hedwig does with consumed messages, but in the case the application does not require it (e.g., it finds out that the data in some ledger is not useful any longer and decides to get rid of it independent of concurrent writes), we will be forcing the application to add it. 
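
To make the "return an error when deleting a ledger that is not closed" option concrete, here is a minimal sketch against a hypothetical in-memory metadata map. None of the names below (ConditionalDeleteSketch, LedgerState, DeleteResult, deleteIfClosed) are BookKeeper APIs; they are assumptions made only for illustration, and a real implementation would need a conditional operation on the actual metadata store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of ledger metadata; not a BookKeeper API.
final class ConditionalDeleteSketch {
    enum LedgerState { OPEN, IN_RECOVERY, CLOSED }

    enum DeleteResult { DELETED, NOT_CLOSED, NO_SUCH_LEDGER }

    private final Map<Long, LedgerState> metadata = new ConcurrentHashMap<>();

    void create(long ledgerId) {
        metadata.put(ledgerId, LedgerState.OPEN);
    }

    // The writer marks the ledger closed; succeeds only if it is currently OPEN.
    boolean close(long ledgerId) {
        return metadata.replace(ledgerId, LedgerState.OPEN, LedgerState.CLOSED);
    }

    // Conditional remove: the entry is removed only if its state is CLOSED.
    DeleteResult deleteIfClosed(long ledgerId) {
        // Map.remove(key, value) is atomic on ConcurrentHashMap.
        if (metadata.remove(ledgerId, LedgerState.CLOSED)) {
            return DeleteResult.DELETED;
        }
        // Best-effort diagnosis only; the state may change after this check.
        return metadata.containsKey(ledgerId)
                ? DeleteResult.NOT_CLOSED
                : DeleteResult.NO_SUCH_LEDGER;
    }

    public static void main(String[] args) {
        ConditionalDeleteSketch store = new ConditionalDeleteSketch();
        store.create(1L);
        System.out.println(store.deleteIfClosed(1L)); // ledger is still open
        store.close(1L);
        System.out.println(store.deleteIfClosed(1L)); // ledger is closed now
    }
}
```

In this model, an application that gets NOT_CLOSED back has to decide whether to wait, retry, or coordinate with the writer, which is the extra coordination burden described above.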

My preference is to either leave as is or introduce a mechanism to mark ledger metadata for deletion instead of actually deleting the metadata. The one advantage of this last proposal is that the writer will be able to cleanly close the ledger, independent of concurrent delete requests. I don't see why the operation of a writer should be affected by concurrent delete ledgers, although one obvious drawback is that the writer will keep writing unnecessarily.
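
A rough sketch of the mark-for-deletion idea, again on a hypothetical in-memory model rather than real BookKeeper code. The class and method names are invented for illustration, and the purge rule (remove metadata only once it is both marked and closed) is an assumption; the proposal above does not say when the metadata would actually be removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: delete only sets a flag, so the writer can still close.
final class MarkForDeletionSketch {
    static final class Meta {
        boolean closed;
        boolean markedForDeletion;
    }

    private final Map<Long, Meta> metadata = new HashMap<>();

    synchronized void create(long ledgerId) {
        metadata.put(ledgerId, new Meta());
    }

    // "Delete" as seen by the client: the metadata stays, only a mark is set.
    synchronized boolean markDeleted(long ledgerId) {
        Meta m = metadata.get(ledgerId);
        if (m == null) {
            return false;
        }
        m.markedForDeletion = true;
        return true;
    }

    // The writer's close still succeeds after a concurrent delete request.
    synchronized boolean close(long ledgerId) {
        Meta m = metadata.get(ledgerId);
        if (m == null || m.closed) {
            return false;
        }
        m.closed = true;
        return true;
    }

    // Assumed cleanup rule: drop entries that are both marked and closed.
    synchronized int purge() {
        int before = metadata.size();
        metadata.values().removeIf(m -> m.markedForDeletion && m.closed);
        return before - metadata.size();
    }

    synchronized boolean exists(long ledgerId) {
        return metadata.containsKey(ledgerId);
    }
}
```

With this model, a sequence of create, markDeleted, close, purge leaves no metadata behind, and the writer never sees an error, at the cost of entries written after the mark being discarded later.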
                
> Revisit Ledger Deletion in BookKeeper
> -------------------------------------
>
>                 Key: BOOKKEEPER-443
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-443
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-server
>            Reporter: Jiannan Wang
>             Fix For: 4.2.0
>
>
> Currently, we don't look at ledger metadata when deleting ledgers. So when one client is opening or writing to a ledger and another client deletes it, the behavior is undefined.
> So we suggest that if a client is writing to or recovering a ledger, we should not delete it. We can change the behavior of ledger deletion to perform a conditional remove only when the state of the ledger is CLOSED.


Re: [jira] [Commented] (BOOKKEEPER-443) Revisit Ledger Deletion in BookKeeper

Posted by Ivan Kelly <iv...@apache.org>.
> I haven't checked, but I assume that if a ledger is
> garbage-collected and the writer tries to write to a bookie that
> doesn't have it any longer, then the add request will fail.
The request will not fail in this case. It'll behave the same as the
first write on the ledger.

> My preference is to either leave as is or introduce a mechanism to
> mark ledger metadata for deletion instead of actually deleting the
> metadata. The one advantage of this last proposal is that the writer
> will be able to cleanly close the ledger, independent of concurrent
> delete requests. I don't see why the operation of a writer should be
> affected by concurrent delete ledgers, although one obvious drawback
> is that the writer will keep writing unnecessarily. 
I don't like the idea of marking for deletion, because it changes what
deletion actually means. Under what circumstances would the metadata
be needed at a later stage? If anyone actually uses the metadata,
doesn't this mean that deletion isn't actually deletion?

My preference for this would be that, if we find a ledger is open, we
fence and then delete. This will notify the writer that something is
wrong, and pass the error up to the application, which should be able
to handle it.
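
A minimal sketch of the fence-then-delete flow, using a hypothetical single-process model in place of real bookies and a real metadata store. All names are invented for illustration. The model keeps the fence flag after the metadata is removed, which is an assumption; what a bookie does once the ledger has been garbage-collected is outside this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: deleting an open ledger fences it first.
final class FenceThenDeleteSketch {
    enum AddResult { OK, FENCED }

    // Metadata side: ledgerId -> whether the ledger is closed.
    private final Map<Long, Boolean> closedByLedger = new HashMap<>();
    // Bookie side: ledgers that reject further adds.
    private final Set<Long> fenced = new HashSet<>();

    synchronized void create(long ledgerId) {
        closedByLedger.put(ledgerId, false);
    }

    // The payload is ignored; only the fence check matters here.
    synchronized AddResult addEntry(long ledgerId, byte[] data) {
        return fenced.contains(ledgerId) ? AddResult.FENCED : AddResult.OK;
    }

    // If the ledger is still open, fence it before removing the metadata,
    // so that the writer's next add fails and the error reaches the application.
    synchronized boolean delete(long ledgerId) {
        Boolean closed = closedByLedger.get(ledgerId);
        if (closed == null) {
            return false; // no such ledger
        }
        if (!closed) {
            fenced.add(ledgerId);
        }
        closedByLedger.remove(ledgerId);
        return true;
    }
}
```

In this model the writer learns about the deletion on its next add, rather than only when it later tries to close the ledger.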