Posted to issues@ozone.apache.org by "UENISHI Kota (Jira)" <ji...@apache.org> on 2022/01/28 07:54:00 UTC

[jira] [Commented] (HDDS-5905) Race condition of deletion service and active object deletion

    [ https://issues.apache.org/jira/browse/HDDS-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483625#comment-17483625 ] 

UENISHI Kota commented on HDDS-5905:
------------------------------------

Bharat once advised me [1] to use object IDs instead of the
transaction index (and instead of timestamps), to handle restarts and
cluster upgrades to Ratis. But that approach has a drawback on object
overwrite, so I came up with another design choice. The two options are:

1. Use object IDs as the key in the delete table
Pros: object IDs are used consistently in OM and are easy to pick up in a RocksDB batch.
Cons:
 - When an object is overwritten, the object ID of the key is not updated, while the previous blocks of the overwritten key become eligible for deletion (see HDDS-5461 and HDDS-5656).
 - Under this condition, there is a race in which blocks get lost and are never
   collected. An example scenario:

key open  oid=1
key commit
key open (overwrite) oid=1’  #<= oid must be updated on overwrite, or use update id
key delete oid=1
key commit
key delete oid=1’ (<= overwritten and previous block gets leaked)
deletion service deletes 1’

   This behavior should be changed so that a new oid=2 is assigned on overwrite.
 - Even with this fix, blocks are deleted in the order of key open, not in the order of key deletion. That is better than alphabetical order, but not perfect.
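
To make the overwrite problem concrete, here is a toy Python model of it.
This is not Ozone code: the class ToyOM, its fields, and the rule that adding
an entry under an existing object ID replaces that entry are all assumptions
made for illustration. It only shows one way blocks could drop out of a delete
table keyed by object ID when an overwrite keeps the old ID.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOM:
    """Toy stand-in for OM: a key table plus a delete table keyed by object ID."""
    fresh_oid_on_overwrite: bool
    next_oid: int = 1
    key_table: dict = field(default_factory=dict)     # key name -> (oid, blocks)
    delete_table: dict = field(default_factory=dict)  # oid -> blocks awaiting deletion

    def put(self, name, blocks):
        """Create a key, or overwrite it if it already exists."""
        old = self.key_table.get(name)
        if old is None or self.fresh_oid_on_overwrite:
            oid = self.next_oid
            self.next_oid += 1
        else:
            oid = old[0]  # the overwrite keeps the old object ID
        if old is not None:
            # The overwritten version's blocks become eligible for deletion.
            self.delete_table[old[0]] = old[1]
        self.key_table[name] = (oid, blocks)

    def delete(self, name):
        """Delete a key: move its blocks to the delete table under its oid."""
        oid, blocks = self.key_table.pop(name)
        # Assumption: a plain put, so an existing entry with the same oid is replaced.
        self.delete_table[oid] = blocks


def run_scenario(fresh_oid_on_overwrite):
    om = ToyOM(fresh_oid_on_overwrite)
    om.put("k", ["b1"])  # key open + commit
    om.put("k", ["b2"])  # overwrite
    om.delete("k")       # key delete
    return om.delete_table
```

In this model, run_scenario(False) ends with a single delete-table entry for
oid 1 that holds only b2, so b1 is never collected. run_scenario(True) ends
with separate entries for oid 1 and oid 2, so both blocks stay visible to the
deletion service.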

2. Use update IDs as the key in the delete table
Pros: The design is cleaner and the order of block deletion will be correct.
Cons:
 - Currently, the assignment of update IDs is somewhat inconsistent. Most places
   use the raw transaction index, but some places use the object ID as-is, e.g.
   directory creation (see OMDirectoryCreateRequest.java).
 - A fix for the update ID assignment would be to prefix them with the epoch number,
   as with object IDs, but most of the code that sets update IDs would have to be changed.
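
As a rough sketch of what an epoch-prefixed update ID could look like, the
Python snippet below packs an epoch number into the high bits of an integer
and the transaction index into the low bits. The bit widths and the function
name make_update_id are hypothetical and are not taken from the Ozone source.
The snippet also assumes that the transaction index may start over from a
small value after a restart or an upgrade to Ratis, and that the epoch is
incremented when that happens.

```python
EPOCH_BITS = 2    # hypothetical width of the epoch prefix
INDEX_BITS = 62   # hypothetical width of the transaction index


def make_update_id(epoch: int, tx_index: int) -> int:
    """Pack the epoch into the high bits and the transaction index into the low bits."""
    if not 0 <= epoch < (1 << EPOCH_BITS):
        raise ValueError("epoch out of range")
    if not 0 <= tx_index < (1 << INDEX_BITS):
        raise ValueError("transaction index out of range")
    return (epoch << INDEX_BITS) | tx_index


# A deletion recorded at index 1000 in epoch 0, then one at index 5 in epoch 1.
before_restart = make_update_id(0, 1000)
after_restart = make_update_id(1, 5)
```

Compared as raw indices, 5 sorts before 1000, which would reverse the order of
the two deletions. With the epoch prefix, before_restart is smaller than
after_restart, so sorting the delete table by update ID keeps the deletion
order across the restart.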

I feel option 1 is easier but not quite correct, while option 2 is more
correct but requires a wide-ranging change. I updated my proposal
accordingly [2], so please let me know your thoughts on which to choose.
Also, my messy working branch can be found here [3].

P.S. My fix for HDDS-5905 conflicts with and depends on HDDS-5656, because
it is also about key deletion and overwrite. I want to get HDDS-5656
reviewed and merged beforehand. It's a leftover task from HDDS-5461 and
should be merged for 1.3.

[1] https://lists.apache.org/thread/79qgx598rv3qcojmzoxhc9ypkh1jj64y
[2] https://docs.google.com/document/d/1KeyhiE1i5SqRSgLy-pIOGW9X6mUYb8iYEkEoDAEQD9Q/edit#heading=h.nqxuhw78zsv7
[3] https://github.com/kuenishi/ozone/pull/1

> Race condition of deletion service and active object deletion
> -------------------------------------------------------------
>
>                 Key: HDDS-5905
>                 URL: https://issues.apache.org/jira/browse/HDDS-5905
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>    Affects Versions: 0.5.0, 1.0.0, 1.1.0, 1.2.0
>            Reporter: UENISHI Kota
>            Assignee: UENISHI Kota
>            Priority: Major
>
> Race condition of deletion service - the deletion service deletes blocks and later deletes the entry in the delete table without any locking. After the deletion service fetches the keys and before it deletes them from the table, a user's concurrent deletion of an active key (and its addition to the delete table) will be lost without its blocks being deleted.
> From Bharat's [Slack comment|https://the-asf.slack.com/archives/C5RK7PWA1/p1635135579007300?thread_ts=1635134167.007000&cid=C5RK7PWA1]:
> There seems to be an issue: when the same key is created/deleted, we might miss deleting some blocks. Scenario below.
> 1. Let's say a key is deleted and we add it to the delete table.
> 2. The BG picks it up and completes sending to SCM.
> 3. After the SCM ack, it deletes from the delete table using purgekey.
> 4. Now, if a new key is added/deleted between 2 and 3 and we add it to the delete key table, we will delete the newly added entries without their blocks having been deleted.
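
The four steps above can be replayed in a small Python model. This is only a
sketch and not Ozone code: the function simulate, the dictionary standing in
for the delete table, and the unique_ids switch are assumptions for
illustration. With unique_ids=False the table is keyed by key name alone, so
the purge in step 3 also removes the entry added in step 4. With
unique_ids=True every entry gets its own key, which stands in for either of
the ID-based designs discussed in the comment.

```python
def simulate(unique_ids: bool):
    """Replay the four steps; return (blocks sent to SCM, remaining delete table)."""
    delete_table = {}
    counter = 0

    def enqueue(name, blocks):
        # Add a deleted key's blocks to the delete table.
        nonlocal counter
        counter += 1
        entry_key = (name, counter) if unique_ids else name
        delete_table[entry_key] = blocks

    # Step 1: a key is deleted and added to the delete table.
    enqueue("k", ["b1"])

    # Step 2: the background service fetches entries and sends their blocks to SCM.
    fetched = dict(delete_table)
    sent_to_scm = [b for blocks in fetched.values() for b in blocks]

    # Step 4 (happens between 2 and 3): the same key is recreated and deleted again.
    enqueue("k", ["b2"])

    # Step 3: after the SCM ack, the fetched entries are purged from the table.
    for entry_key in fetched:
        delete_table.pop(entry_key, None)

    return sent_to_scm, delete_table
```

In both runs only b1 is sent to SCM. With simulate(False) the delete table
ends up empty, so b2 is never collected. With simulate(True) the entry for b2
survives the purge and is picked up on the next pass.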



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
