Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2022/04/21 21:34:00 UTC

[jira] [Commented] (HDDS-5819) Intermittent failure in TestRootedOzoneFileSystem#testRenameToTrashEnabled

    [ https://issues.apache.org/jira/browse/HDDS-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526092#comment-17526092 ] 

Ethan Rose commented on HDDS-5819:
----------------------------------

It looks like we need to account for the possibility of the trash checkpointing operation running while the test runs. What seems to be happening is:
1. The test creates <bucket>/.Trash/<username>/Current/key
2. The trash checkpoint interval goes off, so <bucket>/.Trash/<username>/Current/key is moved to <bucket>/.Trash/<username>/<checkpoint time>/key
3. The test checks that <bucket>/.Trash/<username>/Current/ and/or <bucket>/.Trash/<username>/Current/key exist, and fails.
We could try to change the trash checkpoint interval configs for that test only, but that requires a cluster restart. This may be a good option if we want to use a mini ozone cluster provider for each test instead of reusing the same cluster, but given the number of tests * parameterizations, I think that might be too many clusters at once. An easier option could be to check any checkpoints that exist to find the key. Something like this:
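{code}
// First look in Current, where moveToTrash initially places the key.
boolean trashKeyFound = ofs.exists(trashPath);
if (!trashKeyFound) {
  // A checkpoint may have run in the meantime, so look in each
  // directory under the user's trash root as well.
  for (FileStatus checkpoint : ofs.listStatus(userTrashRoot)) {
    Path checkpointTrashPath = new Path(checkpoint.getPath(), key);
    if (ofs.exists(checkpointTrashPath)) {
      trashKeyFound = true;
      break;
    }
  }
}

Assert.assertTrue(trashKeyFound);
{code}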
The time to live for a checkpoint is set to 3 seconds (fs.trash.interval, see TrashPolicyOzone#deleteCheckpoint). This means the test has 3 seconds between calling moveToTrash on the key and finding which checkpoint it is currently in before the key is deleted permanently. This should be plenty of time to find the key in one of the checkpoints.
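
To try the lookup logic without a cluster, here is a rough, self-contained sketch that models the same search on a local directory tree using java.nio.file instead of the Ozone file system. The class name TrashLookup and the method findInTrash are hypothetical and exist only for this illustration. It assumes the trash layout described above (a Current directory plus one directory per checkpoint under the user's trash root); the real test would use the FileSystem calls from the snippet above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: models the trash lookup on a local directory tree. */
final class TrashLookup {
  private TrashLookup() { }

  /**
   * Looks for the key under userTrashRoot/Current first, then under every
   * directory directly below userTrashRoot (the checkpoint directories).
   * Returns the first location where the key exists, if any.
   */
  static Optional<Path> findInTrash(Path userTrashRoot, String key)
      throws IOException {
    Path current = userTrashRoot.resolve("Current").resolve(key);
    if (Files.exists(current)) {
      return Optional.of(current);
    }
    if (!Files.isDirectory(userTrashRoot)) {
      // No trash root at all, so the key cannot be in a checkpoint.
      return Optional.empty();
    }
    // The key may have been moved into a checkpoint directory already.
    try (Stream<Path> children = Files.list(userTrashRoot)) {
      return children
          .filter(Files::isDirectory)
          .map(dir -> dir.resolve(key))
          .filter(Files::exists)
          .findFirst();
    }
  }
}
```

The test would then assert that findInTrash(userTrashRoot, key).isPresent() rather than asserting on the Current path directly, so it passes whether or not a checkpoint ran in between.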

> Intermittent failure in TestRootedOzoneFileSystem#testRenameToTrashEnabled
> --------------------------------------------------------------------------
>
>                 Key: HDDS-5819
>                 URL: https://issues.apache.org/jira/browse/HDDS-5819
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Keyi Song
>            Priority: Major
>         Attachments: it-filesystem-hdds.zip
>
>
> TestRootedOzoneFileSystem reuses the same MiniOzoneCluster for all runs. A cascading series of failures was observed in this CI run: [https://github.com/apache/ozone/runs/3792440274]
> It looks like testRenameToTrashEnabled was the original failure that caused the others. Looking in the logs (attached in the zip file) there are numerous volume and bucket request errors that may have resulted in this.


