Posted to issues@ozone.apache.org by "Sadanand Shenoy (Jira)" <ji...@apache.org> on 2023/08/07 14:42:00 UTC

[jira] [Assigned] (HDDS-9126) [ozone-snapshot] Unordered deletion of snapshots corrupting OM

     [ https://issues.apache.org/jira/browse/HDDS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sadanand Shenoy reassigned HDDS-9126:
-------------------------------------

    Assignee: Sadanand Shenoy

> [ozone-snapshot] Unordered deletion of snapshots corrupting OM
> --------------------------------------------------------------
>
>                 Key: HDDS-9126
>                 URL: https://issues.apache.org/jira/browse/HDDS-9126
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Soumitra Sulav
>            Assignee: Sadanand Shenoy
>            Priority: Critical
>              Labels: ozone-snapshot, pull-request-available
>             Fix For: 1.4.0
>
>         Attachments: console.log, ozone-om-quasar-csvjze-1.log, ozone-om-quasar-csvjze-2.log, ozone-om-quasar-csvjze-3.log, ozone-scm-quasar-csvjze-1.log, ozone-scm-quasar-csvjze-2.log, ozone-scm-quasar-csvjze-3.log
>
>
> Test scenario:
> The test test_unordered_deletion deletes snapshots in random order. While doing so, it hits the exception below in the OM more often than not.
> Once the error occurs, the OM goes into an unhealthy state and none of the subsequent tests can run.
> The snapshot is deleted:
> {code:java}
> 2023-08-06 06:33:27,113 INFO [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotDeleteRequest: Deleted snapshot 'snap-ae5or' under path 'vol-w19gk/buck-f9sqw'
> {code}
> Soon after, during a copy:
> {code:java}
> 2023-08-06 06:39:06,314|INFO|MainThread|machine.py:188 - run()||GUID=5210f279-e5c7-4ee9-b652-b49a6b0eb07a|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone fs -cp ofs://ozone1/vol-w19gk/buck-f9sqw/.snapshot/snap-5qmtv/key_1691303390 ofs://ozone1/vol-w19gk/buck-f9sqw/
> {code}
> OM log stacktrace:
> {code:java}
> 2023-08-06 06:33:38,126 INFO [SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.RocksDatabase: Deleting sst file /000396.sst corresponding to column family keyTable from db: /var/lib/hadoop-ozone/om/data293349/db.snapshots/checkpointState/om.db-0ccb08e9-c5ab-45bb-a71e-8444a2142511
> 2023-08-06 06:33:38,127 INFO [SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: Waited for 1 milliseconds for file /var/lib/hadoop-ozone/om/data293349/db.snapshots/checkpointState/om.db-0ccb08e9-c5ab-45bb-a71e-8444a2142511/000396.sst deletion.
> 2023-08-06 06:34:37,938 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-w19gk/buck-f9sqw/snap-ae5or
> 2023-08-06 06:34:37,938 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.helpers.OmKeyInfo: OmKeyInfo.getCodec ignorePipeline = true
> 2023-08-06 06:34:37,989 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Error during Snapshot sst filtering
> FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol-w19gk/buck-f9sqw/snap-ae5or' is no longer active
>     at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:205)
>     at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:151)
>     at org.apache.hadoop.ozone.om.SstFilteringService$SstFilteringTask.call(SstFilteringService.java:178)
>     at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2023-08-06 06:35:30,232 INFO [pool-8-thread-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Removing SST files: [000410, 000453, 000496, 000253, 000374, 000535, 000611, 000456, 000417, 000658, 000338, 000459, 000380, 000185, 000124, 000245, 000443, 000200, 000563, 000364, 000562, 000128, 000447, 000248, 000688, 000324, 000522, 000367, 000209, 000407, 000129, 000602, 000290, 000296, 000692, 000130, 000372, 000690, 000172, 000293, 000157, 000355, 000399, 000674, 000233, 000277, 000310, 000398, 000552, 000596, 000474, 000352, 000550, 000315, 000359, 000634, 000236, 000599, 000554, 000638, 000637, 000559, 000514, 000518, 000160, 000681, 000163, 000284, 000162, 000344, 000663, 000264, 000462, 000425, 000667, 000225, 000302, 000467, 000588, 000301, 000506, 000307, 000504, 000668, 000628, 000193, 000391, 000197] as part of SST file pruning.
> 2023-08-06 06:35:37,937 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-w19gk/buck-f9sqw/snap-ae5or
> 2023-08-06 06:35:37,937 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Error during Snapshot sst filtering
> FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol-w19gk/buck-f9sqw/snap-ae5or' is no longer active 
> {code}
> Other runs with the same/similar exception:
> https://quanta.infra.cloudera.com/#/runDetails?gtn=43909320
> https://quanta.infra.cloudera.com/#/runDetails?gtn=43909314
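
For anyone trying to reproduce the scenario in the description, the following is a minimal sketch of how a randomized deletion order could be generated. The source of test_unordered_deletion is not part of this report, so the function name unordered_delete_commands and the seed parameter are hypothetical. The command form `ozone sh snapshot delete <bucket path> <snapshot name>` is also an assumption and should be checked against the CLI of the Ozone version under test. The sketch only builds the commands and does not run them.

```python
import random
from typing import List, Optional


def unordered_delete_commands(
    bucket_path: str,
    snapshots: List[str],
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Return one snapshot-delete command per snapshot, in shuffled order.

    Nothing is executed here; the caller decides how to run the commands.
    """
    rng = random.Random(seed)  # a fixed seed makes a failing order reproducible
    order = list(snapshots)    # copy so the caller's list is left untouched
    rng.shuffle(order)
    # Assumed CLI form: ozone sh snapshot delete <bucket path> <snapshot name>
    return [
        ["ozone", "sh", "snapshot", "delete", bucket_path, name]
        for name in order
    ]


if __name__ == "__main__":
    # Bucket path and snapshot names are the ones that appear in the logs above.
    for cmd in unordered_delete_commands(
        "/vol-w19gk/buck-f9sqw", ["snap-ae5or", "snap-5qmtv"], seed=42
    ):
        print(" ".join(cmd))
```

Recording the seed alongside a failing run would make it possible to replay the exact deletion order that put the OM into the unhealthy state.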



