Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2022/04/28 22:03:00 UTC
[jira] [Created] (HDDS-6667) Recon can crash if processing a container report after installing an OM snapshot
Ethan Rose created HDDS-6667:
--------------------------------
Summary: Recon can crash if processing a container report after installing an OM snapshot
Key: HDDS-6667
URL: https://issues.apache.org/jira/browse/HDDS-6667
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Recon
Affects Versions: 1.2.0, 1.1.0, 1.2.1
Reporter: Ethan Rose
Assignee: Ethan Rose
Attachments: hs_err_pid46101.log.txt
There are two threads that access Recon's RocksDB instance: one applies updates based on the OM DB state (ContainerKeyMapperTask), and the other applies updates based on container reports (ReconContainerReportHandler). When ContainerKeyMapperTask updates from a snapshot, it must account for keys that may have been deleted. The snapshot alone does not provide this information, so the task clears its existing container -> key mappings and rebuilds them from scratch. It does this by calling ContainerDBServiceProvider#initNewContainerDB, which deletes the whole Recon DB from disk and creates a new one. This leads to the following problem:
1. ContainerKeyMapperTask#reprocess is called to do a snapshot based update from OM.
2. ContainerKeyMapperTask deletes and recreates the Recon DB.
3. Recon receives and processes a container report. When it needs to update the DB, it may be using a stale handle from the old DB, or it may be trying to access the DB in the window between its deletion and re-creation.
This scenario caused a RocksDB crash on Recon, shown in the attached crash dump.
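One way to close this kind of window, sketched here only as an illustration and not necessarily the fix adopted for this issue, is to guard the DB handle with a read-write lock: report handlers take the read lock for each update, and the snapshot path takes the write lock while it swaps the handle. Every name below (GuardedContainerDb, FakeDb, recordContainer, reinitialize) is hypothetical and does not come from the Ozone code base. FakeDb is an in-memory stand-in for the native RocksDB handle; it throws on use after close where the real handle would crash the JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
public class GuardedContainerDb {

  /** In-memory stand-in for a native DB handle; fails if used after close. */
  static final class FakeDb {
    private final Map<Long, String> data = new ConcurrentHashMap<>();
    private volatile boolean closed;

    void put(long containerId, String key) {
      if (closed) {
        throw new IllegalStateException("use of closed DB handle");
      }
      data.put(containerId, key);
    }

    int size() {
      return data.size();
    }

    void close() {
      closed = true;
    }
  }

  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private FakeDb db = new FakeDb(); // read and written only under the lock

  /** Report-handler path: many callers may hold the read lock at once. */
  public void recordContainer(long containerId, String key) {
    lock.readLock().lock();
    try {
      db.put(containerId, key);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Snapshot path: swap the handle only while no reader is in flight. */
  public void reinitialize() {
    lock.writeLock().lock();
    try {
      db.close();
      db = new FakeDb();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public int size() {
    lock.readLock().lock();
    try {
      return db.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedContainerDb store = new GuardedContainerDb();
    AtomicInteger failures = new AtomicInteger();

    // Simulates container reports arriving while snapshots are installed.
    Thread reports = new Thread(() -> {
      for (long i = 0; i < 10_000; i++) {
        try {
          store.recordContainer(i, "key-" + i);
        } catch (IllegalStateException e) {
          failures.incrementAndGet();
        }
      }
    });
    Thread snapshots = new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        store.reinitialize();
      }
    });

    reports.start();
    snapshots.start();
    reports.join();
    snapshots.join();
    System.out.println("writes against a closed handle: " + failures.get());
  }
}
```

Because the handle is closed and replaced only under the write lock, a writer holding the read lock can never observe a closed or missing handle. The cost is that container report processing blocks for the duration of the swap, and in a real system the rebuild of the container -> key mappings would also have to be coordinated with those writers.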