You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@ozone.apache.org by "fapifta (via GitHub)" <gi...@apache.org> on 2023/02/23 17:14:20 UTC

[GitHub] [ozone] fapifta commented on pull request #4194: HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM

fapifta commented on PR #4194:
URL: https://github.com/apache/ozone/pull/4194#issuecomment-1442141428

You're welcome!
In the meantime one thought does bother me a bit, and would like to understand the intent for that...

In the SecretKeyManager and the SecretKeyState, I have some doubts around the mechanism that generates a new key and does the rotation... it is hard to tell what exactly, it is just a bad feeling so far, so let me think out loud here:
If I understand correctly, SecretKeyManager will decide it should rotate, once the current key is older than 1 day.
Once the rotation happens eventually after 1d, the SecretKeyManager#checkAndRotate() method creates a new list of secret keys, it filters the oldest expired key from the list, and SecretKeyState#updateKeys is called with the keys in a sorted order, where the newest key is the first item in the list.
This call to updateKeys is replicated to the other SCMs, so all SCMs eventually will have the new set of keys.

So far this is clean on its own, the confusion on my side comes after this. The way of implementation suggests that the second proposal the pull model will be used on the DNs.
How this will look like?
I guess if the DN runs into a token, that is newer then the newest key it has, it will pull the list of the keys from SCM. While OMs will pull periodically from SCM.
If my understanding is correct, then operations wise we are good, and probably my mind stuck with the push model and the pre-generated and distributed next key.

During the thought experiment around this, I started to feel that it might not be correct to have a SecretKeyState that gets its internal representation from outside, and also misses proper encapsulation... With the current code this might not be a problem, but here are the concerns:
1. the internal list is guarded by locking, but its reference is leaking via the getSortedKeys() method.
2. updateKeys can override the full internal state

1. is easy to fix, and probably it is just some automatism where usually I don't think about this concern either for the first sight
2. on the other hand there might be a more interesting question... Let's do some thought experiment:
Let's say we have one SCM (SCM1) that is stopped for an extended period of time, and its raft logs are so old that when it gets back in service it downloads a new snapshot, and since that snapshot there was no rotation happening. At this time SCM2 is the leader, but after SCM1 comes up and syncronizes its data and becomes a proper follower, SCM2 goes down before there was a rotation of secret keys, and during election SCM1 becomes the elected leader.

In this scenario will SCM1 have the secretKeys properly? As based on the code my assumption is that in this case SCM1 will have 1 secret key in its list generated at startup (let's assume every key that was persisted has expired already), with that it will issue tokens, and DNs will be able to validate those tokens, but once keys are rotated, the list of old keys will be overridden in all SCMs.

After this has happened, let's assume a DN goes down, and comes up due to a restart for any reason. In this case if there is a client that has a token issued by SCM2 back when it was the leader, was not be able to read data from the restarted DN, as that DN will not be able to fetch the formerly valid token from anywhere, while those DNs that have the token in memory will work fine further causing intermittent read failures.

Do I miss something, or this is a legit scenario with this approach of managing the secret keys?
I see this happening because the keys are managed in a file, not in rocksDB, and so when SCM1 downloads the snapshot of the data to replay logs from that snapshot, it might not replay the transaction that updates the file, and the file is not in the snapshot, but putting the secret keys to rocksDB with that to the snapshot will hurt us later when it comes to conforming with FISMA for example.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org