Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/04 08:32:51 UTC

[GitHub] [iceberg] kbendick commented on a change in pull request #2675: Core: add key_metadata to ManifestFile spec

kbendick commented on a change in pull request #2675:
URL: https://github.com/apache/iceberg/pull/2675#discussion_r645363982



##########
File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
##########
@@ -179,6 +181,13 @@ default boolean hasDeletedFiles() {
    */
   List<PartitionFieldSummary> partitions();
 
+  /**
+   * Returns metadata about how this manifest file is encrypted, or null if the file is stored in plain text.
+   */
+  default ByteBuffer keyMetadata() {
+    return null;
+  }

Review comment:
       Is there any way to avoid using `null` as the return value for `default` methods?
   
   This particular case isn't too bad, as the `null` return value is documented and has semantic meaning. But I've noticed that we return `null` from a number of methods in base classes and from default methods in interfaces, often where child classes are expected to override the method, and that could lead to bad results.
   
   So partly I wanted to raise the issue in general: it's a code style we sometimes use that is somewhat unsafe, and I've noticed it in a few PRs recently.
   
   [I recently fixed a bug](https://github.com/apache/iceberg/pull/2630/files#diff-1ae8e9490fe1a4b6da8842c1c313fa57e68a674e186bfeace1c763bee1381faaL72-L74) arising from a similar situation. In that case, `BaseMetastoreTableOperations#tableName` should have been `abstract`, since all catalogs need to implement it. It was not overridden by `NessieTableOperations`, which would have led to all of the issues one can imagine with unexpected nulls in a PR that was being merged at the time.
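
To illustrate the concern, here is a minimal sketch of the two styles side by side. All class names and the `"db.events"` value are hypothetical and are not Iceberg code; the point is only that a `null` placeholder lets a forgotten override compile, while `abstract` turns the same mistake into a compile error.

```java
// Hypothetical classes for illustration only; none of these exist in Iceberg.
abstract class NullDefaultOperations {
  // A null "placeholder" compiles even when a subclass forgets to override it.
  protected String tableName() {
    return null;
  }

  String describe() {
    return "table=" + tableName();
  }
}

abstract class AbstractOperations {
  // Abstract: a concrete subclass that omits this method will not compile.
  protected abstract String tableName();

  String describe() {
    return "table=" + tableName();
  }
}

class ForgetfulOperations extends NullDefaultOperations {
  // No override here, so the mistake only shows up at runtime.
}

class CompleteOperations extends AbstractOperations {
  @Override
  protected String tableName() {
    return "db.events";
  }
}

class NullDefaultDemo {
  public static void main(String[] args) {
    System.out.println(new ForgetfulOperations().describe()); // table=null
    System.out.println(new CompleteOperations().describe());  // table=db.events
  }
}
```

Deleting `tableName()` from `CompleteOperations` would fail the build, which is the behavior one would want when every implementation must supply the value.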
   

##########
File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
##########
@@ -179,6 +181,13 @@ default boolean hasDeletedFiles() {
    */
   List<PartitionFieldSummary> partitions();
 
+  /**
+   * Returns metadata about how this manifest file is encrypted, or null if the file is stored in plain text.
+   */
+  default ByteBuffer keyMetadata() {
+    return null;
+  }

Review comment:
       Having looked through the PR further, I think the usage of `null` here is definitely justified. Especially as the utility methods in `ByteBuffers` handle null already so there's not too much extra null handling, and the physical type needs to be byte[] for Avro.
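
As a rough sketch of what "handles null already" can look like, the helper below passes `null` straight through when converting a `ByteBuffer` to a `byte[]`. The class `NullSafeBuffers` and its method are hypothetical stand-ins written for this example; they are not the actual `ByteBuffers` implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not the Iceberg ByteBuffers class.
class NullSafeBuffers {
  // Copies the remaining bytes of the buffer, or returns null for a null buffer.
  static byte[] toByteArray(ByteBuffer buffer) {
    if (buffer == null) {
      return null; // null means "no key metadata", so pass it through
    }
    byte[] bytes = new byte[buffer.remaining()];
    // Read from a duplicate so the caller's position is left unchanged.
    buffer.duplicate().get(bytes);
    return bytes;
  }
}
```

With a helper like this, callers can convert the value without their own null checks.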
   
   Feel free to resolve this. I've seen this in a number of PRs recently where the `null` return value was really just a stand-in for `abstract`, and I felt it would be good to draw attention to it in general, as it seems to be creeping into the codebase here and there. However, this is not one of those cases. 🙂 

##########
File path: core/src/main/java/org/apache/iceberg/GenericManifestFile.java
##########
@@ -399,6 +415,7 @@ public String toString() {
         .add("deleted_data_files_count", deletedFilesCount)
         .add("deleted_rows_count", deletedRowsCount)
         .add("partitions", partitions)
+        .add("key_metadata", keyMetadata == null ? "null" : "(redacted)")

Review comment:
       Given we're adding a number of encryption-related PRs, most of which have very sensitive data in them (encryption keys), would it make sense to make this into a utility function, such as `EncryptionUtils.toRedactedString(byte[] value)`?
   
   We could have a unified way of redacting, which could also reduce the null checks in the code.
   
   It would also give people the option of customizing how keys are redacted (say, in a fork or via an override) in a way that lets SREs better assist customers. For example, I can imagine that there _might_ be utility in showing the first 2 or 4 bytes, so as to be able to check that one key is definitively not the same as another (I would defer to the experts on how insecure it is to log any portion of the key, or a hash of a portion, at all).
   
   I'm not sure how much benefit people would get from custom redaction when stringifying (whether for checking that the key metadata is definitively not the same, or for something else), but I'm throwing that out there as I'm curious to hear whether there is any utility in not entirely redacting, from a usability / debugging standpoint.
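
A minimal sketch of what such a utility might look like follows. The class name `RedactionSketch`, the two-argument overload, and the output format are all assumptions made for illustration; nothing like this exists in the PR. Whether revealing any prefix bytes is acceptable is exactly the open security question above, so the default here reveals nothing.

```java
// Hypothetical utility for illustration; not part of the PR or of Iceberg.
class RedactionSketch {
  private static final char[] HEX = "0123456789abcdef".toCharArray();

  // Default behavior: reveal nothing, matching the toString() in the diff.
  static String toRedactedString(byte[] value) {
    return toRedactedString(value, 0);
  }

  // Assumed variant: optionally show the first few bytes as hex.
  // visiblePrefixBytes <= 0 means the value is fully redacted.
  static String toRedactedString(byte[] value, int visiblePrefixBytes) {
    if (value == null) {
      return "null";
    }
    int shown = Math.max(0, Math.min(visiblePrefixBytes, value.length));
    if (shown == 0) {
      return "(redacted)";
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < shown; i++) {
      // Mask after shifting so negative bytes map to the right hex digit.
      sb.append(HEX[(value[i] >> 4) & 0xF]).append(HEX[value[i] & 0xF]);
    }
    return sb.append("...(redacted)").toString();
  }
}
```

The call site in `toString()` could then become a single call with no inline null check, for example `.add("key_metadata", RedactionSketch.toRedactedString(bytes))`, where `bytes` is the key metadata as a `byte[]`.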



