Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/07/30 11:13:20 UTC

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

gszadovszky commented on a change in pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#discussion_r462909320



##########
File path: parquet-column/src/main/java/org/apache/parquet/schema/ExtType.java
##########
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.schema;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * This class decorates the class 'Type' by adding a Map field 'metadata'.
+ *
+ * This decoration is needed to add metadata to each column without changing existing class 'MessageType', which is used
+ * extensively. Here is the example usage to add column metadata to schema with type of 'MessageType'.
+ *
+ * MessageType oldSchema = ...
+ * Map metadata = ...
+ * List newFields = new ArrayList();
+ * for (Type field = oldSchema.getFields()) {
+ *     Type newField = new ExtType(field);
+ *     newField.setMetadata(metadata);
+ *     newFields.add(newField);
+ * }
+ * MessageType newSchema = new MessageType(oldSchema.getName(), newFields);
+ *
+ * The implementation is mostly following decoration pattern. Most of the methods are just thin wrappers of existing
+ * implementation of PrimitiveType or GroupType.
+ */
+public class ExtType<T> extends Type {
+  private Type type;
+  private Map<String, T> metadata;
+
+  public ExtType(Type type) {
+    super(type.getName(), type.getRepetition(), type.getOriginalType(), type.getId());
+    this.type = type;
+  }
+
+  public ExtType(Type type, String name) {
+    super(name, type.getRepetition(), OriginalType.UINT_64, type.getId());
+    this.type = new PrimitiveType(type.getRepetition(), type.asPrimitiveType().getPrimitiveTypeName(), name);
+  }
+
+  public Type withId(int id) {

Review comment:
       You should add the `@Override` annotation to every method that overrides a supertype method.
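
       To illustrate the point, here is a minimal sketch of a decorator whose overriding methods carry `@Override`. `BaseType`, `DecoratedType` and `OverrideDemo` are hypothetical stand-ins invented for this example, not the parquet-mr `Type`/`ExtType` classes. With the annotation in place, the compiler rejects a method that does not actually override anything (for example after a typo in the name or a changed parameter list).

```java
// Hypothetical stand-ins for a base type and its decorator; these are not
// the parquet-mr Type/ExtType classes.
class BaseType {
  private final String name;
  private final int id;

  BaseType(String name, int id) {
    this.name = name;
    this.id = id;
  }

  String getName() { return name; }

  int getId() { return id; }

  // Returns a copy of this type carrying the given id.
  BaseType withId(int newId) { return new BaseType(name, newId); }

  String describe() { return name + "#" + id; }
}

class DecoratedType extends BaseType {
  private final BaseType delegate;

  DecoratedType(BaseType delegate) {
    super(delegate.getName(), delegate.getId());
    this.delegate = delegate;
  }

  // @Override makes the compiler verify that a matching supertype method exists.
  @Override
  BaseType withId(int newId) {
    return new DecoratedType(delegate.withId(newId));
  }

  @Override
  String describe() {
    return "ext(" + delegate.describe() + ")";
  }
}

class OverrideDemo {
  public static void main(String[] args) {
    BaseType t = new DecoratedType(new BaseType("col", 1)).withId(7);
    System.out.println(t.describe());
  }
}
```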

##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/example/ExampleParquetWriter.java
##########
@@ -104,15 +105,19 @@ public Builder withExtraMetaData(Map<String, String> extraMetaData) {
       return this;
     }
 
+    public Builder withWriteSupport(WriteSupport writeSupport) {
+      this.writeSupport = writeSupport;
+      return this;
+    }
+

Review comment:
       Is it necessary to allow setting the `WriteSupport`? The idea behind the `ParquetWriter` implementations is to hide all of this from the user, who can simply create a `ParquetWriter<Group> writer = ExampleParquetWriter.builder(...).with(...)` without dealing with the logic required to convert a `Group` object to writable primitives. Also, accepting an arbitrary `WriteSupport` makes it possible to set one that is not compatible with the `Group` type, which would break the whole logic.
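
       A rough sketch of the design I have in mind, using only hypothetical stand-in types (`RecordSupport`, `GroupRecord`, `GroupRecordSupport`, `SimpleWriter`, `GroupWriterBuilder` are invented here and are not the parquet-mr API): the builder exposes ordinary options but always wires in the conversion logic for its own record type, so a caller cannot supply an incompatible one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical conversion hook, standing in for the role WriteSupport plays.
interface RecordSupport<T> {
  String toPrimitives(T record);
}

// Hypothetical record type, standing in for Group.
final class GroupRecord {
  final String value;

  GroupRecord(String value) { this.value = value; }
}

// The only conversion logic that is valid for GroupRecord.
final class GroupRecordSupport implements RecordSupport<GroupRecord> {
  private final String schemaName;

  GroupRecordSupport(String schemaName) { this.schemaName = schemaName; }

  @Override
  public String toPrimitives(GroupRecord record) {
    return schemaName + ":" + record.value;
  }
}

// Minimal writer that collects converted records in memory.
final class SimpleWriter<T> {
  private final RecordSupport<T> support;
  private final List<String> out = new ArrayList<>();

  SimpleWriter(RecordSupport<T> support) { this.support = support; }

  void write(T record) { out.add(support.toPrimitives(record)); }

  List<String> written() { return new ArrayList<>(out); }
}

// The builder offers options but deliberately no setter for the support object.
final class GroupWriterBuilder {
  private String schemaName = "message";

  GroupWriterBuilder withSchemaName(String schemaName) {
    this.schemaName = schemaName;
    return this;
  }

  SimpleWriter<GroupRecord> build() {
    // The matching support is chosen here, never by the caller.
    return new SimpleWriter<>(new GroupRecordSupport(schemaName));
  }
}

class BuilderDemo {
  public static void main(String[] args) {
    SimpleWriter<GroupRecord> writer = new GroupWriterBuilder().withSchemaName("msg").build();
    writer.write(new GroupRecord("a"));
    System.out.println(writer.written());
  }
}
```

       If the test really needs different write logic, a dedicated writer builder for that test would keep `ExampleParquetWriter` type-safe.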

##########
File path: parquet-hadoop/src/test/java/org/apache/parquet/crypto/CryptoPropertiesFactoryTests/SchemaControlEncryptionTest.java
##########
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto.CryptoPropertiesFactoryTests;

Review comment:
       This package name breaks Java naming conventions: package names should not contain uppercase letters. I would suggest using e.g. `org.apache.parquet.crypto.propertiesfactory`.

##########
File path: parquet-hadoop/src/test/java/org/apache/parquet/crypto/CryptoPropertiesFactoryTests/SchemaCryptoPropertiesFactory.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto.CryptoPropertiesFactoryTests;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.crypto.ColumnEncryptionProperties;
+import org.apache.parquet.crypto.DecryptionKeyRetrieverMock;
+import org.apache.parquet.crypto.DecryptionPropertiesFactory;
+import org.apache.parquet.crypto.EncryptionPropertiesFactory;
+import org.apache.parquet.crypto.FileDecryptionProperties;
+import org.apache.parquet.crypto.FileEncryptionProperties;
+import org.apache.parquet.crypto.ParquetCipher;
+import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
+import org.apache.parquet.hadoop.api.WriteSupport;
+import org.apache.parquet.hadoop.api.WriteSupport.WriteContext;
+import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.apache.parquet.schema.ExtType;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class SchemaCryptoPropertiesFactory implements EncryptionPropertiesFactory, DecryptionPropertiesFactory {
+
+  private static Logger log = LoggerFactory.getLogger(SchemaCryptoPropertiesFactory.class);
+
+  public static final String CONF_ENCRYPTION_ALGORITHM = "parquet.encryption.algorithm";
+  public static final String CONF_ENCRYPTION_FOOTER = "parquet.encrypt.footer";
+  private static final byte[] FOOTER_KEY = {0x01, 0x02, 0x03, 0x4, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a,
+    0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10};
+  private static final byte[] FOOTER_KEY_METADATA = "footkey".getBytes(Charset.defaultCharset());

Review comment:
       Using `defaultCharset()` behaves exactly as if you had not specified a charset at all. We usually set the charset explicitly so that the result is the same in every environment, independent of the platform default. I would suggest using one of the constants of `StandardCharsets`.
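
       As a small sketch of the suggestion (the class `CharsetDemo` and its helper methods are made up for illustration, and UTF-8 is my assumption about the intended encoding): with an explicit charset the byte array is identical on every platform, whereas non-ASCII text encodes differently under different charsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetDemo {
  // Explicit charset: the same bytes regardless of the platform default.
  static final byte[] FOOTER_KEY_METADATA = "footkey".getBytes(StandardCharsets.UTF_8);

  static byte[] encodeUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  static byte[] encodeLatin1(String s) {
    return s.getBytes(StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(FOOTER_KEY_METADATA));

    // A non-ASCII string shows why the charset matters:
    // the u-umlaut takes two bytes in UTF-8 but one byte in ISO-8859-1.
    String key = "schl\u00fcssel";
    System.out.println(encodeUtf8(key).length);   // 10
    System.out.println(encodeLatin1(key).length); // 9
  }
}
```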



