You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/01 21:55:00 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #5178: API: Add generic FileIO JSON serialization

rdblue opened a new pull request, #5178:
URL: https://github.com/apache/iceberg/pull/5178

   This adds a JSON parser to serialize FileIO instances to and from JSON. The format is an object with an `io-impl` and configuration `properties`. Deserialization uses `CatalogUtil` to re-create an instance from an environment Hadoop Configuration (optional), the IO impl class, and the configuration used to create the original `FileIO`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912268953


##########
core/src/main/java/org/apache/iceberg/io/FileIOParser.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import java.io.IOException;
+import java.util.Map;
+import org.apache.iceberg.CatalogUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.util.JsonUtil;
+
+public class FileIOParser {
+  private FileIOParser() {
+  }
+
+  private static final String FILE_IO_IMPL = "io-impl";

Review Comment:
   I guess that isn't really necessary since this is just the internal JSON representation.  Either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks merged pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks merged PR #5178:
URL: https://github.com/apache/iceberg/pull/5178


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912274568


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -79,6 +79,7 @@ public class S3FileIO implements FileIO, SupportsBulkOperations, SupportsPrefixO
   private String credential = null;
   private SerializableSupplier<S3Client> s3;
   private AwsProperties awsProperties;
+  private Map<String, String> properties = null;

Review Comment:
   Ok, I think I understand why you're doing this now, which is basically that we need to re-initialize an S3/GCS FileIO (as opposed to the java/kryo serialization, which captures the aws/gcp properties for deserialization).  However, this pulls in far more than we would want to include.
   
   We could have the `properties()` return the aws properties configuration:
   
   ```java
     @Override
     public Map<String, String> properties() {
       return awsProperties.toMap();
     }
   ```
   
   This is a little painful for S3FileIO due to the number of properties, but I feel like we don't want to send all of the config properties (including sensitive properties) everywhere with the FileIO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912281414


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -79,6 +79,7 @@ public class S3FileIO implements FileIO, SupportsBulkOperations, SupportsPrefixO
   private String credential = null;
   private SerializableSupplier<S3Client> s3;
   private AwsProperties awsProperties;
+  private Map<String, String> properties = null;

Review Comment:
   I guess this would be necessary for any custom aws client factories as well, since they may need properties not enumerated.  
   
   I guess this is probably the best we can do for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912270804


##########
core/src/main/java/org/apache/iceberg/util/JsonUtil.java:
##########
@@ -50,6 +52,55 @@ public static ObjectMapper mapper() {
     return MAPPER;
   }
 
+  @FunctionalInterface
+  public interface ToJson {
+    void generate(JsonGenerator gen) throws IOException;
+  }
+
+  /**
+   * Helper for writing JSON with a JsonGenerator.
+   *
+   * @param toJson a function to produce JSON using a JsonGenerator
+   * @param pretty whether to pretty-print JSON for readability
+   * @return a JSON string produced from the generator
+   */
+  public static String generate(ToJson toJson, boolean pretty) {
+    try {
+      StringWriter writer = new StringWriter();
+      JsonGenerator generator = JsonUtil.factory().createGenerator(writer);

Review Comment:
   The JsonGenerator is closable, so we should probably be using TWR or closing with finally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912512939


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -79,6 +79,7 @@ public class S3FileIO implements FileIO, SupportsBulkOperations, SupportsPrefixO
   private String credential = null;
   private SerializableSupplier<S3Client> s3;
   private AwsProperties awsProperties;
+  private Map<String, String> properties = null;

Review Comment:
   Yeah, I debated this for quite a while, too. I think this is the cleanest way to do it for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912267719


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -79,6 +79,7 @@ public class S3FileIO implements FileIO, SupportsBulkOperations, SupportsPrefixO
   private String credential = null;
   private SerializableSupplier<S3Client> s3;
   private AwsProperties awsProperties;
+  private Map<String, String> properties = null;

Review Comment:
   Originally we didn't want to include all properties as part of the serialized object (this would include all catalog properties), which is why AwsProperties is a slimmed down version.  Maybe we could trim this to just what the FileIO needs?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5178: API: Add generic FileIO JSON serialization

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5178:
URL: https://github.com/apache/iceberg/pull/5178#discussion_r912513006


##########
core/src/main/java/org/apache/iceberg/io/FileIOParser.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import java.io.IOException;
+import java.util.Map;
+import org.apache.iceberg.CatalogUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.util.JsonUtil;
+
+public class FileIOParser {
+  private FileIOParser() {
+  }
+
+  private static final String FILE_IO_IMPL = "io-impl";

Review Comment:
   Yeah, I copied it since this is internal. If we make a change to this, we don't want it to affect `CatalogProperties`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org