You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/15 17:22:24 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #2827: Core: Add WriterFactory and delete table properties

aokolnychyi opened a new pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827


   This WIP PR has the following contributions (may be split into separate changes):
   - Introduce table properties for configuring delete file formats. For example, users may want to use a smaller target file size for deletes, another compression codec/level (i.e. we should prefer faster reads with slightly worse compression as delete files are expected to be queried and rewritten often), row group config, etc.
   - Changes in `Parquet` and `Avro` classes to pick up either data or delete table properties through a context.
   - Builders in `Parquet` and `Avro` for creating `DataWriter`. Right now, the code is inconsistent as we can build delete file writers and an appender (not a data writer). 
   - A factory for writers that is supposed to replace `FileAppenderFactory`. Originally, `FileAppenderFactory` was used to construct appenders but we then added methods for constructing writers too. I think we better move that logic into a separate factory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r675372343



##########
File path: data/src/main/java/org/apache/iceberg/data/BaseWriterFactory.java
##########
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Map;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.deletes.EqualityDeleteWriter;
+import org.apache.iceberg.deletes.PositionDeleteWriter;
+import org.apache.iceberg.encryption.EncryptedOutputFile;
+import org.apache.iceberg.encryption.EncryptionKeyMetadata;
+import org.apache.iceberg.io.DataWriter;
+import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.io.WriterFactory;
+import org.apache.iceberg.parquet.Parquet;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public abstract class BaseWriterFactory<T> implements WriterFactory<T> {
+  private final Map<String, String> properties;
+  private final Schema dataSchema;
+  private final int[] equalityFieldIds;
+  private final Schema equalityDeleteRowSchema;
+  private final Schema positionDeleteRowSchema;
+
+  protected BaseWriterFactory(Map<String, String> properties, Schema dataSchema,
+                              int[] equalityFieldIds, Schema equalityDeleteRowSchema,
+                              Schema positionDeleteRowSchema) {
+    this.properties = properties;
+    this.dataSchema = dataSchema;
+    this.equalityFieldIds = equalityFieldIds;
+    this.equalityDeleteRowSchema = equalityDeleteRowSchema;
+    this.positionDeleteRowSchema = positionDeleteRowSchema;
+  }
+
+  protected abstract void configureDataWrite(Avro.DataWriteBuilder builder);
+  protected abstract void configureEqualityDelete(Avro.DeleteWriteBuilder builder);
+  protected abstract void configurePositionDelete(Avro.DeleteWriteBuilder builder);
+
+  protected abstract void configureDataWrite(Parquet.DataWriteBuilder builder);
+  protected abstract void configureEqualityDelete(Parquet.DeleteWriteBuilder builder);
+  protected abstract void configurePositionDelete(Parquet.DeleteWriteBuilder builder);
+
+  protected Schema equalityDeleteRowSchema() {
+    return equalityDeleteRowSchema;
+  }
+
+  protected Schema positionDeleteRowSchema() {
+    return positionDeleteRowSchema;
+  }
+
+  @Override
+  public DataWriter<T> newDataWriter(EncryptedOutputFile file, FileFormat format,
+                                     PartitionSpec spec, StructLike partition) {
+    Preconditions.checkArgument(spec != null,
+        "Spec must not be null when creating a data writer");
+    Preconditions.checkArgument(spec.isUnpartitioned() || partition != null,
+        "Partition must not be null for partitioned writes");
+
+    OutputFile outputFile = file.encryptingOutputFile();
+    EncryptionKeyMetadata keyMetadata = file.keyMetadata();
+    MetricsConfig metricsConfig = MetricsConfig.fromProperties(properties);
+
+    try {
+      switch (format) {
+        case AVRO:
+          Avro.DataWriteBuilder avroBuilder = Avro.writeData(outputFile)
+              .setAll(properties)
+              .metricsConfig(metricsConfig)
+              .schema(dataSchema)
+              .spec(spec)
+              .partition(partition)
+              .keyMetadata(keyMetadata)
+              .overwrite();
+
+          configureDataWrite(avroBuilder);
+
+          return avroBuilder.buildDataWriter();
+
+        case PARQUET:
+          Parquet.DataWriteBuilder parquetBuilder = Parquet.writeData(outputFile)
+              .setAll(properties)
+              .metricsConfig(metricsConfig)
+              .schema(dataSchema)
+              .spec(spec)
+              .partition(partition)
+              .keyMetadata(keyMetadata)
+              .overwrite();
+
+          configureDataWrite(parquetBuilder);
+
+          return parquetBuilder.buildDataWriter();
+
+        default:

Review comment:
       I think the ORC data writer is needed. Currently, we did not implement the ORC delete writer, so for the equality delete writer and position delete writer,  we don't support ORC now. But for the append-only case, the ORC data writer is necessary. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670669132



##########
File path: core/src/main/java/org/apache/iceberg/avro/Avro.java
##########
@@ -190,8 +189,134 @@ private CodecFactory codec() {
       // add the Iceberg schema to keyValueMetadata
       meta("iceberg.schema", SchemaParser.toJson(schema));
 
+      Context context = createContextFunc.apply(config);
+      CodecFactory codec = context.codec();
+
       return new AvroFileAppender<>(
-          schema, AvroSchemaUtil.convert(schema, name), file, writerFunc, codec(), metadata, metricsConfig, overwrite);
+          schema, AvroSchemaUtil.convert(schema, name), file, writerFunc, codec, metadata, metricsConfig, overwrite);
+    }
+
+    private static class Context {
+      private final CodecFactory codec;
+
+      private Context(CodecFactory codec) {
+        this.codec = codec;
+      }
+
+      public static Context dataContext(Map<String, String> props) {
+        String codecAsString = props.getOrDefault(AVRO_COMPRESSION, AVRO_COMPRESSION_DEFAULT);
+        CodecFactory codec = toCodec(codecAsString);
+
+        return new Context(codec);
+      }
+
+      public static Context deleteContext(Map<String, String> props) {
+        // default delete config using data config
+        Context dataContext = dataContext(props);
+
+        String codecAsString = props.get(DELETE_AVRO_COMPRESSION);
+        CodecFactory codec = codecAsString != null ? toCodec(codecAsString) : dataContext.codec();
+
+        return new Context(codec);
+      }
+
+      private static CodecFactory toCodec(String codecAsString) {
+        try {
+          return CodecName.valueOf(codecAsString.toUpperCase(Locale.ENGLISH)).get();
+        } catch (IllegalArgumentException e) {
+          throw new IllegalArgumentException("Unsupported compression codec: " + codecAsString);
+        }
+      }
+
+      public CodecFactory codec() {
+        return codec;
+      }
+    }
+  }
+
+  public static DataWriteBuilder writeData(OutputFile file) {
+    return new DataWriteBuilder(file);
+  }
+
+  public static class DataWriteBuilder {

Review comment:
       This matches our logic for building delete writers in this class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670746257



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -57,25 +60,46 @@ private TableProperties() {
   public static final String MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled";
   public static final boolean MANIFEST_MERGE_ENABLED_DEFAULT = true;
 
-  public static final String DEFAULT_FILE_FORMAT = "write.format.default";
+  private static final String DEFAULT_FILE_FORMAT_CONF = "format.default";
+
+  public static final String DEFAULT_FILE_FORMAT = WRITE_PREFIX + DEFAULT_FILE_FORMAT_CONF;
+  public static final String DELETE_DEFAULT_FILE_FORMAT = DELETE_PREFIX + DEFAULT_FILE_FORMAT_CONF;
   public static final String DEFAULT_FILE_FORMAT_DEFAULT = "parquet";
 

Review comment:
       The idea I am following here is to default values for delete format based on data format unless the user sets an explicit value. I think that makes sense for configs like the row group size. I am planning to default the delete file size to some percentage of the target data file.
   
   When it comes to the default delete file format, Avro seems a good candidate on paper as it won't have the extra memory pressure but I'd say the user should opt in in that behavior. We still plan to use min/max stats for filtering and other columnar file format benefits still may be helpful. Right now, I am inclined towards defaulting the delete file format to the default data file format (with the smaller target file size).
   
   I am open here, though. Just my current thinking.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670670307



##########
File path: core/src/main/java/org/apache/iceberg/avro/Avro.java
##########
@@ -261,17 +386,17 @@ public DeleteWriteBuilder rowSchema(org.apache.iceberg.Schema newRowSchema) {
       return this;
     }
 
-    public DeleteWriteBuilder withSpec(PartitionSpec newSpec) {
+    public DeleteWriteBuilder spec(PartitionSpec newSpec) {

Review comment:
       I think the naming in this builder is inconsistent as only some methods use `withPrefix`. We don't have that prefix in the appender builder too. I think it should be fine to rename now as the delta code path is evolving.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670672160



##########
File path: data/src/main/java/org/apache/iceberg/data/BaseWriterFactory.java
##########
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Map;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.deletes.EqualityDeleteWriter;
+import org.apache.iceberg.deletes.PositionDeleteWriter;
+import org.apache.iceberg.encryption.EncryptedOutputFile;
+import org.apache.iceberg.encryption.EncryptionKeyMetadata;
+import org.apache.iceberg.io.DataWriter;
+import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.io.WriterFactory;
+import org.apache.iceberg.parquet.Parquet;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public abstract class BaseWriterFactory<T> implements WriterFactory<T> {

Review comment:
       I think it would be great to have a common writer factory instead of an almost identical copy in each query engine module.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670671561



##########
File path: core/src/main/java/org/apache/iceberg/io/WriterFactory.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.deletes.EqualityDeleteWriter;
+import org.apache.iceberg.deletes.PositionDeleteWriter;
+import org.apache.iceberg.encryption.EncryptedOutputFile;
+
+/**
+ * A factory for creating data and delete writers.
+ */
+public interface WriterFactory<T> {

Review comment:
       An important point to note is that we now accept not only a partition but also a spec. This is mostly needed for delete files as we must write delete files using the spec of the data files we delete from.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#issuecomment-880883244


   @openinx @rdblue @RussellSpitzer @yyanyy @jackye1995, could you take a look, please? I want to share this early to get feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r671385248



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -24,6 +24,9 @@
   private TableProperties() {
   }
 
+  private static final String WRITE_PREFIX = "write.";

Review comment:
       Well, you are right. I was debating this for a while myself. I was afraid we may miss a typo in the config name after the prefix but I think agree about using separate variables. We just need to make sure reviewers double check them. Having these private prefixes also seem a little bit weird.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670668529



##########
File path: core/src/main/java/org/apache/iceberg/avro/Avro.java
##########
@@ -190,8 +189,134 @@ private CodecFactory codec() {
       // add the Iceberg schema to keyValueMetadata
       meta("iceberg.schema", SchemaParser.toJson(schema));
 
+      Context context = createContextFunc.apply(config);
+      CodecFactory codec = context.codec();
+
       return new AvroFileAppender<>(
-          schema, AvroSchemaUtil.convert(schema, name), file, writerFunc, codec(), metadata, metricsConfig, overwrite);
+          schema, AvroSchemaUtil.convert(schema, name), file, writerFunc, codec, metadata, metricsConfig, overwrite);
+    }
+
+    private static class Context {

Review comment:
       `Parquet` class below probably has a better example but I implemented the same approach here for consistency.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#issuecomment-885789705


   Closing this in favor of smaller PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670670307



##########
File path: core/src/main/java/org/apache/iceberg/avro/Avro.java
##########
@@ -261,17 +386,17 @@ public DeleteWriteBuilder rowSchema(org.apache.iceberg.Schema newRowSchema) {
       return this;
     }
 
-    public DeleteWriteBuilder withSpec(PartitionSpec newSpec) {
+    public DeleteWriteBuilder spec(PartitionSpec newSpec) {

Review comment:
       I think naming in this builder is inconsistent as only some methods use `withPrefix`. We don't have that prefix in the appender builder too. I think it should be fine to rename now as the delta code path is evolving.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi closed pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi closed pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670981863



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -24,6 +24,9 @@
   private TableProperties() {
   }
 
+  private static final String WRITE_PREFIX = "write.";

Review comment:
       I haven't got time to read the writer factory changes, will do that during the weekend. 
   
   For the table properties, one thing immediately came to my mind was that I will no longer be able to search for a property based on its string. Maybe it's better to just continue to rewrite `write.` and `write.delete` for each property, and new people editing this file can simply follow the convention. In that way the class is much cleaner and more readable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670667923



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -24,6 +24,9 @@
   private TableProperties() {
   }
 
+  private static final String WRITE_PREFIX = "write.";

Review comment:
       Any other ideas are welcome.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2827: Core: Add WriterFactory and delete table properties

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2827:
URL: https://github.com/apache/iceberg/pull/2827#discussion_r670729077



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -57,25 +60,46 @@ private TableProperties() {
   public static final String MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled";
   public static final boolean MANIFEST_MERGE_ENABLED_DEFAULT = true;
 
-  public static final String DEFAULT_FILE_FORMAT = "write.format.default";
+  private static final String DEFAULT_FILE_FORMAT_CONF = "format.default";
+
+  public static final String DEFAULT_FILE_FORMAT = WRITE_PREFIX + DEFAULT_FILE_FORMAT_CONF;
+  public static final String DELETE_DEFAULT_FILE_FORMAT = DELETE_PREFIX + DEFAULT_FILE_FORMAT_CONF;
   public static final String DEFAULT_FILE_FORMAT_DEFAULT = "parquet";
 

Review comment:
       missing `DELETE_DEFAULT_FILE_FORMAT_DEFAULT`.
   
   I am thinking if we should use avro as the default delete file format instead, because it's a row-based format, and it is likely going to be compacted and removed very soon, so it's not worth the effort for writing deletes in Parquet format. I haven't done any rigorous benchmarking on this, so this is just based on my feeling. Any thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org