You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/11 00:44:34 UTC

[GitHub] [incubator-iceberg] rdblue opened a new pull request #913: Add v2 manifests (WIP)

rdblue opened a new pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913
 
 
   This is a work-in-progress because it includes the commits from #907. Once that's merged, I'll update.
   
   This separates the write path for manifests into v1 and v2 using the same approach as #907. It also adds sequence number to the v2 write path.
   
   This also adds more checks in v2 writers to validate when sequence numbers are added. When writing `ManifestFile` to a manifest list, this checks that the snapshot ID of the manifest matches the current write's snapshot ID.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408531832
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestEntry.java
 ##########
 @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.MoreObjects;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.avro.specific.SpecificData;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+class GenericManifestEntry implements ManifestEntry, IndexedRecord, SpecificData.SchemaConstructable {
+  private final org.apache.avro.Schema schema;
+  private final V1Metadata.IndexedDataFile fileWrapper;
+  private Status status = Status.EXISTING;
+  private Long snapshotId = null;
+  private Long sequenceNumber = null;
+  private DataFile file = null;
+
+  GenericManifestEntry(org.apache.avro.Schema schema) {
+    this.schema = schema;
+    this.fileWrapper = null; // do not use the file wrapper to read
+  }
+
+  GenericManifestEntry(Types.StructType partitionType) {
+    this.schema = AvroSchemaUtil.convert(V1Metadata.entrySchema(partitionType), "manifest_entry");
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+  }
+
+  private GenericManifestEntry(GenericManifestEntry toCopy, boolean fullCopy) {
+    this.schema = toCopy.schema;
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+    this.status = toCopy.status;
+    this.snapshotId = toCopy.snapshotId;
+    if (fullCopy) {
+      this.file = toCopy.file().copy();
+    } else {
+      this.file = toCopy.file().copyWithoutStats();
+    }
+  }
+
+  ManifestEntry wrapExisting(Long newSnapshotId, Long newSequenceNumber, DataFile newFile) {
+    this.status = Status.EXISTING;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = newSequenceNumber;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapAppend(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.ADDED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapDelete(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.DELETED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
+    this.file = newFile;
+    return this;
+  }
+
+  /**
+   * @return the status of the file, whether EXISTING, ADDED, or DELETED
+   */
+  public Status status() {
+    return status;
+  }
+
+  /**
+   * @return id of the snapshot in which the file was added to the table
+   */
+  public Long snapshotId() {
+    return snapshotId;
+  }
+
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  /**
+   * @return a file
+   */
+  public DataFile file() {
+    return file;
+  }
+
+  public ManifestEntry copy() {
+    return new GenericManifestEntry(this, true /* full copy */);
+  }
+
+  public ManifestEntry copyWithoutStats() {
+    return new GenericManifestEntry(this, false /* drop stats */);
+  }
+
+  @Override
+  public void setSnapshotId(long newSnapshotId) {
+    this.snapshotId = newSnapshotId;
+  }
+
+  @Override
+  public void setSequenceNumber(long newSequenceNumber) {
+    this.sequenceNumber = newSequenceNumber;
+  }
+
+  @Override
+  public void put(int i, Object v) {
+    switch (i) {
+      case 0:
+        this.status = Status.values()[(Integer) v];
+        return;
+      case 1:
+        this.snapshotId = (Long) v;
+        return;
+      case 2:
+        this.sequenceNumber = (Long) v;
+        return;
+      case 3:
+        this.file = (DataFile) v;
+        return;
+      default:
+        // ignore the object, it must be from a newer version of the format
+    }
+  }
+
+  @Override
+  public Object get(int i) {
+    switch (i) {
+      case 0:
+        return status.id();
+      case 1:
+        return snapshotId;
+      case 2:
+        return sequenceNumber;
+      case 3:
+        if (fileWrapper == null || file instanceof GenericDataFile) {
+          return file;
+        } else {
+          return fileWrapper.wrap(file);
+        }
+      default:
+        throw new UnsupportedOperationException("Unknown field ordinal: " + i);
+    }
+  }
+
+  @Override
+  public org.apache.avro.Schema getSchema() {
+    return schema;
+  }
+
+  @Override
+  public String toString() {
+    return MoreObjects.toStringHelper(this)
+        .add("status", status)
+        .add("snapshot_id", snapshotId)
 
 Review comment:
   Added. Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407961970
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestLists.java
 ##########
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import java.io.IOException;
+import java.util.List;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+class ManifestLists {
+  private ManifestLists() {
+  }
+
+  static List<ManifestFile> read(InputFile manifestList) {
+    try (CloseableIterable<ManifestFile> files = Avro.read(manifestList)
+        .rename("manifest_file", GenericManifestFile.class.getName())
+        .rename("partitions", GenericPartitionFieldSummary.class.getName())
+        .rename("r508", GenericPartitionFieldSummary.class.getName())
+        .classLoader(GenericManifestFile.class.getClassLoader())
+        .project(ManifestFile.schema())
+        .reuseContainers(false)
+        .build()) {
+
+      return Lists.newLinkedList(files);
+
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Cannot read manifest list file: %s", manifestList.location());
+    }
+  }
+
+  static ManifestListWriter write(int formatVersion, OutputFile manifestListFile,
 
 Review comment:
   I don't see other places call this except a unit test. This may be misused, how about removing it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] openinx commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407348311
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
 ##########
 @@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;
 
 Review comment:
   I'm not sure in which case we will encounter the snapshot == null, mind to explain the case ? Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407973735
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
 ##########
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+class V2Metadata {
+  private V2Metadata() {
+  }
+
+  // fields for v2 write schema for required metadata
+  static final Types.NestedField REQUIRED_SNAPSHOT_ID =
+      required(503, "added_snapshot_id", Types.LongType.get());
+  static final Types.NestedField REQUIRED_ADDED_FILES_COUNT =
+      required(504, "added_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_EXISTING_FILES_COUNT =
+      required(505, "existing_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_DELETED_FILES_COUNT =
+      required(506, "deleted_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_ADDED_ROWS_COUNT =
+      required(512, "added_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_EXISTING_ROWS_COUNT =
+      required(513, "existing_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_DELETED_ROWS_COUNT =
+      required(514, "deleted_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_SEQUENCE_NUMBER =
+      required(515, "sequence_number", Types.LongType.get());
+  static final Types.NestedField REQUIRED_MIN_SEQUENCE_NUMBER =
+      required(516, "min_sequence_number", Types.LongType.get());
+
+  static final Schema MANIFEST_LIST_SCHEMA = new Schema(
+      ManifestFile.PATH, ManifestFile.LENGTH, ManifestFile.SPEC_ID,
+      REQUIRED_SEQUENCE_NUMBER, REQUIRED_MIN_SEQUENCE_NUMBER, REQUIRED_SNAPSHOT_ID,
+      REQUIRED_ADDED_FILES_COUNT, REQUIRED_EXISTING_FILES_COUNT, REQUIRED_DELETED_FILES_COUNT,
+      REQUIRED_ADDED_ROWS_COUNT, REQUIRED_EXISTING_ROWS_COUNT, REQUIRED_DELETED_ROWS_COUNT,
+      ManifestFile.PARTITION_SUMMARIES);
+
+
+  /**
+   * A wrapper class to write any ManifestFile implementation to Avro using the v2 write schema.
+   *
+   * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
+   * writing a sequence number into metadata files in v2 tables.
+   */
+  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+    private static final org.apache.avro.Schema AVRO_SCHEMA =
+        AvroSchemaUtil.convert(MANIFEST_LIST_SCHEMA, "manifest_file");
+
+    private final long snapshotId;
+    private final long sequenceNumber;
+    private ManifestFile wrapped = null;
+
+    IndexedManifestFile(long snapshotId, long sequenceNumber) {
+      this.snapshotId = snapshotId;
+      this.sequenceNumber = sequenceNumber;
+    }
+
+    public ManifestFile wrap(ManifestFile file) {
+      this.wrapped = file;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return AVRO_SCHEMA;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestFile");
+    }
+
+    @Override
+    public Object get(int pos) {
+      switch (pos) {
+        case 0:
+          return wrapped.path();
+        case 1:
+          return wrapped.length();
+        case 2:
+          return wrapped.partitionSpecId();
+        case 3:
+          if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
+            Preconditions.checkState(snapshotId == wrapped.snapshotId(),
 
 Review comment:
   Does this mean that a manifest file with a new snapshot id must have a sequence number?  Maybe add a more detailed reason in the exception message for this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408534877
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestLists.java
 ##########
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import java.io.IOException;
+import java.util.List;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+class ManifestLists {
+  private ManifestLists() {
+  }
+
+  static List<ManifestFile> read(InputFile manifestList) {
+    try (CloseableIterable<ManifestFile> files = Avro.read(manifestList)
+        .rename("manifest_file", GenericManifestFile.class.getName())
+        .rename("partitions", GenericPartitionFieldSummary.class.getName())
+        .rename("r508", GenericPartitionFieldSummary.class.getName())
+        .classLoader(GenericManifestFile.class.getClassLoader())
+        .project(ManifestFile.schema())
+        .reuseContainers(false)
+        .build()) {
+
+      return Lists.newLinkedList(files);
+
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Cannot read manifest list file: %s", manifestList.location());
+    }
+  }
+
+  static ManifestListWriter write(int formatVersion, OutputFile manifestListFile,
 
 Review comment:
   Opened https://github.com/apache/incubator-iceberg/pull/921 to remove this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407585835
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
 ##########
 @@ -66,6 +77,16 @@ static Schema schema() {
    */
   int partitionSpecId();
 
+  /**
+   * @return the sequence number of the commit that added the manifest file
+   */
+  long sequenceNumber();
+
+  /**
+   * @return the lowest sequence number of any data file in the manifest
+   */
+  long minSequenceNumber();
 
 Review comment:
   This is additional information for job planning and metadata maintenance.
   
   This may change, but the initial idea is that if we have the min sequence number of a manifest, we can use that to filter other manifests for planning. For example, if I have a manifest with delete files that was written at sequence number 14, but only manifests with data files written at sequence number 16 and later, I can ignore the manifest with delete files.
   
   We can do something similar when maintaining metadata. If we detect that a delete file is older than all data files then we can remove it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407950567
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestFiles.java
 ##########
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+public class ManifestFiles {
+  private ManifestFiles() {
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   * <p>
+   * <em>Note:</em> Callers should use {@link ManifestFiles#read(ManifestFile, FileIO, Map)} to ensure
+   * the schema used by filters is the latest table schema. This should be used only when reading
+   * a manifest without filters.
+   *
+   * @param manifest a ManifestFile
+   * @param io a FileIO
+   * @return a manifest reader
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io) {
+    return read(manifest, io, null);
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   *
+   * @param manifest a {@link ManifestFile}
+   * @param io a {@link FileIO}
+   * @param specsById a Map from spec ID to partition spec
+   * @return a {@link ManifestReader}
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec> specsById) {
+    InputFile file = io.newInputFile(manifest.path());
+    InheritableMetadata inheritableMetadata = InheritableMetadataFactory.fromManifest(manifest);
+    return new ManifestReader(file, specsById, inheritableMetadata);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter}.
+   * <p>
+   * Manifests created by this writer have all entry snapshot IDs set to null.
+   * All entries will inherit the snapshot ID that will be assigned to the manifest on commit.
+   *
+   * @param spec {@link PartitionSpec} used to produce {@link DataFile} partition tuples
+   * @param outputFile the destination file location
+   * @return a manifest writer
+   */
+  public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
 
 Review comment:
   The document is duplicated with the one in ManifestWriter.write. We can remove this function definition and just call from there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408271122
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestFiles.java
 ##########
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+public class ManifestFiles {
+  private ManifestFiles() {
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   * <p>
+   * <em>Note:</em> Callers should use {@link ManifestFiles#read(ManifestFile, FileIO, Map)} to ensure
+   * the schema used by filters is the latest table schema. This should be used only when reading
+   * a manifest without filters.
+   *
+   * @param manifest a ManifestFile
+   * @param io a FileIO
+   * @return a manifest reader
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io) {
+    return read(manifest, io, null);
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   *
+   * @param manifest a {@link ManifestFile}
+   * @param io a {@link FileIO}
+   * @param specsById a Map from spec ID to partition spec
+   * @return a {@link ManifestReader}
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec> specsById) {
+    InputFile file = io.newInputFile(manifest.path());
+    InheritableMetadata inheritableMetadata = InheritableMetadataFactory.fromManifest(manifest);
+    return new ManifestReader(file, specsById, inheritableMetadata);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter}.
+   * <p>
+   * Manifests created by this writer have all entry snapshot IDs set to null.
+   * All entries will inherit the snapshot ID that will be assigned to the manifest on commit.
+   *
+   * @param spec {@link PartitionSpec} used to produce {@link DataFile} partition tuples
+   * @param outputFile the destination file location
+   * @return a manifest writer
+   */
+  public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
 
 Review comment:
   The one in `ManifestWriter` was released in version 0.7 so we don't want to remove it from the public API without a period of deprecation. However, we do want to move the factory methods here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407988195
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
 ##########
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+class V2Metadata {
+  private V2Metadata() {
+  }
+
+  // fields for v2 write schema for required metadata
+  static final Types.NestedField REQUIRED_SNAPSHOT_ID =
+      required(503, "added_snapshot_id", Types.LongType.get());
+  static final Types.NestedField REQUIRED_ADDED_FILES_COUNT =
+      required(504, "added_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_EXISTING_FILES_COUNT =
+      required(505, "existing_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_DELETED_FILES_COUNT =
+      required(506, "deleted_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_ADDED_ROWS_COUNT =
+      required(512, "added_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_EXISTING_ROWS_COUNT =
+      required(513, "existing_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_DELETED_ROWS_COUNT =
+      required(514, "deleted_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_SEQUENCE_NUMBER =
+      required(515, "sequence_number", Types.LongType.get());
+  static final Types.NestedField REQUIRED_MIN_SEQUENCE_NUMBER =
+      required(516, "min_sequence_number", Types.LongType.get());
+
+  static final Schema MANIFEST_LIST_SCHEMA = new Schema(
+      ManifestFile.PATH, ManifestFile.LENGTH, ManifestFile.SPEC_ID,
+      REQUIRED_SEQUENCE_NUMBER, REQUIRED_MIN_SEQUENCE_NUMBER, REQUIRED_SNAPSHOT_ID,
+      REQUIRED_ADDED_FILES_COUNT, REQUIRED_EXISTING_FILES_COUNT, REQUIRED_DELETED_FILES_COUNT,
+      REQUIRED_ADDED_ROWS_COUNT, REQUIRED_EXISTING_ROWS_COUNT, REQUIRED_DELETED_ROWS_COUNT,
+      ManifestFile.PARTITION_SUMMARIES);
+
+
+  /**
+   * A wrapper class to write any ManifestFile implementation to Avro using the v2 write schema.
+   *
+   * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
+   * writing a sequence number into metadata files in v2 tables.
+   */
+  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+    private static final org.apache.avro.Schema AVRO_SCHEMA =
+        AvroSchemaUtil.convert(MANIFEST_LIST_SCHEMA, "manifest_file");
+
+    private final long snapshotId;
+    private final long sequenceNumber;
+    private ManifestFile wrapped = null;
+
+    IndexedManifestFile(long snapshotId, long sequenceNumber) {
+      this.snapshotId = snapshotId;
+      this.sequenceNumber = sequenceNumber;
+    }
+
+    public ManifestFile wrap(ManifestFile file) {
+      this.wrapped = file;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return AVRO_SCHEMA;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestFile");
+    }
+
+    @Override
+    public Object get(int pos) {
+      switch (pos) {
+        case 0:
+          return wrapped.path();
+        case 1:
+          return wrapped.length();
+        case 2:
+          return wrapped.partitionSpecId();
+        case 3:
+          if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
+            Preconditions.checkState(snapshotId == wrapped.snapshotId(),
+                "Found unassigned sequence number for a manifest from snapshot: %s", wrapped.snapshotId());
+            return sequenceNumber;
+          } else {
+            return wrapped.sequenceNumber();
+          }
+        case 4:
+          return wrapped.minSequenceNumber();
+        case 5:
+          return wrapped.snapshotId();
+        case 6:
+          return wrapped.addedFilesCount();
+        case 7:
+          return wrapped.existingFilesCount();
+        case 8:
+          return wrapped.deletedFilesCount();
+        case 9:
+          return wrapped.addedRowsCount();
+        case 10:
+          return wrapped.existingRowsCount();
+        case 11:
+          return wrapped.deletedRowsCount();
+        case 12:
+          return wrapped.partitions();
+        default:
+          throw new UnsupportedOperationException("Unknown field ordinal: " + pos);
+      }
+    }
+
+    @Override
+    public String path() {
+      return wrapped.path();
+    }
+
+    @Override
+    public long length() {
+      return wrapped.length();
+    }
+
+    @Override
+    public int partitionSpecId() {
+      return wrapped.partitionSpecId();
+    }
+
+    @Override
+    public long sequenceNumber() {
+      return wrapped.sequenceNumber();
+    }
+
+    @Override
+    public long minSequenceNumber() {
+      return wrapped.minSequenceNumber();
+    }
+
+    @Override
+    public Long snapshotId() {
+      return wrapped.snapshotId();
+    }
+
+    @Override
+    public boolean hasAddedFiles() {
+      return wrapped.hasAddedFiles();
+    }
+
+    @Override
+    public Integer addedFilesCount() {
+      return wrapped.addedFilesCount();
+    }
+
+    @Override
+    public Long addedRowsCount() {
+      return wrapped.addedRowsCount();
+    }
+
+    @Override
+    public boolean hasExistingFiles() {
+      return wrapped.hasExistingFiles();
+    }
+
+    @Override
+    public Integer existingFilesCount() {
+      return wrapped.existingFilesCount();
+    }
+
+    @Override
+    public Long existingRowsCount() {
+      return wrapped.existingRowsCount();
+    }
+
+    @Override
+    public boolean hasDeletedFiles() {
+      return wrapped.hasDeletedFiles();
+    }
+
+    @Override
+    public Integer deletedFilesCount() {
+      return wrapped.deletedFilesCount();
+    }
+
+    @Override
+    public Long deletedRowsCount() {
+      return wrapped.deletedRowsCount();
+    }
+
+    @Override
+    public List<PartitionFieldSummary> partitions() {
+      return wrapped.partitions();
+    }
+
+    @Override
+    public ManifestFile copy() {
+      return wrapped.copy();
+    }
+  }
+
+  static Schema entrySchema(Types.StructType partitionType) {
+    return wrapFileSchema(DataFile.getType(partitionType));
+  }
+
+  static Schema wrapFileSchema(Types.StructType fileSchema) {
+    // this is used to build projection schemas
+    return new Schema(
+        ManifestEntry.STATUS, ManifestEntry.SNAPSHOT_ID,
+        required(ManifestEntry.DATA_FILE_ID, "data_file", fileSchema));
+  }
+
+  static class IndexedManifestEntry implements ManifestEntry, IndexedRecord {
+    private final org.apache.avro.Schema avroSchema;
+    private final long snapshotId;
+    private final V1Metadata.IndexedDataFile fileWrapper;
+    private ManifestEntry wrapped = null;
+
+    IndexedManifestEntry(long snapshotId, Types.StructType partitionType) {
+      this.avroSchema = AvroSchemaUtil.convert(entrySchema(partitionType), "manifest_entry");
+      this.snapshotId = snapshotId;
+      // TODO: when v2 data files differ from v1, this should use a v2 wrapper
+      this.fileWrapper = new V1Metadata.IndexedDataFile(avroSchema.getField("data_file").schema());
+    }
+
+    public IndexedManifestEntry wrap(ManifestEntry entry) {
+      this.wrapped = entry;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return avroSchema;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestEntry");
+    }
+
+    @Override
+    public Object get(int i) {
+      switch (i) {
+        case 0:
+          return wrapped.status().id();
+        case 1:
+          return wrapped.snapshotId();
+        case 2:
+          if (wrapped.sequenceNumber() == null) {
+            Preconditions.checkState(wrapped.snapshotId() == null || snapshotId == wrapped.snapshotId(),
 
 Review comment:
   I think this also needs some detail or document about why it happens.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408534594
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
 ##########
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+class V2Metadata {
+  private V2Metadata() {
+  }
+
+  // fields for v2 write schema for required metadata
+  static final Types.NestedField REQUIRED_SNAPSHOT_ID =
+      required(503, "added_snapshot_id", Types.LongType.get());
+  static final Types.NestedField REQUIRED_ADDED_FILES_COUNT =
+      required(504, "added_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_EXISTING_FILES_COUNT =
+      required(505, "existing_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_DELETED_FILES_COUNT =
+      required(506, "deleted_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_ADDED_ROWS_COUNT =
+      required(512, "added_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_EXISTING_ROWS_COUNT =
+      required(513, "existing_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_DELETED_ROWS_COUNT =
+      required(514, "deleted_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_SEQUENCE_NUMBER =
+      required(515, "sequence_number", Types.LongType.get());
+  static final Types.NestedField REQUIRED_MIN_SEQUENCE_NUMBER =
+      required(516, "min_sequence_number", Types.LongType.get());
+
+  static final Schema MANIFEST_LIST_SCHEMA = new Schema(
+      ManifestFile.PATH, ManifestFile.LENGTH, ManifestFile.SPEC_ID,
+      REQUIRED_SEQUENCE_NUMBER, REQUIRED_MIN_SEQUENCE_NUMBER, REQUIRED_SNAPSHOT_ID,
+      REQUIRED_ADDED_FILES_COUNT, REQUIRED_EXISTING_FILES_COUNT, REQUIRED_DELETED_FILES_COUNT,
+      REQUIRED_ADDED_ROWS_COUNT, REQUIRED_EXISTING_ROWS_COUNT, REQUIRED_DELETED_ROWS_COUNT,
+      ManifestFile.PARTITION_SUMMARIES);
+
+
+  /**
+   * A wrapper class to write any ManifestFile implementation to Avro using the v2 write schema.
+   *
+   * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
+   * writing a sequence number into metadata files in v2 tables.
+   */
+  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+    private static final org.apache.avro.Schema AVRO_SCHEMA =
+        AvroSchemaUtil.convert(MANIFEST_LIST_SCHEMA, "manifest_file");
+
+    private final long snapshotId;
+    private final long sequenceNumber;
+    private ManifestFile wrapped = null;
+
+    IndexedManifestFile(long snapshotId, long sequenceNumber) {
+      this.snapshotId = snapshotId;
+      this.sequenceNumber = sequenceNumber;
+    }
+
+    public ManifestFile wrap(ManifestFile file) {
+      this.wrapped = file;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return AVRO_SCHEMA;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestFile");
+    }
+
+    @Override
+    public Object get(int pos) {
+      switch (pos) {
+        case 0:
+          return wrapped.path();
+        case 1:
+          return wrapped.length();
+        case 2:
+          return wrapped.partitionSpecId();
+        case 3:
+          if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
+            Preconditions.checkState(snapshotId == wrapped.snapshotId(),
+                "Found unassigned sequence number for a manifest from snapshot: %s", wrapped.snapshotId());
+            return sequenceNumber;
+          } else {
+            return wrapped.sequenceNumber();
+          }
+        case 4:
+          return wrapped.minSequenceNumber();
+        case 5:
+          return wrapped.snapshotId();
+        case 6:
+          return wrapped.addedFilesCount();
+        case 7:
+          return wrapped.existingFilesCount();
+        case 8:
+          return wrapped.deletedFilesCount();
+        case 9:
+          return wrapped.addedRowsCount();
+        case 10:
+          return wrapped.existingRowsCount();
+        case 11:
+          return wrapped.deletedRowsCount();
+        case 12:
+          return wrapped.partitions();
+        default:
+          throw new UnsupportedOperationException("Unknown field ordinal: " + pos);
+      }
+    }
+
+    @Override
+    public String path() {
+      return wrapped.path();
+    }
+
+    @Override
+    public long length() {
+      return wrapped.length();
+    }
+
+    @Override
+    public int partitionSpecId() {
+      return wrapped.partitionSpecId();
+    }
+
+    @Override
+    public long sequenceNumber() {
+      return wrapped.sequenceNumber();
+    }
+
+    @Override
+    public long minSequenceNumber() {
+      return wrapped.minSequenceNumber();
+    }
+
+    @Override
+    public Long snapshotId() {
+      return wrapped.snapshotId();
+    }
+
+    @Override
+    public boolean hasAddedFiles() {
+      return wrapped.hasAddedFiles();
+    }
+
+    @Override
+    public Integer addedFilesCount() {
+      return wrapped.addedFilesCount();
+    }
+
+    @Override
+    public Long addedRowsCount() {
+      return wrapped.addedRowsCount();
+    }
+
+    @Override
+    public boolean hasExistingFiles() {
+      return wrapped.hasExistingFiles();
+    }
+
+    @Override
+    public Integer existingFilesCount() {
+      return wrapped.existingFilesCount();
+    }
+
+    @Override
+    public Long existingRowsCount() {
+      return wrapped.existingRowsCount();
+    }
+
+    @Override
+    public boolean hasDeletedFiles() {
+      return wrapped.hasDeletedFiles();
+    }
+
+    @Override
+    public Integer deletedFilesCount() {
+      return wrapped.deletedFilesCount();
+    }
+
+    @Override
+    public Long deletedRowsCount() {
+      return wrapped.deletedRowsCount();
+    }
+
+    @Override
+    public List<PartitionFieldSummary> partitions() {
+      return wrapped.partitions();
+    }
+
+    @Override
+    public ManifestFile copy() {
+      return wrapped.copy();
+    }
+  }
+
+  static Schema entrySchema(Types.StructType partitionType) {
+    return wrapFileSchema(DataFile.getType(partitionType));
+  }
+
+  static Schema wrapFileSchema(Types.StructType fileSchema) {
+    // this is used to build projection schemas
+    return new Schema(
+        ManifestEntry.STATUS, ManifestEntry.SNAPSHOT_ID,
+        required(ManifestEntry.DATA_FILE_ID, "data_file", fileSchema));
+  }
+
+  static class IndexedManifestEntry implements ManifestEntry, IndexedRecord {
+    private final org.apache.avro.Schema avroSchema;
+    private final long snapshotId;
+    private final V1Metadata.IndexedDataFile fileWrapper;
+    private ManifestEntry wrapped = null;
+
+    IndexedManifestEntry(long snapshotId, Types.StructType partitionType) {
+      this.avroSchema = AvroSchemaUtil.convert(entrySchema(partitionType), "manifest_entry");
+      this.snapshotId = snapshotId;
+      // TODO: when v2 data files differ from v1, this should use a v2 wrapper
+      this.fileWrapper = new V1Metadata.IndexedDataFile(avroSchema.getField("data_file").schema());
+    }
+
+    public IndexedManifestEntry wrap(ManifestEntry entry) {
+      this.wrapped = entry;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return avroSchema;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestEntry");
+    }
+
+    @Override
+    public Object get(int i) {
+      switch (i) {
+        case 0:
+          return wrapped.status().id();
+        case 1:
+          return wrapped.snapshotId();
+        case 2:
+          if (wrapped.sequenceNumber() == null) {
+            Preconditions.checkState(wrapped.snapshotId() == null || snapshotId == wrapped.snapshotId(),
 
 Review comment:
   I added a comment to explain here as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407583397
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
 ##########
 @@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;
 
 Review comment:
   Snapshot id is null when a manifest is appended to a table and the snapshot id hasn't been assigned yet.
   
   If the manifest is committed to table metadata, then it will be set when writing and will always be present. That's why it is required in the v2 schema for manifest list files.
   
   Some appended manifests are rewritten before committing. When reading those to rewrite them, this path is used.
   
   I think I'm going to update how this works, but as a separate commit. The problem is that this allows reading a manifest without filling in the snapshot id. But the rewrite method where the reader that is configured this was is used has the snapshot id. So we should add the snapshot id to the `ManifestFile` and then read to avoid the special case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407577814
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestFile.java
 ##########
 @@ -257,21 +255,25 @@ public Object get(int i) {
       case 2:
         return specId;
       case 3:
-        return snapshotId;
+        return sequenceNumber;
 
 Review comment:
   Changing the order won't break compatibility. The requirement here is that the order of fields in these classes must match the order of fields in the schema. That's why this commit freezes the v1 schema and an IndexedWriter for v1. That way, we can write exactly what we would have before for v1, but make changes for v2.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408272201
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestLists.java
 ##########
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import java.io.IOException;
+import java.util.List;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+class ManifestLists {
+  private ManifestLists() {
+  }
+
+  static List<ManifestFile> read(InputFile manifestList) {
+    try (CloseableIterable<ManifestFile> files = Avro.read(manifestList)
+        .rename("manifest_file", GenericManifestFile.class.getName())
+        .rename("partitions", GenericPartitionFieldSummary.class.getName())
+        .rename("r508", GenericPartitionFieldSummary.class.getName())
+        .classLoader(GenericManifestFile.class.getClassLoader())
+        .project(ManifestFile.schema())
+        .reuseContainers(false)
+        .build()) {
+
+      return Lists.newLinkedList(files);
+
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Cannot read manifest list file: %s", manifestList.location());
+    }
+  }
+
+  static ManifestListWriter write(int formatVersion, OutputFile manifestListFile,
 
 Review comment:
   Yeah, we probably will.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] openinx commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407297599
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestEntry.java
 ##########
 @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.MoreObjects;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.avro.specific.SpecificData;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+class GenericManifestEntry implements ManifestEntry, IndexedRecord, SpecificData.SchemaConstructable {
+  private final org.apache.avro.Schema schema;
+  private final V1Metadata.IndexedDataFile fileWrapper;
+  private Status status = Status.EXISTING;
+  private Long snapshotId = null;
+  private Long sequenceNumber = null;
+  private DataFile file = null;
+
+  GenericManifestEntry(org.apache.avro.Schema schema) {
+    this.schema = schema;
+    this.fileWrapper = null; // do not use the file wrapper to read
+  }
+
+  GenericManifestEntry(Types.StructType partitionType) {
+    this.schema = AvroSchemaUtil.convert(V1Metadata.entrySchema(partitionType), "manifest_entry");
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+  }
+
+  private GenericManifestEntry(GenericManifestEntry toCopy, boolean fullCopy) {
+    this.schema = toCopy.schema;
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+    this.status = toCopy.status;
+    this.snapshotId = toCopy.snapshotId;
+    if (fullCopy) {
+      this.file = toCopy.file().copy();
+    } else {
+      this.file = toCopy.file().copyWithoutStats();
+    }
+  }
+
+  ManifestEntry wrapExisting(Long newSnapshotId, Long newSequenceNumber, DataFile newFile) {
+    this.status = Status.EXISTING;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = newSequenceNumber;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapAppend(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.ADDED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapDelete(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.DELETED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
+    this.file = newFile;
+    return this;
+  }
+
+  /**
+   * @return the status of the file, whether EXISTING, ADDED, or DELETED
+   */
+  public Status status() {
+    return status;
+  }
+
+  /**
+   * @return id of the snapshot in which the file was added to the table
+   */
+  public Long snapshotId() {
+    return snapshotId;
+  }
+
+  @Override
+  public Long sequenceNumber() {
+    return sequenceNumber;
+  }
+
+  /**
+   * @return a file
+   */
+  public DataFile file() {
+    return file;
+  }
+
+  public ManifestEntry copy() {
+    return new GenericManifestEntry(this, true /* full copy */);
+  }
+
+  public ManifestEntry copyWithoutStats() {
+    return new GenericManifestEntry(this, false /* drop stats */);
+  }
+
+  @Override
+  public void setSnapshotId(long newSnapshotId) {
+    this.snapshotId = newSnapshotId;
+  }
+
+  @Override
+  public void setSequenceNumber(long newSequenceNumber) {
+    this.sequenceNumber = newSequenceNumber;
+  }
+
+  @Override
+  public void put(int i, Object v) {
+    switch (i) {
+      case 0:
+        this.status = Status.values()[(Integer) v];
+        return;
+      case 1:
+        this.snapshotId = (Long) v;
+        return;
+      case 2:
+        this.sequenceNumber = (Long) v;
+        return;
+      case 3:
+        this.file = (DataFile) v;
+        return;
+      default:
+        // ignore the object, it must be from a newer version of the format
+    }
+  }
+
+  @Override
+  public Object get(int i) {
+    switch (i) {
+      case 0:
+        return status.id();
+      case 1:
+        return snapshotId;
+      case 2:
+        return sequenceNumber;
+      case 3:
+        if (fileWrapper == null || file instanceof GenericDataFile) {
+          return file;
+        } else {
+          return fileWrapper.wrap(file);
+        }
+      default:
+        throw new UnsupportedOperationException("Unknown field ordinal: " + i);
+    }
+  }
+
+  @Override
+  public org.apache.avro.Schema getSchema() {
+    return schema;
+  }
+
+  @Override
+  public String toString() {
+    return MoreObjects.toStringHelper(this)
+        .add("status", status)
+        .add("snapshot_id", snapshotId)
 
 Review comment:
   nit: missing the sequence number here ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-614306944
 
 
   I'll update this after #925 is in. I extracted those changes out of this one to make this smaller.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407950567
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestFiles.java
 ##########
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+public class ManifestFiles {
+  private ManifestFiles() {
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   * <p>
+   * <em>Note:</em> Callers should use {@link ManifestFiles#read(ManifestFile, FileIO, Map)} to ensure
+   * the schema used by filters is the latest table schema. This should be used only when reading
+   * a manifest without filters.
+   *
+   * @param manifest a ManifestFile
+   * @param io a FileIO
+   * @return a manifest reader
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io) {
+    return read(manifest, io, null);
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   *
+   * @param manifest a {@link ManifestFile}
+   * @param io a {@link FileIO}
+   * @param specsById a Map from spec ID to partition spec
+   * @return a {@link ManifestReader}
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec> specsById) {
+    InputFile file = io.newInputFile(manifest.path());
+    InheritableMetadata inheritableMetadata = InheritableMetadataFactory.fromManifest(manifest);
+    return new ManifestReader(file, specsById, inheritableMetadata);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter}.
+   * <p>
+   * Manifests created by this writer have all entry snapshot IDs set to null.
+   * All entries will inherit the snapshot ID that will be assigned to the manifest on commit.
+   *
+   * @param spec {@link PartitionSpec} used to produce {@link DataFile} partition tuples
+   * @param outputFile the destination file location
+   * @return a manifest writer
+   */
+  public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
 
 Review comment:
   The document is duplicated with the one in ManifestWriter.write. We can remove this function definition and just call from there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] openinx commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407299115
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestFile.java
 ##########
 @@ -257,21 +255,25 @@ public Object get(int i) {
       case 2:
         return specId;
       case 3:
-        return snapshotId;
+        return sequenceNumber;
 
 Review comment:
   Here we're changing the position of snapshotId/addedFilesCount/ ... etc,   will it break the compatibility ?  I mean the new code couldn't read the old data manifest file ... 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407577108
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestEntry.java
 ##########
 @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.MoreObjects;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.avro.specific.SpecificData;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+class GenericManifestEntry implements ManifestEntry, IndexedRecord, SpecificData.SchemaConstructable {
+  private final org.apache.avro.Schema schema;
+  private final V1Metadata.IndexedDataFile fileWrapper;
+  private Status status = Status.EXISTING;
+  private Long snapshotId = null;
+  private Long sequenceNumber = null;
+  private DataFile file = null;
+
+  GenericManifestEntry(org.apache.avro.Schema schema) {
+    this.schema = schema;
+    this.fileWrapper = null; // do not use the file wrapper to read
+  }
+
+  GenericManifestEntry(Types.StructType partitionType) {
+    this.schema = AvroSchemaUtil.convert(V1Metadata.entrySchema(partitionType), "manifest_entry");
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+  }
+
+  private GenericManifestEntry(GenericManifestEntry toCopy, boolean fullCopy) {
+    this.schema = toCopy.schema;
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+    this.status = toCopy.status;
+    this.snapshotId = toCopy.snapshotId;
+    if (fullCopy) {
+      this.file = toCopy.file().copy();
+    } else {
+      this.file = toCopy.file().copyWithoutStats();
+    }
+  }
+
+  ManifestEntry wrapExisting(Long newSnapshotId, Long newSequenceNumber, DataFile newFile) {
+    this.status = Status.EXISTING;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = newSequenceNumber;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapAppend(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.ADDED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
 
 Review comment:
   When a file is appended, it must have the sequence number of the snapshot where it is committed. The problem is that the sequence number isn't determined until the snapshot's commit is successful. Two committers may be racing to add a commit with the same sequence number.
   
   That's why sequence numbers are assigned initially through inheritance. Every snapshot commit attempt writes a new root manifest list with a new sequence number based on the table metadata's last sequence number. If two writers are trying to commit different snapshots as sequence number 5, one will win and the other will retry with 6. To avoid rewriting the metadata tree below the manifest list, sequence numbers that might change are inherited instead of written into the initial files.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] openinx commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407298145
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/GenericManifestEntry.java
 ##########
 @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.MoreObjects;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.avro.specific.SpecificData;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+class GenericManifestEntry implements ManifestEntry, IndexedRecord, SpecificData.SchemaConstructable {
+  private final org.apache.avro.Schema schema;
+  private final V1Metadata.IndexedDataFile fileWrapper;
+  private Status status = Status.EXISTING;
+  private Long snapshotId = null;
+  private Long sequenceNumber = null;
+  private DataFile file = null;
+
+  GenericManifestEntry(org.apache.avro.Schema schema) {
+    this.schema = schema;
+    this.fileWrapper = null; // do not use the file wrapper to read
+  }
+
+  GenericManifestEntry(Types.StructType partitionType) {
+    this.schema = AvroSchemaUtil.convert(V1Metadata.entrySchema(partitionType), "manifest_entry");
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+  }
+
+  private GenericManifestEntry(GenericManifestEntry toCopy, boolean fullCopy) {
+    this.schema = toCopy.schema;
+    this.fileWrapper = new V1Metadata.IndexedDataFile(schema.getField("data_file").schema());
+    this.status = toCopy.status;
+    this.snapshotId = toCopy.snapshotId;
+    if (fullCopy) {
+      this.file = toCopy.file().copy();
+    } else {
+      this.file = toCopy.file().copyWithoutStats();
+    }
+  }
+
+  ManifestEntry wrapExisting(Long newSnapshotId, Long newSequenceNumber, DataFile newFile) {
+    this.status = Status.EXISTING;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = newSequenceNumber;
+    this.file = newFile;
+    return this;
+  }
+
+  ManifestEntry wrapAppend(Long newSnapshotId, DataFile newFile) {
+    this.status = Status.ADDED;
+    this.snapshotId = newSnapshotId;
+    this.sequenceNumber = null;
 
 Review comment:
   The append/delete operations won't attach the sequence number to its entries ?  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408534311
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestFiles.java
 ##########
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+public class ManifestFiles {
+  private ManifestFiles() {
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   * <p>
+   * <em>Note:</em> Callers should use {@link ManifestFiles#read(ManifestFile, FileIO, Map)} to ensure
+   * the schema used by filters is the latest table schema. This should be used only when reading
+   * a manifest without filters.
+   *
+   * @param manifest a ManifestFile
+   * @param io a FileIO
+   * @return a manifest reader
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io) {
+    return read(manifest, io, null);
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   *
+   * @param manifest a {@link ManifestFile}
+   * @param io a {@link FileIO}
+   * @param specsById a Map from spec ID to partition spec
+   * @return a {@link ManifestReader}
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec> specsById) {
+    InputFile file = io.newInputFile(manifest.path());
+    InheritableMetadata inheritableMetadata = InheritableMetadataFactory.fromManifest(manifest);
+    return new ManifestReader(file, specsById, inheritableMetadata);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter}.
+   * <p>
+   * Manifests created by this writer have all entry snapshot IDs set to null.
+   * All entries will inherit the snapshot ID that will be assigned to the manifest on commit.
+   *
+   * @param spec {@link PartitionSpec} used to produce {@link DataFile} partition tuples
+   * @param outputFile the destination file location
+   * @return a manifest writer
+   */
+  public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
 
 Review comment:
   I see, thanks for the explanation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408534546
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
 ##########
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+class V2Metadata {
+  private V2Metadata() {
+  }
+
+  // fields for v2 write schema for required metadata
+  static final Types.NestedField REQUIRED_SNAPSHOT_ID =
+      required(503, "added_snapshot_id", Types.LongType.get());
+  static final Types.NestedField REQUIRED_ADDED_FILES_COUNT =
+      required(504, "added_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_EXISTING_FILES_COUNT =
+      required(505, "existing_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_DELETED_FILES_COUNT =
+      required(506, "deleted_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_ADDED_ROWS_COUNT =
+      required(512, "added_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_EXISTING_ROWS_COUNT =
+      required(513, "existing_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_DELETED_ROWS_COUNT =
+      required(514, "deleted_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_SEQUENCE_NUMBER =
+      required(515, "sequence_number", Types.LongType.get());
+  static final Types.NestedField REQUIRED_MIN_SEQUENCE_NUMBER =
+      required(516, "min_sequence_number", Types.LongType.get());
+
+  static final Schema MANIFEST_LIST_SCHEMA = new Schema(
+      ManifestFile.PATH, ManifestFile.LENGTH, ManifestFile.SPEC_ID,
+      REQUIRED_SEQUENCE_NUMBER, REQUIRED_MIN_SEQUENCE_NUMBER, REQUIRED_SNAPSHOT_ID,
+      REQUIRED_ADDED_FILES_COUNT, REQUIRED_EXISTING_FILES_COUNT, REQUIRED_DELETED_FILES_COUNT,
+      REQUIRED_ADDED_ROWS_COUNT, REQUIRED_EXISTING_ROWS_COUNT, REQUIRED_DELETED_ROWS_COUNT,
+      ManifestFile.PARTITION_SUMMARIES);
+
+
+  /**
+   * A wrapper class to write any ManifestFile implementation to Avro using the v2 write schema.
+   *
+   * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
+   * writing a sequence number into metadata files in v2 tables.
+   */
+  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+    private static final org.apache.avro.Schema AVRO_SCHEMA =
+        AvroSchemaUtil.convert(MANIFEST_LIST_SCHEMA, "manifest_file");
+
+    private final long snapshotId;
+    private final long sequenceNumber;
+    private ManifestFile wrapped = null;
+
+    IndexedManifestFile(long snapshotId, long sequenceNumber) {
+      this.snapshotId = snapshotId;
+      this.sequenceNumber = sequenceNumber;
+    }
+
+    public ManifestFile wrap(ManifestFile file) {
+      this.wrapped = file;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return AVRO_SCHEMA;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestFile");
+    }
+
+    @Override
+    public Object get(int pos) {
+      switch (pos) {
+        case 0:
+          return wrapped.path();
+        case 1:
+          return wrapped.length();
+        case 2:
+          return wrapped.partitionSpecId();
+        case 3:
+          if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
+            Preconditions.checkState(snapshotId == wrapped.snapshotId(),
 
 Review comment:
   I added a better comment here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407962295
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/ManifestFiles.java
 ##########
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.io.OutputFile;
+
+public class ManifestFiles {
+  private ManifestFiles() {
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   * <p>
+   * <em>Note:</em> Callers should use {@link ManifestFiles#read(ManifestFile, FileIO, Map)} to ensure
+   * the schema used by filters is the latest table schema. This should be used only when reading
+   * a manifest without filters.
+   *
+   * @param manifest a ManifestFile
+   * @param io a FileIO
+   * @return a manifest reader
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io) {
+    return read(manifest, io, null);
+  }
+
+  /**
+   * Returns a new {@link ManifestReader} for a {@link ManifestFile}.
+   *
+   * @param manifest a {@link ManifestFile}
+   * @param io a {@link FileIO}
+   * @param specsById a Map from spec ID to partition spec
+   * @return a {@link ManifestReader}
+   */
+  public static ManifestReader read(ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec> specsById) {
+    InputFile file = io.newInputFile(manifest.path());
+    InheritableMetadata inheritableMetadata = InheritableMetadataFactory.fromManifest(manifest);
+    return new ManifestReader(file, specsById, inheritableMetadata);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter}.
+   * <p>
+   * Manifests created by this writer have all entry snapshot IDs set to null.
+   * All entries will inherit the snapshot ID that will be assigned to the manifest on commit.
+   *
+   * @param spec {@link PartitionSpec} used to produce {@link DataFile} partition tuples
+   * @param outputFile the destination file location
+   * @return a manifest writer
+   */
+  public static ManifestWriter write(PartitionSpec spec, OutputFile outputFile) {
 
 Review comment:
   The document is duplicated with the one in ManifestWriter.write. We can remove this function definition and just call from there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] openinx commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r407353719
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
 ##########
 @@ -66,6 +77,16 @@ static Schema schema() {
    */
   int partitionSpecId();
 
+  /**
+   * @return the sequence number of the commit that added the manifest file
+   */
+  long sequenceNumber();
+
+  /**
+   * @return the lowest sequence number of any data file in the manifest
+   */
+  long minSequenceNumber();
 
 Review comment:
   Why need to define this min sequence number ?  for manifest file filtering purpose in read path ? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913: Add v2 manifests (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r408273439
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
 ##########
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+class V2Metadata {
+  private V2Metadata() {
+  }
+
+  // fields for v2 write schema for required metadata
+  static final Types.NestedField REQUIRED_SNAPSHOT_ID =
+      required(503, "added_snapshot_id", Types.LongType.get());
+  static final Types.NestedField REQUIRED_ADDED_FILES_COUNT =
+      required(504, "added_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_EXISTING_FILES_COUNT =
+      required(505, "existing_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_DELETED_FILES_COUNT =
+      required(506, "deleted_data_files_count", Types.IntegerType.get());
+  static final Types.NestedField REQUIRED_ADDED_ROWS_COUNT =
+      required(512, "added_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_EXISTING_ROWS_COUNT =
+      required(513, "existing_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_DELETED_ROWS_COUNT =
+      required(514, "deleted_rows_count", Types.LongType.get());
+  static final Types.NestedField REQUIRED_SEQUENCE_NUMBER =
+      required(515, "sequence_number", Types.LongType.get());
+  static final Types.NestedField REQUIRED_MIN_SEQUENCE_NUMBER =
+      required(516, "min_sequence_number", Types.LongType.get());
+
+  static final Schema MANIFEST_LIST_SCHEMA = new Schema(
+      ManifestFile.PATH, ManifestFile.LENGTH, ManifestFile.SPEC_ID,
+      REQUIRED_SEQUENCE_NUMBER, REQUIRED_MIN_SEQUENCE_NUMBER, REQUIRED_SNAPSHOT_ID,
+      REQUIRED_ADDED_FILES_COUNT, REQUIRED_EXISTING_FILES_COUNT, REQUIRED_DELETED_FILES_COUNT,
+      REQUIRED_ADDED_ROWS_COUNT, REQUIRED_EXISTING_ROWS_COUNT, REQUIRED_DELETED_ROWS_COUNT,
+      ManifestFile.PARTITION_SUMMARIES);
+
+
+  /**
+   * A wrapper class to write any ManifestFile implementation to Avro using the v2 write schema.
+   *
+   * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
+   * writing a sequence number into metadata files in v2 tables.
+   */
+  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+    private static final org.apache.avro.Schema AVRO_SCHEMA =
+        AvroSchemaUtil.convert(MANIFEST_LIST_SCHEMA, "manifest_file");
+
+    private final long snapshotId;
+    private final long sequenceNumber;
+    private ManifestFile wrapped = null;
+
+    IndexedManifestFile(long snapshotId, long sequenceNumber) {
+      this.snapshotId = snapshotId;
+      this.sequenceNumber = sequenceNumber;
+    }
+
+    public ManifestFile wrap(ManifestFile file) {
+      this.wrapped = file;
+      return this;
+    }
+
+    @Override
+    public org.apache.avro.Schema getSchema() {
+      return AVRO_SCHEMA;
+    }
+
+    @Override
+    public void put(int i, Object v) {
+      throw new UnsupportedOperationException("Cannot read using IndexedManifestFile");
+    }
+
+    @Override
+    public Object get(int pos) {
+      switch (pos) {
+        case 0:
+          return wrapped.path();
+        case 1:
+          return wrapped.length();
+        case 2:
+          return wrapped.partitionSpecId();
+        case 3:
+          if (wrapped.sequenceNumber() == ManifestWriter.UNASSIGNED_SEQ) {
+            Preconditions.checkState(snapshotId == wrapped.snapshotId(),
 
 Review comment:
   This validates that only new manifests created by the current snapshot will have the sequence number added on write.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616775178


   It is a Javadoc change, I'll merge it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411640747



##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -116,12 +126,12 @@ void add(ManifestEntry entry) {
    * @param existingFile a data file
    * @param fileSnapshotId snapshot ID when the data file was added to the table
    */
-  public void existing(DataFile existingFile, long fileSnapshotId) {
-    addEntry(reused.wrapExisting(fileSnapshotId, existingFile));
+  public void existing(DataFile existingFile, long fileSnapshotId, long sequenceNumber) {

Review comment:
       Manifest rewrite uses the `existing(ManifestEntry)` method that copies from the original entry. That will copy the existing entry's sequence number, or 0 if it was null.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411639291



##########
File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
##########
@@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;

Review comment:
       `added_snapshot_id` is optional because we used to store manifests as a list in the metadata.json file. V2 changes it to required because we always write a manifest-list with the inherited snapshot id and sequence number.
   
   My comment about changing this wasn't about removing `InheritableMetadata`. It was about the path that rewrites the manifest and uses the no-op `InheritableMetadata`. I think that we can simplify this by making sure we set the snapshot ID on the `ManifestFile` before rewriting. That way any rewrite has the snapshot ID and we don't need a special case here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411569517



##########
File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
##########
@@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;

Review comment:
       A couple of questions here:
   - Now `added_snapshot_id` is optional. Do we plan to make it required once we start assigning the snapshot in `ManifestListWriter`?
   - We cannot actually remove `InheritableMetadata` as there might be old manifests where `added_snapshot_id` is null, correct?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411630333



##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -116,12 +126,12 @@ void add(ManifestEntry entry) {
    * @param existingFile a data file
    * @param fileSnapshotId snapshot ID when the data file was added to the table
    */
-  public void existing(DataFile existingFile, long fileSnapshotId) {
-    addEntry(reused.wrapExisting(fileSnapshotId, existingFile));
+  public void existing(DataFile existingFile, long fileSnapshotId, long sequenceNumber) {

Review comment:
       nit: I think we need to update the Javadoc.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411650166



##########
File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
##########
@@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;

Review comment:
       I asked about `InheritableMetadata` as I wondered myself whether we can remove that completely. Seems not. 
   
   Yeah, I like the idea of setting the snapshot id in `ManifestFile` before rewriting.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616696213


   Let me go through this right now. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616841617


   Yes, please do! That one should be easier since it is just updating a few tests to be parameterized.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411640747



##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -116,12 +126,12 @@ void add(ManifestEntry entry) {
    * @param existingFile a data file
    * @param fileSnapshotId snapshot ID when the data file was added to the table
    */
-  public void existing(DataFile existingFile, long fileSnapshotId) {
-    addEntry(reused.wrapExisting(fileSnapshotId, existingFile));
+  public void existing(DataFile existingFile, long fileSnapshotId, long sequenceNumber) {

Review comment:
       Manifest rewrite uses the `existing(ManifestEntry)` method that copies from the original entry. That will copy the existing entry's sequence number, or 0 if it was null.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616765034


   @rdblue, LGTM. I had only two questions. 
   
   I think we need to update Javadoc in one place and should be good to go.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616807935


   Thanks, @aokolnychyi, @chenjunjiedada, and @openinx for reviewing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411690685



##########
File path: core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java
##########
@@ -30,22 +31,40 @@ static InheritableMetadata empty() {
   }
 
   static InheritableMetadata fromManifest(ManifestFile manifest) {
-    return new BaseInheritableMetadata(manifest.snapshotId());
+    if (manifest.snapshotId() != null) {
+      return new BaseInheritableMetadata(manifest.snapshotId(), manifest.sequenceNumber());
+    } else {
+      return NOOP;

Review comment:
       I'll look into removing it entirely, although I don't mind it very much. As long as it is baked into the reader and unavoidable, it seems like a clean way to apply the inherited values.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411633628



##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -116,12 +126,12 @@ void add(ManifestEntry entry) {
    * @param existingFile a data file
    * @param fileSnapshotId snapshot ID when the data file was added to the table
    */
-  public void existing(DataFile existingFile, long fileSnapshotId) {
-    addEntry(reused.wrapExisting(fileSnapshotId, existingFile));
+  public void existing(DataFile existingFile, long fileSnapshotId, long sequenceNumber) {

Review comment:
       Also, I am using this method in the rewrite manifests action. The entries metadata table will have `sequenceNumber` but its value will be 0, correct?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#discussion_r411641421



##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -116,12 +126,12 @@ void add(ManifestEntry entry) {
    * @param existingFile a data file
    * @param fileSnapshotId snapshot ID when the data file was added to the table
    */
-  public void existing(DataFile existingFile, long fileSnapshotId) {
-    addEntry(reused.wrapExisting(fileSnapshotId, existingFile));
+  public void existing(DataFile existingFile, long fileSnapshotId, long sequenceNumber) {

Review comment:
       The new action should use the sequence number from the metadata table. That will be 0 for metadata written without sequence numbers, or the correct sequence number for v2 tables.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on issue #913: Add v2 manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #913:
URL: https://github.com/apache/incubator-iceberg/pull/913#issuecomment-616828296


   Thanks for doing this, @rdblue! Shall we take a look at #936 next?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org