You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/20 11:54:39 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request #1219: Make GenericManifestFile using Iceberg generic data model

chenjunjiedada opened a new pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219


   This PR replaces the data model of `GenericManifestFile` from Avro `IndexedRecord` to Iceberg `Record`.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466020016



##########
File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
##########
@@ -56,87 +58,31 @@ private V2Metadata() {
    * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
    * writing a sequence number into metadata files in v2 tables.
    */
-  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+  static class IndexedManifestFile implements ManifestFile, Record {

Review comment:
       The Indexed* classes can still be used for writing. That way, we can convert the read path and then convert the write path later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r467548401



##########
File path: core/src/main/java/org/apache/iceberg/data/avro/GenericReaders.java
##########
@@ -133,4 +143,40 @@ protected void set(Record struct, int pos, Object value) {
       struct.set(pos, value);
     }
   }
+
+  private static class InternalRecordReader<R extends Record> extends ValueReaders.StructReader<R> {
+    private final Class<R> recordClass;
+    private final DynConstructors.Ctor<R> ctor;
+    private final Schema schema;
+
+    InternalRecordReader(List<ValueReader<?>> readers, Class<R> recordClass, Schema schema,
+                         Map<Integer, ?> idToConstant) {

Review comment:
       Yes, the `Record` implements the `StructLike`, so we need the Schema for setter and getter in case of projection.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada removed a comment on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada removed a comment on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-706973503


   @rdblue , Do we still need this change?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-671022845


   @rdblue , I think this PR only changes `GenericManifestFile` that is the content of the manifest list. The manifest is not changed because the `GenericManifestEntry` is not changed. Did I misunderstand?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r457317940



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       Both `Record` API and `Manifest` API define the copy method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-669940130


   @rdblue, Thanks a lot for comments, just replied some of them, I will go through rest and update accordingly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r502763700



##########
File path: core/src/main/java/org/apache/iceberg/GenericManifestFile.java
##########
@@ -68,12 +74,14 @@ public GenericManifestFile(org.apache.avro.Schema avroSchema) {
     List<Types.NestedField> allFields = ManifestFile.schema().asStruct().fields();
 
     this.fromProjectionPos = new int[fields.size()];
+    this.nameToProjectedPos = new HashMap<>();

Review comment:
       nit:  we usually use `Maps.newHashMap`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada closed pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada closed pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466417351



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       If two interfaces contain a method with the same signature but different return types, then it is impossible to implement both the interface simultaneously.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r457317940



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       Both `Record` API and `Manifest` API define the copy method. This need to refactor.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-669527731


   @chenjunjiedada, why does this change manifest lists and manifests? Those are written separately, so I would expect this to only need to change one. Probably easier to start with converting manifest lists.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466017701



##########
File path: core/src/main/java/org/apache/iceberg/V1Metadata.java
##########
@@ -184,6 +153,86 @@ public Long deletedRowsCount() {
     public ManifestFile copy() {
       return wrapped.copy();
     }
+
+    @Override
+    public <T> T get(int pos, Class<T> javaClass) {
+      return javaClass.cast(get(pos));
+    }
+
+    @Override
+    public Object get(int pos) {

Review comment:
       Why did this move?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466016596



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       I don't think this should break the `Record` API. Why can't the class implement both in the same method by returning `GenericManifestFile`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r467155175



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       This works for me:
   
   ```java
     interface A {
       A copy();
     }
     
     interface B {
       B copy();
     }
     
     public class AB implements A, B {
       @Override
       public AB copy() {
         return this;
       }
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r467551799



##########
File path: core/src/main/java/org/apache/iceberg/V2Metadata.java
##########
@@ -56,87 +58,31 @@ private V2Metadata() {
    * This is used to maintain compatibility with v2 by writing manifest list files with the old schema, instead of
    * writing a sequence number into metadata files in v2 tables.
    */
-  static class IndexedManifestFile implements ManifestFile, IndexedRecord {
+  static class IndexedManifestFile implements ManifestFile, Record {

Review comment:
       Could you help to point out what do we need to do for write path?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-666378082


   @rdblue , would you please help to take a look whether this is in the right direction? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada closed pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada closed pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466422605



##########
File path: core/src/main/java/org/apache/iceberg/V1Metadata.java
##########
@@ -184,6 +153,86 @@ public Long deletedRowsCount() {
     public ManifestFile copy() {
       return wrapped.copy();
     }
+
+    @Override
+    public <T> T get(int pos, Class<T> javaClass) {
+      return javaClass.cast(get(pos));
+    }
+
+    @Override
+    public Object get(int pos) {

Review comment:
       I met some checkstyle errors about breaking the order of override functions, so I reorder the functions. I will take a look at this again.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r467543877



##########
File path: api/src/main/java/org/apache/iceberg/data/Record.java
##########
@@ -32,7 +32,7 @@
 
   Object get(int pos);
 
-  Record copy();
+  Record copyRecord();

Review comment:
       Just learned!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466422605



##########
File path: core/src/main/java/org/apache/iceberg/V1Metadata.java
##########
@@ -184,6 +153,86 @@ public Long deletedRowsCount() {
     public ManifestFile copy() {
       return wrapped.copy();
     }
+
+    @Override
+    public <T> T get(int pos, Class<T> javaClass) {
+      return javaClass.cast(get(pos));
+    }
+
+    @Override
+    public Object get(int pos) {

Review comment:
       I met some checkstyle errors about breaking the order of overloaded functions, so I reorder the functions. I will take a look at this again.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r467552976



##########
File path: core/src/main/java/org/apache/iceberg/V1Metadata.java
##########
@@ -184,6 +153,86 @@ public Long deletedRowsCount() {
     public ManifestFile copy() {
       return wrapped.copy();
     }
+
+    @Override
+    public <T> T get(int pos, Class<T> javaClass) {
+      return javaClass.cast(get(pos));
+    }
+
+    @Override
+    public Object get(int pos) {

Review comment:
       Changed it back.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#issuecomment-706973503


   @rdblue , Do we still need this change?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1219: Make GenericManifestFile using Iceberg generic data model

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1219:
URL: https://github.com/apache/iceberg/pull/1219#discussion_r466019457



##########
File path: core/src/main/java/org/apache/iceberg/data/avro/GenericReaders.java
##########
@@ -133,4 +143,40 @@ protected void set(Record struct, int pos, Object value) {
       struct.set(pos, value);
     }
   }
+
+  private static class InternalRecordReader<R extends Record> extends ValueReaders.StructReader<R> {
+    private final Class<R> recordClass;
+    private final DynConstructors.Ctor<R> ctor;
+    private final Schema schema;
+
+    InternalRecordReader(List<ValueReader<?>> readers, Class<R> recordClass, Schema schema,
+                         Map<Integer, ?> idToConstant) {

Review comment:
       I see the need for a reader that produces records of some specific class. I don't understand why that needs to accept an Avro schema, though. The only requirement should be that the class implements StructLike and accepts an Iceberg schema to set its projection.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org