Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/10 01:35:39 UTC

[GitHub] [iceberg] shardulm94 opened a new pull request #1191: ORC: Use ConstantReader for identity partition columns

shardulm94 opened a new pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191


   Fixes #1056 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#discussion_r453034547



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java
##########
@@ -181,12 +173,25 @@ private T readInternal(T struct, ColumnVector[] columnVectors, int row) {
       for (int c = 0; c < readers.length; ++c) {
         set(struct, c, reader(c).read(columnVectors[c], row));
       }
+      return struct;
+    }
+  }
 
-      for (int i = 0; i < positions.length; i += 1) {
-        set(struct, positions[i], constants[i]);
-      }
+  private static class ConstantReader<C> implements OrcValueReader<C> {
+    private final C constant;
 
-      return struct;
+    private ConstantReader(C constant) {
+      this.constant = constant;
+    }
+
+    @Override
+    public C read(ColumnVector ignored, int ignoredRow) {

Review comment:
       Yes, I guess we can do that by not asking ORC to project these columns. Let me give it a try.







[GitHub] [iceberg] shardulm94 commented on a change in pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#discussion_r453123543



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java
##########
@@ -178,15 +171,29 @@ public T nonNullRead(ColumnVector vector, int row) {
     }
 
     private T readInternal(T struct, ColumnVector[] columnVectors, int row) {
-      for (int c = 0; c < readers.length; ++c) {
-        set(struct, c, reader(c).read(columnVectors[c], row));
+      for (int c = 0, vectorIndex = 0; c < readers.length; ++c) {
+        ColumnVector vector = isConstantField[c] ? null : columnVectors[vectorIndex++];

Review comment:
       Done






[GitHub] [iceberg] shardulm94 edited a comment on pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
shardulm94 edited a comment on pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#issuecomment-656940851


   I think this code will be changed a bit in #1021 to handle not just constant columns but also metadata columns, since we would want to avoid materializing a `ColumnVector` for metadata columns too.




[GitHub] [iceberg] rdblue commented on a change in pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#discussion_r452967447



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java
##########
@@ -181,12 +173,25 @@ private T readInternal(T struct, ColumnVector[] columnVectors, int row) {
       for (int c = 0; c < readers.length; ++c) {
         set(struct, c, reader(c).read(columnVectors[c], row));
       }
+      return struct;
+    }
+  }
 
-      for (int i = 0; i < positions.length; i += 1) {
-        set(struct, positions[i], constants[i]);
-      }
+  private static class ConstantReader<C> implements OrcValueReader<C> {
+    private final C constant;
 
-      return struct;
+    private ConstantReader(C constant) {
+      this.constant = constant;
+    }
+
+    @Override
+    public C read(ColumnVector ignored, int ignoredRow) {

Review comment:
       Is `ColumnVector` still materialized? Is it possible to avoid reading that entirely?






[GitHub] [iceberg] shardulm94 commented on pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#issuecomment-656940851


   I think this code will be changed a bit in #1021 to handle not just constant columns but also metadata columns, since we would also want to avoid materializing a `ColumnVector` for metadata columns.




[GitHub] [iceberg] rdblue commented on a change in pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#discussion_r453117756



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java
##########
@@ -178,15 +171,29 @@ public T nonNullRead(ColumnVector vector, int row) {
     }
 
     private T readInternal(T struct, ColumnVector[] columnVectors, int row) {
-      for (int c = 0; c < readers.length; ++c) {
-        set(struct, c, reader(c).read(columnVectors[c], row));
+      for (int c = 0, vectorIndex = 0; c < readers.length; ++c) {
+        ColumnVector vector = isConstantField[c] ? null : columnVectors[vectorIndex++];

Review comment:
       Minor: we discourage using the return value of `++` expressions because it is error prone and makes code harder to read. Could you move the increment to a separate line?
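
A minimal sketch of what this suggestion could look like, not the code that was merged. To keep it self-contained, `Object` stands in for ORC's `ColumnVector`, and the class and method names (`VectorIndexSketch`, `pickVectors`) are hypothetical. Only `isConstantField`, `columnVectors`, and `vectorIndex` come from the diff above.

```java
import java.util.ArrayList;
import java.util.List;

public class VectorIndexSketch {
  // Returns, for each field, the vector it would read from (null for constant fields).
  // Object is a stand-in for ColumnVector; this is an illustration only.
  static List<Object> pickVectors(boolean[] isConstantField, Object[] columnVectors) {
    List<Object> picked = new ArrayList<>();
    int vectorIndex = 0;
    for (int c = 0; c < isConstantField.length; c += 1) {
      if (isConstantField[c]) {
        // Constant fields have no backing vector, so nothing is consumed here.
        picked.add(null);
      } else {
        picked.add(columnVectors[vectorIndex]);
        // The increment sits on its own line instead of inside the array index.
        vectorIndex += 1;
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    boolean[] isConstantField = {false, true, false};
    Object[] columnVectors = {"a", "b"};
    System.out.println(pickVectors(isConstantField, columnVectors));
  }
}
```

The behavior matches the `vectorIndex++` form; the only difference is that the read and the increment are separate statements, so the value used as the index is obvious at a glance.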






[GitHub] [iceberg] rdblue commented on pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#issuecomment-656934974


   +1, just a minor issue.




[GitHub] [iceberg] rdblue commented on a change in pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191#discussion_r453039517



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueReaders.java
##########
@@ -181,12 +173,25 @@ private T readInternal(T struct, ColumnVector[] columnVectors, int row) {
       for (int c = 0; c < readers.length; ++c) {
         set(struct, c, reader(c).read(columnVectors[c], row));
       }
+      return struct;
+    }
+  }
 
-      for (int i = 0; i < positions.length; i += 1) {
-        set(struct, positions[i], constants[i]);
-      }
+  private static class ConstantReader<C> implements OrcValueReader<C> {
+    private final C constant;
 
-      return struct;
+    private ConstantReader(C constant) {
+      this.constant = constant;
+    }
+
+    @Override
+    public C read(ColumnVector ignored, int ignoredRow) {

Review comment:
       That's going to be a bigger time savings and that's what we do for Parquet. We just drop it from the projection we pass down to the format.
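
As a rough illustration of "dropping it from the projection", the sketch below filters constant fields out of the list of field IDs before anything is requested from the file format. It uses plain Java collections rather than Iceberg or ORC types, and the names `ProjectionSketch`, `withoutConstants`, `projectedIds`, and `idToConstant` are assumptions made for this example, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ProjectionSketch {
  // Keeps only the field IDs that must actually be read from the data file.
  // Fields with a known constant (for example identity partition values) are skipped,
  // so the format never has to materialize a vector for them.
  static List<Integer> withoutConstants(List<Integer> projectedIds, Map<Integer, ?> idToConstant) {
    List<Integer> toRead = new ArrayList<>();
    for (Integer id : projectedIds) {
      if (!idToConstant.containsKey(id)) {
        toRead.add(id);
      }
    }
    return toRead;
  }

  public static void main(String[] args) {
    List<Integer> projectedIds = Arrays.asList(1, 2, 3);
    // Hypothetical: field 2 is an identity partition column with a fixed value.
    Map<Integer, String> idToConstant = Collections.singletonMap(2, "2020-07-10");
    System.out.println(withoutConstants(projectedIds, idToConstant));
  }
}
```

The reader for a skipped field can then return its constant without touching any vector, which is where the time savings mentioned above would come from.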






[GitHub] [iceberg] rdblue merged pull request #1191: ORC: Use ConstantReader for identity partition columns

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1191:
URL: https://github.com/apache/iceberg/pull/1191


   

