You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/30 12:44:42 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request #1532: MR: use encryption manager for input files

chenjunjiedada opened a new pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532


   This along with #1526 fixes https://github.com/apache/iceberg/issues/865.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#discussion_r497909037



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -79,12 +86,36 @@ public void write(DataOutput out) throws IOException {
     byte[] data = SerializationUtil.serializeToBytes(this.task);
     out.writeInt(data.length);
     out.write(data);
+
+    byte[] ioData = SerializationUtil.serializeToBytes(io);

Review comment:
       I'm ok with that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#discussion_r497779995



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -79,12 +85,36 @@ public void write(DataOutput out) throws IOException {
     byte[] data = SerializationUtil.serializeToBytes(this.task);
     out.writeInt(data.length);
     out.write(data);
+
+    byte[] ioData = SerializationUtil.serializeToBytes(io);
+    out.writeInt(ioData.length);
+    out.write(ioData);
+
+    byte[] encryptionManagerData = SerializationUtil.serializeToBytes(encryptionManager);
+    out.writeInt(encryptionManagerData.length);
+    out.write(encryptionManagerData);
   }
 
   @Override
   public void readFields(DataInput in) throws IOException {
     byte[] data = new byte[in.readInt()];
     in.readFully(data);
     this.task = SerializationUtil.deserializeFromBytes(data);
+
+    byte[] ioData = new byte[in.readInt()];
+    in.readFully(ioData);
+    this.io = SerializationUtil.deserializeFromBytes(ioData);
+
+    byte[] encryptionManagerData = new byte[in.readInt()];
+    in.readFully(encryptionManagerData);
+    this.encryptionManager = SerializationUtil.deserializeFromBytes(encryptionManagerData);
+  }
+
+  public FileIO fileIO() {

Review comment:
       This is named `io` in `Table`, so I don't think a longer name is needed here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada removed a comment on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada removed a comment on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701730202


   Wait, this seems to serialize the configuration in `io`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701875823


   > The HadoopFileIO contains SerailiableSupplier<Configuration>, so it should be serialized by SerializationUtil and then Writable, therefore satisfy the serialization requirement for MR framework.
   
   Where is the `HadoopFileIO` being rebuilt with `SerializableSupplier<Configuration>`? In Spark, we have to check the `FileIO` and replace it with the one that references the `SerializableSupplier`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701730202


   Wait, this seems to serialize the configuration in `io`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701627572


   Looks good to me. Would be nice to rename that method to fit the convention used elsewhere.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#discussion_r497481270



##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergSplit.java
##########
@@ -79,12 +86,36 @@ public void write(DataOutput out) throws IOException {
     byte[] data = SerializationUtil.serializeToBytes(this.task);
     out.writeInt(data.length);
     out.write(data);
+
+    byte[] ioData = SerializationUtil.serializeToBytes(io);

Review comment:
       @rdblue @rdsr , I saw your comments in #985 about remove customized serialization. Now the `IcebergSplit` implements `org.apache.hadoop.mapred.InputSplit` which extends writable. So I keep serialization implementation. Does this make sense to you?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-702411604


   Thanks for fixing it, @chenjunjiedada! I'll merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-702450378


   Thanks for merging!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada edited a comment on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada edited a comment on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701718100


   @rdblue , The `HadoopFileIO` contains `SerailiableSupplier<Configuration>`, so it should be serialized by `SerializationUtil` and then `Writable`, therefore satisfy the serialization requirement for MR framework. Does that make sense to you?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701718100


   @rdblue , Just added Configuration serialization.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1532: MR: use encryption manager for input files

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1532:
URL: https://github.com/apache/iceberg/pull/1532#issuecomment-701885476


   @rdblue , Just added this. I missed that the configuration used to build `SerialableSupplier<Configuration>` also need to be serializable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org