You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/09 10:33:40 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request #1186: refactor avro reader and writer

chenjunjiedada opened a new pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186


   This is for #1152 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada edited a comment on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada edited a comment on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-658152290


   @rdblue, just updated PR description. The first step for refactoring I 'd like to take is to move public Avro reader and writer to iceberg-data subproject so that users can avoid referring the reader/writer in iceberg-core, and then remove `DataReader` and `DataWriter` in one of the later releases.
   
   It does look confusing between `GenericAvroReader` and `AvroGenericReader`, how about we rename the internal one to `AvroReader`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-657799919


   @chenjunjiedada, can you summarize the changes here in the PR description?
   
   Looks like this is primarily a rename, `DataReader` -> `AvroGenericReader`, using a base class, `BaseAvroGenericReader` for the read path. That name seems confusing to me because we now have `o.a.i.avro.GenericAvroReader` and `o.a.i.data.avro.AvroGenericReader`.
   
   I understand the goal of removing the reader for Avro generics. What is your plan to get there?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-658288472


   I don't think that these renames are a good idea right now. There isn't a big difference between having `AvroReader` and `GenericAvroReader` or `DataReader` and `AvroGenericReader`. I think the rename makes it _more_ confusing, but even if it is slightly better, I don't think it is worth the code churn.
   
   I think a better path to removing the support for Avro generics is to wait until the end to rename and start by moving internal classes to use a `ValueReader` tree that doesn't depend on Avro's object model.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-658494234


   OK, make sense to me. Let me think about replacing `IndexedRecord` with `Record` at first. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-701240758


   The PR: https://github.com/apache/iceberg/pull/1219 starting to replace `IndexedRecord` to `Record`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#issuecomment-658152290


   @rdblue, just updated PR description. The first step for refactoring I 'd like to take is to move public Avro reader and writer to iceberg-data subproject so that users can avoid referring the reader/writer in iceberg-core, and then remove `DataReader` and `DataWriter` in one of the later release. It does look confusing between `GenericAvroReader` and `AvroGenericReader`, how about we rename the internal one to `AvroReader`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#discussion_r452245473



##########
File path: core/src/main/java/org/apache/iceberg/data/avro/DataReader.java
##########
@@ -19,192 +19,25 @@
 
 package org.apache.iceberg.data.avro;
 
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.List;
 import java.util.Map;
-import org.apache.avro.LogicalType;
-import org.apache.avro.LogicalTypes;
-import org.apache.avro.Schema;
-import org.apache.avro.io.DatumReader;
-import org.apache.avro.io.Decoder;
-import org.apache.avro.io.DecoderFactory;
-import org.apache.avro.io.ResolvingDecoder;
-import org.apache.iceberg.avro.AvroSchemaUtil;
-import org.apache.iceberg.avro.AvroSchemaWithTypeVisitor;
-import org.apache.iceberg.avro.ValueReader;
-import org.apache.iceberg.avro.ValueReaders;
-import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.Schema;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
-import org.apache.iceberg.relocated.com.google.common.collect.MapMaker;
-import org.apache.iceberg.types.Type;
-import org.apache.iceberg.types.Types;
 
-public class DataReader<T> implements DatumReader<T> {
-
-  private static final ThreadLocal<Map<Schema, Map<Schema, ResolvingDecoder>>> DECODER_CACHES =
-      ThreadLocal.withInitial(() -> new MapMaker().weakKeys().makeMap());
+/**
+ * @deprecated will be removed, please use AvroGenericReader instead.
+ */
+@Deprecated
+public class DataReader<T> extends BaseAvroGenericReader<T> {
+  DataReader(Schema expectedSchema, org.apache.avro.Schema readSchema, Map<Integer, ?> idToConstant) {
+    super(expectedSchema, readSchema, idToConstant);
+  }
 
-  public static <D> DataReader<D> create(org.apache.iceberg.Schema expectedSchema, Schema readSchema) {
+  public static <D> DataReader<D> create(Schema expectedSchema, org.apache.avro.Schema readSchema) {

Review comment:
       No, I import iceberg schema instead of avro schema. Let me update this to make it clear.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186#discussion_r452160936



##########
File path: core/src/main/java/org/apache/iceberg/data/avro/DataReader.java
##########
@@ -19,192 +19,25 @@
 
 package org.apache.iceberg.data.avro;
 
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.List;
 import java.util.Map;
-import org.apache.avro.LogicalType;
-import org.apache.avro.LogicalTypes;
-import org.apache.avro.Schema;
-import org.apache.avro.io.DatumReader;
-import org.apache.avro.io.Decoder;
-import org.apache.avro.io.DecoderFactory;
-import org.apache.avro.io.ResolvingDecoder;
-import org.apache.iceberg.avro.AvroSchemaUtil;
-import org.apache.iceberg.avro.AvroSchemaWithTypeVisitor;
-import org.apache.iceberg.avro.ValueReader;
-import org.apache.iceberg.avro.ValueReaders;
-import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.Schema;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
-import org.apache.iceberg.relocated.com.google.common.collect.MapMaker;
-import org.apache.iceberg.types.Type;
-import org.apache.iceberg.types.Types;
 
-public class DataReader<T> implements DatumReader<T> {
-
-  private static final ThreadLocal<Map<Schema, Map<Schema, ResolvingDecoder>>> DECODER_CACHES =
-      ThreadLocal.withInitial(() -> new MapMaker().weakKeys().makeMap());
+/**
+ * @deprecated will be removed, please use AvroGenericReader instead.
+ */
+@Deprecated
+public class DataReader<T> extends BaseAvroGenericReader<T> {
+  DataReader(Schema expectedSchema, org.apache.avro.Schema readSchema, Map<Integer, ?> idToConstant) {
+    super(expectedSchema, readSchema, idToConstant);
+  }
 
-  public static <D> DataReader<D> create(org.apache.iceberg.Schema expectedSchema, Schema readSchema) {
+  public static <D> DataReader<D> create(Schema expectedSchema, org.apache.avro.Schema readSchema) {

Review comment:
       I think this changes the semantics of deprecated `DataReader#create`,  the arguments has been swapped. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada closed pull request #1186: refactor avro reader and writer

Posted by GitBox <gi...@apache.org>.
chenjunjiedada closed pull request #1186:
URL: https://github.com/apache/iceberg/pull/1186


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org