You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/01 05:03:36 UTC

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #3249: Optimized spark vectorized read parquet decimal

singhpk234 commented on code in PR #3249:
URL: https://github.com/apache/iceberg/pull/3249#discussion_r861231678


##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java:
##########
@@ -51,6 +52,25 @@ public VectorHolder(
     this.dictionary = dictionary;
     this.nullabilityHolder = holder;
     this.icebergType = type;
+    this.originalIcebergType = type;
+  }
+
+  public VectorHolder(

Review Comment:
   +1, if we want to keep it we can move the preconditions / assignments to the new constructor and just call :
   ```java
   this(columnDescriptor, vector, isDictionaryEncoded, dictionary, holder, type, type);
   ```
   
   from old constructor.



##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/GenericArrowVectorAccessorFactory.java:
##########
@@ -163,12 +165,18 @@ private ArrowVectorAccessor<DecimalT, Utf8StringT, ArrayT, ChildVectorT> getDict
 
   @SuppressWarnings("checkstyle:CyclomaticComplexity")
   private ArrowVectorAccessor<DecimalT, Utf8StringT, ArrayT, ChildVectorT>
-      getPlainVectorAccessor(FieldVector vector) {
+      getPlainVectorAccessor(FieldVector vector, PrimitiveType primitive) {
     if (vector instanceof BitVector) {
       return new BooleanAccessor<>((BitVector) vector);
     } else if (vector instanceof IntVector) {
+      if (primitive != null && OriginalType.DECIMAL.equals(primitive.getOriginalType())) {

Review Comment:
   [minor] This logic of using can be extracted in local variable in start of function , subsequently we can check that local in L#172, L#177, L#206



##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java:
##########
@@ -84,13 +85,28 @@ public VectorizedArrowReader(
       BufferAllocator ra,
       boolean setArrowValidityVector) {
     this.icebergField = icebergField;
+    this.logicalTypeField = icebergField;
+    this.columnDescriptor = desc;
+    this.rootAlloc = ra;
+    this.vectorizedColumnIterator = new VectorizedColumnIterator(desc, "", setArrowValidityVector);

Review Comment:
   can call new constructor with fields we don't have as null / empty.



##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/GenericArrowVectorAccessorFactory.java:
##########
@@ -92,12 +93,13 @@ public ArrowVectorAccessor<DecimalT, Utf8StringT, ArrayT, ChildVectorT> getVecto
     Dictionary dictionary = holder.dictionary();
     boolean isVectorDictEncoded = holder.isDictionaryEncoded();
     FieldVector vector = holder.vector();
+    ColumnDescriptor desc = holder.descriptor();
+    // desc could be null when the holder is PositionVectorHolder

Review Comment:
   [question] [ConstantVectorHolder](https://github.com/apache/iceberg/blob/master/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java#L119-L141) can also have descriptor() value as null, is it not being called out here as its mostly treated as `dummy` ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org