You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2018/08/14 01:06:29 UTC

[GitHub] paul-rogers commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

paul-rogers commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r209803870
 
 

 ##########
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/ListVector.java
 ##########
 @@ -250,22 +270,140 @@ public void load(UserBitShared.SerializedField metadata, DrillBuf buffer) {
     bits.load(bitMetadata, buffer.slice(offsetLength, bitLength));
 
     final UserBitShared.SerializedField vectorMetadata = metadata.getChild(2);
-    if (getDataVector() == DEFAULT_DATA_VECTOR) {
+    if (isEmptyType()) {
       addOrGetVector(VectorDescriptor.create(vectorMetadata.getMajorType()));
     }
 
     final int vectorLength = vectorMetadata.getBufferLength();
     vector.load(vectorMetadata, buffer.slice(offsetLength + bitLength, vectorLength));
   }
 
+  public boolean isEmptyType() {
+    return getDataVector() == DEFAULT_DATA_VECTOR;
+  }
+
+  @Override
+  public void setChildVector(ValueVector childVector) {
+
+    // Unlike the repeated list vector, the (plain) list vector
+    // adds the dummy vector as a child type.
+
+    assert field.getChildren().size() == 1;
+    assert field.getChildren().iterator().next().getType().getMinorType() == MinorType.LATE;
+    field.removeChild(vector.getField());
+
+    super.setChildVector(childVector);
+
+    // Initial LATE type vector not added as a subtype initially.
+    // So, no need to remove it, just add the new subtype. Since the
+    // MajorType is immutable, must build a new one and replace the type
+    // in the materialized field. (We replace the type, rather than creating
+    // a new materialized field, to preserve the link to this field from
+    // a parent map, list or union.)
+
+    assert field.getType().getSubTypeCount() == 0;
+    field.replaceType(
+        field.getType().toBuilder()
+          .addSubType(childVector.getField().getType().getMinorType())
+          .build());
+  }
+
+  /**
+   * Promote the list to a union. Called from old-style writers. This implementation
+   * relies on the caller to set the types vector for any existing values.
+   * This method simply clears the existing vector.
+   *
+   * @return the new union vector
+   */
+
   public UnionVector promoteToUnion() {
-    MaterializedField newField = MaterializedField.create(getField().getName(), Types.optional(MinorType.UNION));
-    UnionVector vector = new UnionVector(newField, allocator, null);
+    UnionVector vector = createUnion();
+
+    // Replace the current vector, clearing its data. (This is the
+    // old behavior.
+
     replaceDataVector(vector);
     reader = new UnionListReader(this);
     return vector;
   }
 
+  /**
+   * Revised form of promote to union that correctly fixes up the list
+   * field metadata to match the new union type.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector promoteToUnion2() {
+    UnionVector unionVector = createUnion();
+
+    // Replace the current vector, clearing its data. (This is the
+    // old behavior.
+
+    setChildVector(unionVector);
+    return unionVector;
+  }
+
+  /**
+   * Promote to a union, preserving the existing data vector as a member of
+   * the new union. Back-fill the types vector with the proper type value
+   * for existing rows.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector convertToUnion(int allocValueCount, int valueCount) {
+    assert allocValueCount >= valueCount;
+    UnionVector unionVector = createUnion();
+    unionVector.allocateNew(allocValueCount);
+
+    // Preserve the current vector (and its data) if it is other than
+    // the default. (New behavior used by column writers.)
+
+    if (! isEmptyType()) {
+      unionVector.addType(vector);
+      int prevType = vector.getField().getType().getMinorType().getNumber();
+      UInt1Vector.Mutator typeMutator = unionVector.getTypeVector().getMutator();
+
+      // If the previous vector was nullable, then promote the nullable state
+      // to the type vector by setting either the null marker or the type
+      // marker depending on the original nullable values.
+
+      if (vector instanceof NullableVector) {
+        UInt1Vector.Accessor bitsAccessor =
+            ((UInt1Vector) ((NullableVector) vector).getBitsVector()).getAccessor();
+        for (int i = 0; i < valueCount; i++) {
+          typeMutator.setSafe(i, (bitsAccessor.get(i) == 0)
+              ? UnionVector.NULL_MARKER
+              : prevType);
+        }
+      } else {
+
+        // The value is not nullable. (Perhaps it is a map.)
+        // Note that the original design of lists have a flaw: if the sole member
+        // is a map, then map entries can't be nullable when the only type, but
+        // become nullable when in a union. What a mess...
+
+        for (int i = 0; i < valueCount; i++) {
+          typeMutator.setSafe(i, prevType);
+        }
+      }
+    }
+    vector = unionVector;
+    return unionVector;
+  }
+
+  private UnionVector createUnion() {
+    MaterializedField newField = MaterializedField.create(UNION_VECTOR_NAME, Types.optional(MinorType.UNION));
+    UnionVector unionVector = new UnionVector(newField, allocator, null);
+
+    // For efficiency, should not create a reader that will never be used.
+    // Keeping for backward compatibility.
+
+    reader = new UnionListReader(this);
 
 Review comment:
   This vector already creates a reader for every vector. Since I didn't have time to track down uses, I kept this code. The Union vector is normally created in the older "complex writer" which created the reader.
   
   Moving forward, we should revisit union vectors (and the older complex writer). At that time, we can determine if the vector needs to own a reader, or if the reader can be created as needed.
   
   If we create a "result set reader" that works cross-batch in parallel with the result set loader (for writing), then vectors would not carry their own readers.
   
   Yes indeed, union vectors are a mess.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services