You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2018/02/06 05:03:06 UTC

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1112

    DRILL-6114: Metadata revisions

    This PR is part of the "[batch handling updates\https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades]" project. It completes the new internal metadata system by including support for the remaining vector types: unions, lists and repeated lists.
    
    The metadata code was refactored. Previously, it was small enough to fit into a single file (with nested classes). With the added complexity, the metadata classes were split out into separate classes, grouped into its own Java package.
    
    A few fixes were made here and there to ensure the unit tests pass.
    
    @ppadma or @bitblender, can one of you run the pre-commit tests and send me the details of any failures? 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6114B

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1112
    
----
commit e0f11f923ea48070f829a859395ee66b81b9ba18
Author: Paul Rogers <pr...@...>
Date:   2018-02-06T04:18:18Z

    DRILL-6114: Metadata revisions
    
    Support for union vectors, list vectors, repeated list vectors. Refactored metadata classes.

----


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168295700
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/RepeatedListColumnMetadata.java ---
    @@ -0,0 +1,94 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +import com.google.common.base.Preconditions;
    +
    +public class RepeatedListColumnMetadata extends AbstractColumnMetadata {
    +
    +  private AbstractColumnMetadata childSchema;
    +
    +  public RepeatedListColumnMetadata(MaterializedField field) {
    +    super(field);
    +    Preconditions.checkArgument(field.getType().getMinorType() == MinorType.LIST);
    +    Preconditions.checkArgument(field.getType().getMode() == DataMode.REPEATED);
    +    Preconditions.checkArgument(field.getChildren().size() <= 1);
    +    if (! field.getChildren().isEmpty()) {
    +      childSchema = MetadataUtils.fromField(field.getChildren().iterator().next());
    +      Preconditions.checkArgument(childSchema.isArray());
    +    }
    +  }
    +
    +  public RepeatedListColumnMetadata(String name, AbstractColumnMetadata childSchema) {
    +    super(name, MinorType.LIST, DataMode.REPEATED);
    +    if (childSchema != null) {
    +      Preconditions.checkArgument(childSchema.isArray());
    +    }
    +    this.childSchema = childSchema;
    +  }
    +
    +  public void childSchema(ColumnMetadata childMetadata) {
    +    Preconditions.checkState(childSchema == null);
    +    Preconditions.checkArgument(childMetadata.mode() == DataMode.REPEATED);
    +    childSchema = (AbstractColumnMetadata) childMetadata;
    +  }
    +
    +  @Override
    +  public StructureType structureType() { return StructureType.MULTI_ARRAY; }
    +
    +  @Override
    +  public MaterializedField schema() {
    +    MaterializedField field = emptySchema();
    +    if (childSchema != null) {
    +      field.addChild(childSchema.schema());
    +    }
    +    return field;
    +  }
    +
    +  @Override
    +  public MaterializedField emptySchema() {
    +    return MaterializedField.create(name(), majorType());
    +  }
    +
    +  @Override
    +  public ColumnMetadata cloneEmpty() {
    +    return new RepeatedListColumnMetadata(name, null);
    +  }
    +
    +  @Override
    +  public AbstractColumnMetadata copy() {
    +    return new RepeatedListColumnMetadata(name, childSchema);
    +  }
    +
    +  @Override
    +  public ColumnMetadata childSchema() { return childSchema; }
    +
    +  @Override
    +  public int dimensions() {
    +
    +    // If there is no child, then we don't know the
    +    // dimensionality.
    +
    +    return childSchema == null ? -1
    --- End diff --
    
    can we use a static final instead of -1 ?


---

[GitHub] drill issue #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1112
  
    @arina-ielchiieva, @parthchandra can either of you perhaps give this one a committer review? Thanks! 


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168681135
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
    @@ -0,0 +1,206 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Abstract definition of column metadata. Allows applications to create
    + * specialized forms of a column metadata object by extending from this
    + * abstract class.
    + * <p>
    + * Note that, by design, primitive columns do not have a link to their
    + * tuple parent, or their index within that parent. This allows the same
    + * metadata to be shared between two views of a tuple, perhaps physical
    + * and projected views. This restriction does not apply to map columns,
    + * since maps (and the row itself) will, by definition, differ between
    + * the two views.
    + */
    +
    +public abstract class AbstractColumnMetadata implements ColumnMetadata {
    +
    +  // Capture the key schema information. We cannot use the MaterializedField
    +  // or MajorType because then encode child information that we encode here
    +  // as a child schema. Keeping the two in sync is nearly impossible.
    +
    +  protected final String name;
    +  protected final MinorType type;
    +  protected final DataMode mode;
    +  protected final int precision;
    +  protected final int scale;
    +  protected boolean projected = true;
    +
    --- End diff --
    
    AFAIK, we do not support the time-zone aware data types at present.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168681901
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
    @@ -0,0 +1,206 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Abstract definition of column metadata. Allows applications to create
    + * specialized forms of a column metadata object by extending from this
    + * abstract class.
    + * <p>
    + * Note that, by design, primitive columns do not have a link to their
    + * tuple parent, or their index within that parent. This allows the same
    + * metadata to be shared between two views of a tuple, perhaps physical
    + * and projected views. This restriction does not apply to map columns,
    + * since maps (and the row itself) will, by definition, differ between
    + * the two views.
    + */
    +
    +public abstract class AbstractColumnMetadata implements ColumnMetadata {
    +
    +  // Capture the key schema information. We cannot use the MaterializedField
    +  // or MajorType because then encode child information that we encode here
    +  // as a child schema. Keeping the two in sync is nearly impossible.
    +
    +  protected final String name;
    +  protected final MinorType type;
    +  protected final DataMode mode;
    +  protected final int precision;
    +  protected final int scale;
    +  protected boolean projected = true;
    +
    +  /**
    +   * Predicted number of elements per array entry. Default is
    +   * taken from the often hard-coded value of 10.
    +   */
    +
    +  protected int expectedElementCount = 1;
    +
    +  public AbstractColumnMetadata(MaterializedField schema) {
    +    name = schema.getName();
    +    MajorType majorType = schema.getType();
    +    type = majorType.getMinorType();
    +    mode = majorType.getMode();
    +    precision = majorType.getPrecision();
    +    scale = majorType.getScale();
    +    if (isArray()) {
    +      expectedElementCount = DEFAULT_ARRAY_SIZE;
    +    }
    +  }
    +
    +  public AbstractColumnMetadata(String name, MinorType type, DataMode mode) {
    +    this.name = name;
    +    this.type = type;
    +    this.mode = mode;
    +    precision = 0;
    +    scale = 0;
    +    if (isArray()) {
    +      expectedElementCount = DEFAULT_ARRAY_SIZE;
    +    }
    +  }
    +
    +  public AbstractColumnMetadata(AbstractColumnMetadata from) {
    +    name = from.name;
    +    type = from.type;
    +    mode = from.mode;
    +    precision = from.precision;
    +    scale = from.scale;
    +    expectedElementCount = from.expectedElementCount;
    +  }
    +
    +  protected void bind(TupleSchema parentTuple) { }
    +
    +  @Override
    +  public String name() { return name; }
    +
    +  @Override
    +  public MinorType type() { return type; }
    +
    +  @Override
    +  public MajorType majorType() {
    +    return MajorType.newBuilder()
    +        .setMinorType(type())
    +        .setMode(mode())
    +        .build();
    +  }
    +
    +  @Override
    +  public DataMode mode() { return mode; }
    +
    +  @Override
    +  public boolean isNullable() { return mode() == DataMode.OPTIONAL; }
    +
    +  @Override
    +  public boolean isArray() { return mode() == DataMode.REPEATED; }
    +
    +  @Override
    +  public int dimensions() { return isArray() ? 1 : 0; }
    +
    +  @Override
    +  public boolean isMap() { return false; }
    +
    +  @Override
    +  public boolean isVariant() { return false; }
    --- End diff --
    
    I think there is a write-up somewhere about this. "Variant" is an old term from Visual Basic for a tagged union (which is what our "Union" type really is.) It is used here because both a (non-repeated) List and a Union are both unions, which is confusing, so I used a separate term. Also, the "Union" term is confusing since there is also a UNION operator in SQL. `See VariantMetadata.java`.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1112


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168682152
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ---
    @@ -0,0 +1,165 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +public class MetadataUtils {
    +
    +  public static TupleSchema fromFields(Iterable<MaterializedField> fields) {
    +    TupleSchema tuple = new TupleSchema();
    +    for (MaterializedField field : fields) {
    +      tuple.add(field);
    +    }
    +    return tuple;
    +  }
    +
    +  /**
    +   * Create a column metadata object that holds the given
    +   * {@link MaterializedField}. The type of the object will be either a
    +   * primitive or map column, depending on the field's type. The logic
    +   * here mimics the code as written, which is very messy in some places.
    +   *
    +   * @param field the materialized field to wrap
    +   * @return the column metadata that wraps the field
    +   */
    +
    +  public static AbstractColumnMetadata fromField(MaterializedField field) {
    +    MinorType type = field.getType().getMinorType();
    +    switch (type) {
    +    case MAP:
    +      return MetadataUtils.newMap(field);
    +    case UNION:
    +      if (field.getType().getMode() != DataMode.OPTIONAL) {
    +        throw new UnsupportedOperationException(type.name() + " type must be nullable");
    +      }
    +      return new VariantColumnMetadata(field);
    +    case LIST:
    +      switch (field.getType().getMode()) {
    +      case OPTIONAL:
    +
    +        // List of unions (or a degenerate union of a single type.)
    +        // Not supported in Drill.
    --- End diff --
    
    We do. This comment is in the wrong place, should be in default.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168301865
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/ColumnMetadata.java ---
    @@ -15,36 +15,115 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.drill.exec.record;
    +package org.apache.drill.exec.record.metadata;
     
     import org.apache.drill.common.types.TypeProtos.DataMode;
     import org.apache.drill.common.types.TypeProtos.MajorType;
     import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
     
     /**
      * Metadata description of a column including names, types and structure
      * information.
      */
     
     public interface ColumnMetadata {
    +
    +  /**
    +   * Rough characterization of Drill types into metadata categories.
    +   * Various aspects of Drill's type system are very, very messy.
    +   * However, Drill is defined by its code, not some abstract design,
    +   * so the metadata system here does the best job it can to simplify
    +   * the messy type system while staying close to the underlying
    +   * implementation.
    +   */
    +
       enum StructureType {
    -    PRIMITIVE, LIST, TUPLE
    +
    +    /**
    +     * Primitive column (all types except List, Map and Union.)
    +     * Includes (one-dimensional) arrays of those types.
    +     */
    +
    +    PRIMITIVE,
    +
    +    /**
    +     * Map or repeated map. Also describes the row as a whole.
    +     */
    +
    +    TUPLE,
    +
    +    /**
    +     * Union or (non-repeated) list. (A non-repeated list is,
    +     * essentially, a repeated union.)
    +     */
    +
    +    VARIANT,
    +
    +    /**
    +     * A repeated list. A repeated list is not simply the repeated
    +     * form of a list, it is something else entirely. It acts as
    +     * a dimensional wrapper around any other type (except list)
    +     * and adds a non-nullable extra dimension. Hence, this type is
    +     * for 2D+ arrays.
    +     * <p>
    +     * In theory, a 2D list of, say, INT would be an INT column, but
    +     * repeated in to dimensions. Alas, that is not how it is. Also,
    +     * if we have a separate category for 2D lists, we should have
    +     * a separate category for 1D lists. But, again, that is not how
    +     * the code has evolved.
    +     */
    +
    +    MULTI_ARRAY
       }
     
    -  public static final int DEFAULT_ARRAY_SIZE = 10;
    +  int DEFAULT_ARRAY_SIZE = 10;
    --- End diff --
    
    why is this changed from static final int to int ? 


---

[GitHub] drill issue #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1112
  
    Thanks @ppadma!
    
    Next we'll need a reviewer. @parthchandra or @arina-ielchiieva, is this something you can review?
    
    Once this one is good, I'll follow up with another that includes revisions to the `SchemaBuilder` that allows tests to build schemas using the additional data types added here. 


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168682331
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/RepeatedListColumnMetadata.java ---
    @@ -0,0 +1,94 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +import com.google.common.base.Preconditions;
    +
    +public class RepeatedListColumnMetadata extends AbstractColumnMetadata {
    +
    +  private AbstractColumnMetadata childSchema;
    +
    +  public RepeatedListColumnMetadata(MaterializedField field) {
    +    super(field);
    +    Preconditions.checkArgument(field.getType().getMinorType() == MinorType.LIST);
    +    Preconditions.checkArgument(field.getType().getMode() == DataMode.REPEATED);
    +    Preconditions.checkArgument(field.getChildren().size() <= 1);
    +    if (! field.getChildren().isEmpty()) {
    +      childSchema = MetadataUtils.fromField(field.getChildren().iterator().next());
    +      Preconditions.checkArgument(childSchema.isArray());
    +    }
    +  }
    +
    +  public RepeatedListColumnMetadata(String name, AbstractColumnMetadata childSchema) {
    +    super(name, MinorType.LIST, DataMode.REPEATED);
    +    if (childSchema != null) {
    +      Preconditions.checkArgument(childSchema.isArray());
    +    }
    +    this.childSchema = childSchema;
    +  }
    +
    +  public void childSchema(ColumnMetadata childMetadata) {
    +    Preconditions.checkState(childSchema == null);
    +    Preconditions.checkArgument(childMetadata.mode() == DataMode.REPEATED);
    +    childSchema = (AbstractColumnMetadata) childMetadata;
    +  }
    +
    +  @Override
    +  public StructureType structureType() { return StructureType.MULTI_ARRAY; }
    +
    +  @Override
    +  public MaterializedField schema() {
    +    MaterializedField field = emptySchema();
    +    if (childSchema != null) {
    +      field.addChild(childSchema.schema());
    +    }
    +    return field;
    +  }
    +
    +  @Override
    +  public MaterializedField emptySchema() {
    +    return MaterializedField.create(name(), majorType());
    +  }
    +
    +  @Override
    +  public ColumnMetadata cloneEmpty() {
    +    return new RepeatedListColumnMetadata(name, null);
    +  }
    +
    +  @Override
    +  public AbstractColumnMetadata copy() {
    +    return new RepeatedListColumnMetadata(name, childSchema);
    +  }
    +
    +  @Override
    +  public ColumnMetadata childSchema() { return childSchema; }
    +
    +  @Override
    +  public int dimensions() {
    +
    +    // If there is no child, then we don't know the
    +    // dimensionality.
    +
    +    return childSchema == null ? -1
    --- End diff --
    
    Fixed.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168682470
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/ColumnMetadata.java ---
    @@ -15,36 +15,115 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.drill.exec.record;
    +package org.apache.drill.exec.record.metadata;
     
     import org.apache.drill.common.types.TypeProtos.DataMode;
     import org.apache.drill.common.types.TypeProtos.MajorType;
     import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
     
     /**
      * Metadata description of a column including names, types and structure
      * information.
      */
     
     public interface ColumnMetadata {
    +
    +  /**
    +   * Rough characterization of Drill types into metadata categories.
    +   * Various aspects of Drill's type system are very, very messy.
    +   * However, Drill is defined by its code, not some abstract design,
    +   * so the metadata system here does the best job it can to simplify
    +   * the messy type system while staying close to the underlying
    +   * implementation.
    +   */
    +
       enum StructureType {
    -    PRIMITIVE, LIST, TUPLE
    +
    +    /**
    +     * Primitive column (all types except List, Map and Union.)
    +     * Includes (one-dimensional) arrays of those types.
    +     */
    +
    +    PRIMITIVE,
    +
    +    /**
    +     * Map or repeated map. Also describes the row as a whole.
    +     */
    +
    +    TUPLE,
    +
    +    /**
    +     * Union or (non-repeated) list. (A non-repeated list is,
    +     * essentially, a repeated union.)
    +     */
    +
    +    VARIANT,
    +
    +    /**
    +     * A repeated list. A repeated list is not simply the repeated
    +     * form of a list, it is something else entirely. It acts as
    +     * a dimensional wrapper around any other type (except list)
    +     * and adds a non-nullable extra dimension. Hence, this type is
    +     * for 2D+ arrays.
    +     * <p>
    +     * In theory, a 2D list of, say, INT would be an INT column, but
    +     * repeated in to dimensions. Alas, that is not how it is. Also,
    +     * if we have a separate category for 2D lists, we should have
    +     * a separate category for 1D lists. But, again, that is not how
    +     * the code has evolved.
    +     */
    +
    +    MULTI_ARRAY
       }
     
    -  public static final int DEFAULT_ARRAY_SIZE = 10;
    +  int DEFAULT_ARRAY_SIZE = 10;
    --- End diff --
    
    This is an interface. The default (and only legal) form of fields is `static final`. Thanks to Tim for reminding me of this.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168288500
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
    @@ -0,0 +1,206 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Abstract definition of column metadata. Allows applications to create
    + * specialized forms of a column metadata object by extending from this
    + * abstract class.
    + * <p>
    + * Note that, by design, primitive columns do not have a link to their
    + * tuple parent, or their index within that parent. This allows the same
    + * metadata to be shared between two views of a tuple, perhaps physical
    + * and projected views. This restriction does not apply to map columns,
    + * since maps (and the row itself) will, by definition, differ between
    + * the two views.
    + */
    +
    +public abstract class AbstractColumnMetadata implements ColumnMetadata {
    +
    +  // Capture the key schema information. We cannot use the MaterializedField
    +  // or MajorType because then encode child information that we encode here
    +  // as a child schema. Keeping the two in sync is nearly impossible.
    +
    +  protected final String name;
    +  protected final MinorType type;
    +  protected final DataMode mode;
    +  protected final int precision;
    +  protected final int scale;
    +  protected boolean projected = true;
    +
    --- End diff --
    
    I see in the current definition of MajorType, there is also timeZone. We don't need that here ? 


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168289206
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
    @@ -0,0 +1,206 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Abstract definition of column metadata. Allows applications to create
    + * specialized forms of a column metadata object by extending from this
    + * abstract class.
    + * <p>
    + * Note that, by design, primitive columns do not have a link to their
    + * tuple parent, or their index within that parent. This allows the same
    + * metadata to be shared between two views of a tuple, perhaps physical
    + * and projected views. This restriction does not apply to map columns,
    + * since maps (and the row itself) will, by definition, differ between
    + * the two views.
    + */
    +
    +public abstract class AbstractColumnMetadata implements ColumnMetadata {
    +
    +  // Capture the key schema information. We cannot use the MaterializedField
    +  // or MajorType because then encode child information that we encode here
    +  // as a child schema. Keeping the two in sync is nearly impossible.
    +
    +  protected final String name;
    +  protected final MinorType type;
    +  protected final DataMode mode;
    +  protected final int precision;
    +  protected final int scale;
    +  protected boolean projected = true;
    +
    +  /**
    +   * Predicted number of elements per array entry. Default is
    +   * taken from the often hard-coded value of 10.
    +   */
    +
    +  protected int expectedElementCount = 1;
    +
    +  public AbstractColumnMetadata(MaterializedField schema) {
    +    name = schema.getName();
    +    MajorType majorType = schema.getType();
    +    type = majorType.getMinorType();
    +    mode = majorType.getMode();
    +    precision = majorType.getPrecision();
    +    scale = majorType.getScale();
    +    if (isArray()) {
    +      expectedElementCount = DEFAULT_ARRAY_SIZE;
    +    }
    +  }
    +
    +  public AbstractColumnMetadata(String name, MinorType type, DataMode mode) {
    +    this.name = name;
    +    this.type = type;
    +    this.mode = mode;
    +    precision = 0;
    +    scale = 0;
    +    if (isArray()) {
    +      expectedElementCount = DEFAULT_ARRAY_SIZE;
    +    }
    +  }
    +
    +  public AbstractColumnMetadata(AbstractColumnMetadata from) {
    +    name = from.name;
    +    type = from.type;
    +    mode = from.mode;
    +    precision = from.precision;
    +    scale = from.scale;
    +    expectedElementCount = from.expectedElementCount;
    +  }
    +
    +  protected void bind(TupleSchema parentTuple) { }
    +
    +  @Override
    +  public String name() { return name; }
    +
    +  @Override
    +  public MinorType type() { return type; }
    +
    +  @Override
    +  public MajorType majorType() {
    +    return MajorType.newBuilder()
    +        .setMinorType(type())
    +        .setMode(mode())
    +        .build();
    +  }
    +
    +  @Override
    +  public DataMode mode() { return mode; }
    +
    +  @Override
    +  public boolean isNullable() { return mode() == DataMode.OPTIONAL; }
    +
    +  @Override
    +  public boolean isArray() { return mode() == DataMode.REPEATED; }
    +
    +  @Override
    +  public int dimensions() { return isArray() ? 1 : 0; }
    +
    +  @Override
    +  public boolean isMap() { return false; }
    +
    +  @Override
    +  public boolean isVariant() { return false; }
    --- End diff --
    
    Seems like Variant is new name for old Union. Can we add a comment here  ?


---

[GitHub] drill issue #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1112
  
    @paul-rogers I ran the pre commit tests. No issues. Everything passed. Will do one more time once code reviews are done.


---

[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1112#discussion_r168291276
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ---
    @@ -0,0 +1,165 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.record.metadata;
    +
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +public class MetadataUtils {
    +
    +  public static TupleSchema fromFields(Iterable<MaterializedField> fields) {
    +    TupleSchema tuple = new TupleSchema();
    +    for (MaterializedField field : fields) {
    +      tuple.add(field);
    +    }
    +    return tuple;
    +  }
    +
    +  /**
    +   * Create a column metadata object that holds the given
    +   * {@link MaterializedField}. The type of the object will be either a
    +   * primitive or map column, depending on the field's type. The logic
    +   * here mimics the code as written, which is very messy in some places.
    +   *
    +   * @param field the materialized field to wrap
    +   * @return the column metadata that wraps the field
    +   */
    +
    +  public static AbstractColumnMetadata fromField(MaterializedField field) {
    +    MinorType type = field.getType().getMinorType();
    +    switch (type) {
    +    case MAP:
    +      return MetadataUtils.newMap(field);
    +    case UNION:
    +      if (field.getType().getMode() != DataMode.OPTIONAL) {
    +        throw new UnsupportedOperationException(type.name() + " type must be nullable");
    +      }
    +      return new VariantColumnMetadata(field);
    +    case LIST:
    +      switch (field.getType().getMode()) {
    +      case OPTIONAL:
    +
    +        // List of unions (or a degenerate union of a single type.)
    +        // Not supported in Drill.
    --- End diff --
    
    If not supported, should we throw an exception ?


---