You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2018/02/06 05:03:06 UTC
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/1112
DRILL-6114: Metadata revisions
This PR is part of the "[batch handling updates\https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades]" project. It completes the new internal metadata system by including support for the remaining vector types: unions, lists and repeated lists.
The metadata code was refactored. Previously, it was small enough to fit into a single file (with nested classes). With the added complexity, the metadata classes were split out into separate classes, grouped into its own Java package.
A few fixes were made here and there to ensure the unit tests pass.
@ppadma or @bitblender, can one of you run the pre-commit tests and send me the details of any failures?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-6114B
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1112.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1112
----
commit e0f11f923ea48070f829a859395ee66b81b9ba18
Author: Paul Rogers <pr...@...>
Date: 2018-02-06T04:18:18Z
DRILL-6114: Metadata revisions
Support for union vectors, list vectors, repeated list vectors. Refactored metadata classes.
----
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168295700
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/RepeatedListColumnMetadata.java ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+import com.google.common.base.Preconditions;
+
+public class RepeatedListColumnMetadata extends AbstractColumnMetadata {
+
+ private AbstractColumnMetadata childSchema;
+
+ public RepeatedListColumnMetadata(MaterializedField field) {
+ super(field);
+ Preconditions.checkArgument(field.getType().getMinorType() == MinorType.LIST);
+ Preconditions.checkArgument(field.getType().getMode() == DataMode.REPEATED);
+ Preconditions.checkArgument(field.getChildren().size() <= 1);
+ if (! field.getChildren().isEmpty()) {
+ childSchema = MetadataUtils.fromField(field.getChildren().iterator().next());
+ Preconditions.checkArgument(childSchema.isArray());
+ }
+ }
+
+ public RepeatedListColumnMetadata(String name, AbstractColumnMetadata childSchema) {
+ super(name, MinorType.LIST, DataMode.REPEATED);
+ if (childSchema != null) {
+ Preconditions.checkArgument(childSchema.isArray());
+ }
+ this.childSchema = childSchema;
+ }
+
+ public void childSchema(ColumnMetadata childMetadata) {
+ Preconditions.checkState(childSchema == null);
+ Preconditions.checkArgument(childMetadata.mode() == DataMode.REPEATED);
+ childSchema = (AbstractColumnMetadata) childMetadata;
+ }
+
+ @Override
+ public StructureType structureType() { return StructureType.MULTI_ARRAY; }
+
+ @Override
+ public MaterializedField schema() {
+ MaterializedField field = emptySchema();
+ if (childSchema != null) {
+ field.addChild(childSchema.schema());
+ }
+ return field;
+ }
+
+ @Override
+ public MaterializedField emptySchema() {
+ return MaterializedField.create(name(), majorType());
+ }
+
+ @Override
+ public ColumnMetadata cloneEmpty() {
+ return new RepeatedListColumnMetadata(name, null);
+ }
+
+ @Override
+ public AbstractColumnMetadata copy() {
+ return new RepeatedListColumnMetadata(name, childSchema);
+ }
+
+ @Override
+ public ColumnMetadata childSchema() { return childSchema; }
+
+ @Override
+ public int dimensions() {
+
+ // If there is no child, then we don't know the
+ // dimensionality.
+
+ return childSchema == null ? -1
--- End diff --
can we use a static final instead of -1 ?
---
[GitHub] drill issue #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/1112
@arina-ielchiieva, @parthchandra can either of you perhaps give this one a committer review? Thanks!
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168681135
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Abstract definition of column metadata. Allows applications to create
+ * specialized forms of a column metadata object by extending from this
+ * abstract class.
+ * <p>
+ * Note that, by design, primitive columns do not have a link to their
+ * tuple parent, or their index within that parent. This allows the same
+ * metadata to be shared between two views of a tuple, perhaps physical
+ * and projected views. This restriction does not apply to map columns,
+ * since maps (and the row itself) will, by definition, differ between
+ * the two views.
+ */
+
+public abstract class AbstractColumnMetadata implements ColumnMetadata {
+
+ // Capture the key schema information. We cannot use the MaterializedField
+ // or MajorType because then encode child information that we encode here
+ // as a child schema. Keeping the two in sync is nearly impossible.
+
+ protected final String name;
+ protected final MinorType type;
+ protected final DataMode mode;
+ protected final int precision;
+ protected final int scale;
+ protected boolean projected = true;
+
--- End diff --
AFAIK, we do not support the time-zone aware data types at present.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168681901
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Abstract definition of column metadata. Allows applications to create
+ * specialized forms of a column metadata object by extending from this
+ * abstract class.
+ * <p>
+ * Note that, by design, primitive columns do not have a link to their
+ * tuple parent, or their index within that parent. This allows the same
+ * metadata to be shared between two views of a tuple, perhaps physical
+ * and projected views. This restriction does not apply to map columns,
+ * since maps (and the row itself) will, by definition, differ between
+ * the two views.
+ */
+
+public abstract class AbstractColumnMetadata implements ColumnMetadata {
+
+ // Capture the key schema information. We cannot use the MaterializedField
+ // or MajorType because then encode child information that we encode here
+ // as a child schema. Keeping the two in sync is nearly impossible.
+
+ protected final String name;
+ protected final MinorType type;
+ protected final DataMode mode;
+ protected final int precision;
+ protected final int scale;
+ protected boolean projected = true;
+
+ /**
+ * Predicted number of elements per array entry. Default is
+ * taken from the often hard-coded value of 10.
+ */
+
+ protected int expectedElementCount = 1;
+
+ public AbstractColumnMetadata(MaterializedField schema) {
+ name = schema.getName();
+ MajorType majorType = schema.getType();
+ type = majorType.getMinorType();
+ mode = majorType.getMode();
+ precision = majorType.getPrecision();
+ scale = majorType.getScale();
+ if (isArray()) {
+ expectedElementCount = DEFAULT_ARRAY_SIZE;
+ }
+ }
+
+ public AbstractColumnMetadata(String name, MinorType type, DataMode mode) {
+ this.name = name;
+ this.type = type;
+ this.mode = mode;
+ precision = 0;
+ scale = 0;
+ if (isArray()) {
+ expectedElementCount = DEFAULT_ARRAY_SIZE;
+ }
+ }
+
+ public AbstractColumnMetadata(AbstractColumnMetadata from) {
+ name = from.name;
+ type = from.type;
+ mode = from.mode;
+ precision = from.precision;
+ scale = from.scale;
+ expectedElementCount = from.expectedElementCount;
+ }
+
+ protected void bind(TupleSchema parentTuple) { }
+
+ @Override
+ public String name() { return name; }
+
+ @Override
+ public MinorType type() { return type; }
+
+ @Override
+ public MajorType majorType() {
+ return MajorType.newBuilder()
+ .setMinorType(type())
+ .setMode(mode())
+ .build();
+ }
+
+ @Override
+ public DataMode mode() { return mode; }
+
+ @Override
+ public boolean isNullable() { return mode() == DataMode.OPTIONAL; }
+
+ @Override
+ public boolean isArray() { return mode() == DataMode.REPEATED; }
+
+ @Override
+ public int dimensions() { return isArray() ? 1 : 0; }
+
+ @Override
+ public boolean isMap() { return false; }
+
+ @Override
+ public boolean isVariant() { return false; }
--- End diff --
I think there is a write-up somewhere about this. "Variant" is an old term from Visual Basic for a tagged union (which is what our "Union" type really is.) It is used here because both a (non-repeated) List and a Union are both unions, which is confusing, so I used a separate term. Also, the "Union" term is confusing since there is also a UNION operator in SQL. `See VariantMetadata.java`.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/drill/pull/1112
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168682152
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.MaterializedField;
+
+public class MetadataUtils {
+
+ public static TupleSchema fromFields(Iterable<MaterializedField> fields) {
+ TupleSchema tuple = new TupleSchema();
+ for (MaterializedField field : fields) {
+ tuple.add(field);
+ }
+ return tuple;
+ }
+
+ /**
+ * Create a column metadata object that holds the given
+ * {@link MaterializedField}. The type of the object will be either a
+ * primitive or map column, depending on the field's type. The logic
+ * here mimics the code as written, which is very messy in some places.
+ *
+ * @param field the materialized field to wrap
+ * @return the column metadata that wraps the field
+ */
+
+ public static AbstractColumnMetadata fromField(MaterializedField field) {
+ MinorType type = field.getType().getMinorType();
+ switch (type) {
+ case MAP:
+ return MetadataUtils.newMap(field);
+ case UNION:
+ if (field.getType().getMode() != DataMode.OPTIONAL) {
+ throw new UnsupportedOperationException(type.name() + " type must be nullable");
+ }
+ return new VariantColumnMetadata(field);
+ case LIST:
+ switch (field.getType().getMode()) {
+ case OPTIONAL:
+
+ // List of unions (or a degenerate union of a single type.)
+ // Not supported in Drill.
--- End diff --
We do. This comment is in the wrong place, should be in default.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168301865
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/ColumnMetadata.java ---
@@ -15,36 +15,115 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.drill.exec.record;
+package org.apache.drill.exec.record.metadata;
import org.apache.drill.common.types.TypeProtos.DataMode;
import org.apache.drill.common.types.TypeProtos.MajorType;
import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
/**
* Metadata description of a column including names, types and structure
* information.
*/
public interface ColumnMetadata {
+
+ /**
+ * Rough characterization of Drill types into metadata categories.
+ * Various aspects of Drill's type system are very, very messy.
+ * However, Drill is defined by its code, not some abstract design,
+ * so the metadata system here does the best job it can to simplify
+ * the messy type system while staying close to the underlying
+ * implementation.
+ */
+
enum StructureType {
- PRIMITIVE, LIST, TUPLE
+
+ /**
+ * Primitive column (all types except List, Map and Union.)
+ * Includes (one-dimensional) arrays of those types.
+ */
+
+ PRIMITIVE,
+
+ /**
+ * Map or repeated map. Also describes the row as a whole.
+ */
+
+ TUPLE,
+
+ /**
+ * Union or (non-repeated) list. (A non-repeated list is,
+ * essentially, a repeated union.)
+ */
+
+ VARIANT,
+
+ /**
+ * A repeated list. A repeated list is not simply the repeated
+ * form of a list, it is something else entirely. It acts as
+ * a dimensional wrapper around any other type (except list)
+ * and adds a non-nullable extra dimension. Hence, this type is
+ * for 2D+ arrays.
+ * <p>
+ * In theory, a 2D list of, say, INT would be an INT column, but
+ * repeated in to dimensions. Alas, that is not how it is. Also,
+ * if we have a separate category for 2D lists, we should have
+ * a separate category for 1D lists. But, again, that is not how
+ * the code has evolved.
+ */
+
+ MULTI_ARRAY
}
- public static final int DEFAULT_ARRAY_SIZE = 10;
+ int DEFAULT_ARRAY_SIZE = 10;
--- End diff --
why is this changed from static final int to int ?
---
[GitHub] drill issue #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/1112
Thanks @ppadma!
Next we'll need a reviewer. @parthchandra or @arina-ielchiieva, is this something you can review?
Once this one is good, I'll follow up with another that includes revisions to the `SchemaBuilder` that allows tests to build schemas using the additional data types added here.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168682331
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/RepeatedListColumnMetadata.java ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+import com.google.common.base.Preconditions;
+
+public class RepeatedListColumnMetadata extends AbstractColumnMetadata {
+
+ private AbstractColumnMetadata childSchema;
+
+ public RepeatedListColumnMetadata(MaterializedField field) {
+ super(field);
+ Preconditions.checkArgument(field.getType().getMinorType() == MinorType.LIST);
+ Preconditions.checkArgument(field.getType().getMode() == DataMode.REPEATED);
+ Preconditions.checkArgument(field.getChildren().size() <= 1);
+ if (! field.getChildren().isEmpty()) {
+ childSchema = MetadataUtils.fromField(field.getChildren().iterator().next());
+ Preconditions.checkArgument(childSchema.isArray());
+ }
+ }
+
+ public RepeatedListColumnMetadata(String name, AbstractColumnMetadata childSchema) {
+ super(name, MinorType.LIST, DataMode.REPEATED);
+ if (childSchema != null) {
+ Preconditions.checkArgument(childSchema.isArray());
+ }
+ this.childSchema = childSchema;
+ }
+
+ public void childSchema(ColumnMetadata childMetadata) {
+ Preconditions.checkState(childSchema == null);
+ Preconditions.checkArgument(childMetadata.mode() == DataMode.REPEATED);
+ childSchema = (AbstractColumnMetadata) childMetadata;
+ }
+
+ @Override
+ public StructureType structureType() { return StructureType.MULTI_ARRAY; }
+
+ @Override
+ public MaterializedField schema() {
+ MaterializedField field = emptySchema();
+ if (childSchema != null) {
+ field.addChild(childSchema.schema());
+ }
+ return field;
+ }
+
+ @Override
+ public MaterializedField emptySchema() {
+ return MaterializedField.create(name(), majorType());
+ }
+
+ @Override
+ public ColumnMetadata cloneEmpty() {
+ return new RepeatedListColumnMetadata(name, null);
+ }
+
+ @Override
+ public AbstractColumnMetadata copy() {
+ return new RepeatedListColumnMetadata(name, childSchema);
+ }
+
+ @Override
+ public ColumnMetadata childSchema() { return childSchema; }
+
+ @Override
+ public int dimensions() {
+
+ // If there is no child, then we don't know the
+ // dimensionality.
+
+ return childSchema == null ? -1
--- End diff --
Fixed.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168682470
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/ColumnMetadata.java ---
@@ -15,36 +15,115 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.drill.exec.record;
+package org.apache.drill.exec.record.metadata;
import org.apache.drill.common.types.TypeProtos.DataMode;
import org.apache.drill.common.types.TypeProtos.MajorType;
import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
/**
* Metadata description of a column including names, types and structure
* information.
*/
public interface ColumnMetadata {
+
+ /**
+ * Rough characterization of Drill types into metadata categories.
+ * Various aspects of Drill's type system are very, very messy.
+ * However, Drill is defined by its code, not some abstract design,
+ * so the metadata system here does the best job it can to simplify
+ * the messy type system while staying close to the underlying
+ * implementation.
+ */
+
enum StructureType {
- PRIMITIVE, LIST, TUPLE
+
+ /**
+ * Primitive column (all types except List, Map and Union.)
+ * Includes (one-dimensional) arrays of those types.
+ */
+
+ PRIMITIVE,
+
+ /**
+ * Map or repeated map. Also describes the row as a whole.
+ */
+
+ TUPLE,
+
+ /**
+ * Union or (non-repeated) list. (A non-repeated list is,
+ * essentially, a repeated union.)
+ */
+
+ VARIANT,
+
+ /**
+ * A repeated list. A repeated list is not simply the repeated
+ * form of a list, it is something else entirely. It acts as
+ * a dimensional wrapper around any other type (except list)
+ * and adds a non-nullable extra dimension. Hence, this type is
+ * for 2D+ arrays.
+ * <p>
+ * In theory, a 2D list of, say, INT would be an INT column, but
+ * repeated in to dimensions. Alas, that is not how it is. Also,
+ * if we have a separate category for 2D lists, we should have
+ * a separate category for 1D lists. But, again, that is not how
+ * the code has evolved.
+ */
+
+ MULTI_ARRAY
}
- public static final int DEFAULT_ARRAY_SIZE = 10;
+ int DEFAULT_ARRAY_SIZE = 10;
--- End diff --
This is an interface. The default (and only legal) form of fields is `static final`. Thanks to Tim for reminding me of this.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168288500
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Abstract definition of column metadata. Allows applications to create
+ * specialized forms of a column metadata object by extending from this
+ * abstract class.
+ * <p>
+ * Note that, by design, primitive columns do not have a link to their
+ * tuple parent, or their index within that parent. This allows the same
+ * metadata to be shared between two views of a tuple, perhaps physical
+ * and projected views. This restriction does not apply to map columns,
+ * since maps (and the row itself) will, by definition, differ between
+ * the two views.
+ */
+
+public abstract class AbstractColumnMetadata implements ColumnMetadata {
+
+ // Capture the key schema information. We cannot use the MaterializedField
+ // or MajorType because then encode child information that we encode here
+ // as a child schema. Keeping the two in sync is nearly impossible.
+
+ protected final String name;
+ protected final MinorType type;
+ protected final DataMode mode;
+ protected final int precision;
+ protected final int scale;
+ protected boolean projected = true;
+
--- End diff --
I see in the current definition of MajorType, there is also timeZone. We don't need that here ?
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168289206
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Abstract definition of column metadata. Allows applications to create
+ * specialized forms of a column metadata object by extending from this
+ * abstract class.
+ * <p>
+ * Note that, by design, primitive columns do not have a link to their
+ * tuple parent, or their index within that parent. This allows the same
+ * metadata to be shared between two views of a tuple, perhaps physical
+ * and projected views. This restriction does not apply to map columns,
+ * since maps (and the row itself) will, by definition, differ between
+ * the two views.
+ */
+
+public abstract class AbstractColumnMetadata implements ColumnMetadata {
+
+ // Capture the key schema information. We cannot use the MaterializedField
+ // or MajorType because then encode child information that we encode here
+ // as a child schema. Keeping the two in sync is nearly impossible.
+
+ protected final String name;
+ protected final MinorType type;
+ protected final DataMode mode;
+ protected final int precision;
+ protected final int scale;
+ protected boolean projected = true;
+
+ /**
+ * Predicted number of elements per array entry. Default is
+ * taken from the often hard-coded value of 10.
+ */
+
+ protected int expectedElementCount = 1;
+
+ public AbstractColumnMetadata(MaterializedField schema) {
+ name = schema.getName();
+ MajorType majorType = schema.getType();
+ type = majorType.getMinorType();
+ mode = majorType.getMode();
+ precision = majorType.getPrecision();
+ scale = majorType.getScale();
+ if (isArray()) {
+ expectedElementCount = DEFAULT_ARRAY_SIZE;
+ }
+ }
+
+ public AbstractColumnMetadata(String name, MinorType type, DataMode mode) {
+ this.name = name;
+ this.type = type;
+ this.mode = mode;
+ precision = 0;
+ scale = 0;
+ if (isArray()) {
+ expectedElementCount = DEFAULT_ARRAY_SIZE;
+ }
+ }
+
+ public AbstractColumnMetadata(AbstractColumnMetadata from) {
+ name = from.name;
+ type = from.type;
+ mode = from.mode;
+ precision = from.precision;
+ scale = from.scale;
+ expectedElementCount = from.expectedElementCount;
+ }
+
+ protected void bind(TupleSchema parentTuple) { }
+
+ @Override
+ public String name() { return name; }
+
+ @Override
+ public MinorType type() { return type; }
+
+ @Override
+ public MajorType majorType() {
+ return MajorType.newBuilder()
+ .setMinorType(type())
+ .setMode(mode())
+ .build();
+ }
+
+ @Override
+ public DataMode mode() { return mode; }
+
+ @Override
+ public boolean isNullable() { return mode() == DataMode.OPTIONAL; }
+
+ @Override
+ public boolean isArray() { return mode() == DataMode.REPEATED; }
+
+ @Override
+ public int dimensions() { return isArray() ? 1 : 0; }
+
+ @Override
+ public boolean isMap() { return false; }
+
+ @Override
+ public boolean isVariant() { return false; }
--- End diff --
Seems like Variant is new name for old Union. Can we add a comment here ?
---
[GitHub] drill issue #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:
https://github.com/apache/drill/pull/1112
@paul-rogers I ran the pre commit tests. No issues. Everything passed. Will do one more time once code reviews are done.
---
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1112#discussion_r168291276
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/MetadataUtils.java ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.MaterializedField;
+
+public class MetadataUtils {
+
+ public static TupleSchema fromFields(Iterable<MaterializedField> fields) {
+ TupleSchema tuple = new TupleSchema();
+ for (MaterializedField field : fields) {
+ tuple.add(field);
+ }
+ return tuple;
+ }
+
+ /**
+ * Create a column metadata object that holds the given
+ * {@link MaterializedField}. The type of the object will be either a
+ * primitive or map column, depending on the field's type. The logic
+ * here mimics the code as written, which is very messy in some places.
+ *
+ * @param field the materialized field to wrap
+ * @return the column metadata that wraps the field
+ */
+
+ public static AbstractColumnMetadata fromField(MaterializedField field) {
+ MinorType type = field.getType().getMinorType();
+ switch (type) {
+ case MAP:
+ return MetadataUtils.newMap(field);
+ case UNION:
+ if (field.getType().getMode() != DataMode.OPTIONAL) {
+ throw new UnsupportedOperationException(type.name() + " type must be nullable");
+ }
+ return new VariantColumnMetadata(field);
+ case LIST:
+ switch (field.getType().getMode()) {
+ case OPTIONAL:
+
+ // List of unions (or a degenerate union of a single type.)
+ // Not supported in Drill.
--- End diff --
If not supported, should we throw an exception ?
---