You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2020/04/03 18:22:41 UTC

[GitHub] [drill] ihuzenko opened a new pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

ihuzenko opened a new pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048
 
 
   
   # [DRILL-7429](https://issues.apache.org/jira/browse/DRILL-7429): Wrong column order when selecting complex data using Hive storage plugin
   
   ## Description
   
   Column ordering was wrong when the first selected column had type STRUCT and second had a primitive type. 
   
   ## Documentation
   
   No need to document it.
   
   ## Testing
   
   Added unit test in TestHiveStructs. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] ihuzenko merged pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
ihuzenko merged pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r406068424
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ##########
 @@ -120,17 +119,7 @@ protected IterOutcome doWork() {
             // since this is first batch and we already got a NONE, need to set up the schema
             doAlloc(0);
             setValueCount(0);
-
-            // Only need to add the schema for the complex exprs because others should already have
-            // been setup during setupNewSchema
-            for (FieldReference fieldReference : complexFieldReferencesList) {
-              MaterializedField field = MaterializedField.create(fieldReference.getAsNamePart().getName(),
-                      UntypedNullHolder.TYPE);
-              container.add(new UntypedNullVector(field, container.getAllocator()));
-            }
-            container.buildSchema(SelectionVectorMode.NONE);
-            wasNone = true;
-            return IterOutcome.OK_NEW_SCHEMA;
+            return IterOutcome.NONE;
 
 Review comment:
   Thanks for referring to `handleNullInput()` and explanation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] ihuzenko commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r406054349
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ##########
 @@ -120,17 +119,7 @@ protected IterOutcome doWork() {
             // since this is first batch and we already got a NONE, need to set up the schema
             doAlloc(0);
             setValueCount(0);
-
-            // Only need to add the schema for the complex exprs because others should already have
-            // been setup during setupNewSchema
-            for (FieldReference fieldReference : complexFieldReferencesList) {
-              MaterializedField field = MaterializedField.create(fieldReference.getAsNamePart().getName(),
-                      UntypedNullHolder.TYPE);
-              container.add(new UntypedNullVector(field, container.getAllocator()));
-            }
-            container.buildSchema(SelectionVectorMode.NONE);
-            wasNone = true;
-            return IterOutcome.OK_NEW_SCHEMA;
+            return IterOutcome.NONE;
 
 Review comment:
   AFAIK there is ```handleNullInput()``` which should be called for this purpose. Here was workaround for adding late complex types to container and rebuilding schema was necessary. One thing I may be missing here is setting ```wasNone = true;```, so I'll add it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r403732045
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ##########
 @@ -339,6 +331,14 @@ protected IterOutcome handleNullInput() {
     return IterOutcome.OK_NEW_SCHEMA;
   }
 
+  private boolean isNotSchemalessInput() {
+    RecordBatch incomingBatch = incoming instanceof IteratorValidatorBatchIterator
+        ? ((IteratorValidatorBatchIterator) incoming).getIncoming()
+        : incoming;
 
 Review comment:
   Is it required to ensure that incoming batch after `IteratorValidatorBatchIterator` is `IteratorValidatorBatchIterator`?
   
   Looks like `SchemalessBatch.getSchema()` always returns null, so at least the second check covers this `instance of` check...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] ihuzenko commented on issue #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
ihuzenko commented on issue #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#issuecomment-611420302
 
 
   Hello @vvysotskyi , thanks for good review. I've updated the PR, please take a look again. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r403730299
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ##########
 @@ -120,17 +119,7 @@ protected IterOutcome doWork() {
             // since this is first batch and we already got a NONE, need to set up the schema
             doAlloc(0);
             setValueCount(0);
-
-            // Only need to add the schema for the complex exprs because others should already have
-            // been setup during setupNewSchema
-            for (FieldReference fieldReference : complexFieldReferencesList) {
-              MaterializedField field = MaterializedField.create(fieldReference.getAsNamePart().getName(),
-                      UntypedNullHolder.TYPE);
-              container.add(new UntypedNullVector(field, container.getAllocator()));
-            }
-            container.buildSchema(SelectionVectorMode.NONE);
-            wasNone = true;
-            return IterOutcome.OK_NEW_SCHEMA;
+            return IterOutcome.NONE;
 
 Review comment:
   We haven't built the schema. I think even for the case when incoming operator returned `NONE` for the first batch, we should also build the schema, in the opposite case, we may break some cases of `limit 0` queries.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r403729644
 
 

 ##########
 File path: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/complex_types/TestHiveStructs.java
 ##########
 @@ -234,6 +240,28 @@ public void primitiveStruct() throws Exception {
         .go();
   }
 
+  @Test // DRILL-7429
+  public void testCorrectColumnOrdering() throws Exception {
+    String sql = "SELECT t.str_n0 a, rid b FROM hive.struct_tbl t LIMIT 1";
+    DirectRowSet directRowSet = queryBuilder().sql(sql).rowSet();
+    BatchSchema fields;
+    try {
+      fields = directRowSet.batchSchema();
+    } finally {
+      directRowSet.clear();
+    }
+
+    assertThat(fields.getFieldCount(), is(2));
 
 Review comment:
   We also have `RowSetComparison` class, which allows comparing expected and actual row sets, including their schema.
   
   Or if you don't want to populate row set with expected data, you may use the `schemaBaseLine()` method from `TestBuilder` to validate query schema only.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on a change in pull request #2048: DRILL-7429: Wrong column order when selecting complex data using Hive storage plugin
URL: https://github.com/apache/drill/pull/2048#discussion_r403732045
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ##########
 @@ -339,6 +331,14 @@ protected IterOutcome handleNullInput() {
     return IterOutcome.OK_NEW_SCHEMA;
   }
 
+  private boolean isNotSchemalessInput() {
+    RecordBatch incomingBatch = incoming instanceof IteratorValidatorBatchIterator
+        ? ((IteratorValidatorBatchIterator) incoming).getIncoming()
+        : incoming;
 
 Review comment:
   Is it required to ensure that incoming batch after `IteratorValidatorBatchIterator` is `SchemalessBatch`?
   
   Looks like `SchemalessBatch.getSchema()` always returns null, so at least the second check covers this `instance of` check...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services