You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2018/04/18 00:10:59 UTC

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1218

    DRILL-6335: Refactor row set abstractions to prepare for unions

    Refactors the column accessors to prepare for adding unions, lists and repeated lists.
    
    This is a subset of a PR done a week ago in the hope that this one will be easier to review. The original one will be broken down into four or more smaller PRs: this one, a refactoring of the result set loader, also to prepare for unions, and the union work itself.
    
    The row set mechanism is fully described [here](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
    
    Rather than write a long description here, please take a look at the code and the Wiki post. To ease review, however, the following summarizes the changes:
    
    * Moved metadata from the tuple reader/writer to the column reader/writer so that it is available for all columns. Added a `tupleMetadata()` to tuples to continue to provide the tuple schema.
    * Added a `ProjectionType` bit of metadata in preparation for the projection system to be used by the scan operator. (Projection has three states, captured by the new enum.)
    * Updated some tests to use a slightly simpler version of the code that compares two result sets.
    * Added unit tests for "indirect" readers: a reader for an SV2.
    * Refactored the offset vector writer to allow a dummy offset vector writer as part of the projection mechanism.
    * Changed the column accessor code gen template to use constants instead of hard-coded numbers for field positions.
    * Additional documentation.
    * Restructured the column accessor base classes to better organize the functions in preparation for lists and unions (which are far more complex than maps and scalars.)
    * Pulled a couple of formerly-nested classes into top-level classes.
    * Reorganized the code that builds accessors; moved it into the accessor itself.
    * Added more complete system to allow writing of generic Java objects in tests.
    * Temporary patches to the row set loader classes to handle the above changes. (The patches will be replaced in the result set loader refactoring PR to come later.)
    
    The code already has extensive unit tests for this functionality. Those tests were rerun to demonstrate that the refactoring preserves existing functionality. A later PR will exercise the new structure in tests for unions, lists, etc.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6335

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1218
    
----
commit c59a3cbe0aa910c8d31589fe1619846e6d417915
Author: Paul Rogers <pr...@...>
Date:   2018-04-17T04:44:10Z

    DRILL-6335: Column accessor refactoring

commit 0716a3894f0e7e48747b04f0f934cf94a04ebd3f
Author: Paul Rogers <pr...@...>
Date:   2018-04-17T17:41:07Z

    Merge fixes

----


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1218#discussion_r182921553
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/RowSetLoaderImpl.java ---
    @@ -95,4 +96,10 @@ public void endBatch() {
     
       @Override
       public int rowCount() { return rsLoader.rowCount(); }
    +
    +  @Override
    +  public ColumnMetadata schema() {
    +    // No column for the row tuple
    +    return null;
    --- End diff --
    
    this comment is not clear to me. is it ok to return null here ? 


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1218#discussion_r183262311
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/RowSetLoaderImpl.java ---
    @@ -95,4 +96,10 @@ public void endBatch() {
     
       @Override
       public int rowCount() { return rsLoader.rowCount(); }
    +
    +  @Override
    +  public ColumnMetadata schema() {
    +    // No column for the row tuple
    +    return null;
    --- End diff --
    
    Expanded comment. Let me know if it is still not clear.


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1218


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1218#discussion_r182729279
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestHyperVectorReaders.java ---
    @@ -362,4 +363,136 @@ public void testRepeatedMap() {
     
         RowSetUtilities.verify(expected, hyperSet);
       }
    +
    +  @Test
    +  public void testUnion() {
    +    TupleMetadata schema = new SchemaBuilder()
    +        .add("a", MinorType.INT)
    +        .addUnion("u")
    +          .addType(MinorType.INT)
    +          .addType(MinorType.VARCHAR)
    +          .resumeSchema()
    +        .buildSchema();
    +
    +    SingleRowSet rowSet1 = fixture.rowSetBuilder(schema)
    +        .addRow(2, 20)
    +        .addRow(4, "fourth")
    +        .build();
    +
    +    SingleRowSet rowSet2 = fixture.rowSetBuilder(schema)
    +        .addRow(1, "first")
    +        .addRow(3, 30)
    +        .build();
    +
    +    // Build the hyper batch
    +
    +    HyperRowSet hyperSet = HyperRowSetImpl.fromRowSets(fixture.allocator(), rowSet1, rowSet2);
    +    assertEquals(4, hyperSet.rowCount());
    +    SelectionVector4 sv4 = hyperSet.getSv4();
    +    sv4.set(0, 1, 0);
    +    sv4.set(1, 0, 0);
    --- End diff --
    
    do we have to manually map each index like this ? may be we can do this in the fromRowSets method itself ?


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1218#discussion_r182922226
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetWriterImpl.java ---
    @@ -158,4 +159,10 @@ public SingleRowSet done() {
       public int lastWriteIndex() {
         return writerIndex.vectorIndex();
       }
    +
    +  @Override
    +  public ColumnMetadata schema() {
    +    // No column schema for the row as a whole.
    --- End diff --
    
    same comment as above


---

[GitHub] drill issue #1218: DRILL-6335: Refactor row set abstractions to prepare for ...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1218
  
    LGTM. +1


---

[GitHub] drill pull request #1218: DRILL-6335: Refactor row set abstractions to prepa...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1218#discussion_r182923416
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestIndirectReaders.java ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet.test;
    +
    +import static org.junit.Assert.*;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.metadata.TupleMetadata;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.accessor.ArrayReader;
    +import org.apache.drill.exec.vector.accessor.ArrayWriter;
    +import org.apache.drill.exec.vector.accessor.ScalarReader;
    +import org.apache.drill.exec.vector.accessor.ScalarWriter;
    +import org.apache.drill.test.SubOperatorTest;
    +import org.apache.drill.test.rowSet.RowSetWriter;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
    +import org.apache.drill.test.rowSet.schema.SchemaBuilder;
    +import org.apache.drill.test.rowSet.RowSetComparison;
    +import org.apache.drill.test.rowSet.RowSetReader;
    +import org.junit.Test;
    +
    +/**
    + * Test reading with an indirection vector (sv2.) This form of
    + * indirection vector reorders values within a single batch.
    + * Since the indirection occurs only in the reader index, only
    + * light testing is done; all readers go through the same index class,
    + * so if the index works for one reader, it will for for all.
    --- End diff --
    
    typo: it will work for all (instead of for for all)


---

[GitHub] drill issue #1218: DRILL-6335: Refactor row set abstractions to prepare for ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1218
  
    @ppadma , this PR contains a subset of the changes from the previous big PR. Mostly refactoring. Can you take a look? Thanks. 


---