You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2017/03/14 23:18:44 UTC

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/785

    DRILL-5323: Test tools for row sets

    Provide test tools to create, populate and compare row sets
    
    To simplify tests, we need a TestRowSet concept that wraps a
    VectorContainer and provides easy ways to:
    
    - Define a schema for the row set.
    - Create a set of vectors that implement the schema.
    - Populate the row set with test data via code.
    - Add an SV2 to the row set.
    - Pass the row set to operator components (such as generated code
    blocks.)
    - Examine the contents of a row set
    - Compare the results of the operation with an expected result set.
    - Dispose of the underling direct memory when work is done.
    
    This code builds on that in DRILL-5324 to provide a complete row set
    API. See DRILL-5318 for the spec.
    
    Note: this code can be reviewed as-is, but cannot be committed until
    after DRILL-5324 is committed: this code has compile-time dependencies
    on that code. This PR will be rebased once DRILL-5324 is pulled into
    master.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5323

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #785
    
----
commit 99b3ed45eed41907a143ece7082b8af926bd920b
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-14T23:18:24Z

    DRILL-5323: Test tools for row sets
    
    Provide test tools to create, populate and compare row sets
    
    To simplify tests, we need a TestRowSet concept that wraps a
    VectorContainer and provides easy ways to:
    
    - Define a schema for the row set.
    - Create a set of vectors that implement the schema.
    - Populate the row set with test data via code.
    - Add an SV2 to the row set.
    - Pass the row set to operator components (such as generated code
    blocks.)
    - Examine the contents of a row set
    - Compare the results of the operation with an expected result set.
    - Dispose of the underling direct memory when work is done.
    
    This code builds on that in DRILL-5324 to provide a complete row set
    API. See DRILL-5318 for the spec.
    
    Note: this code can be reviewed as-is, but cannot be committed until
    after DRILL-5324 is committed: this code has compile-time dependencies
    on that code. This PR will be rebased once DRILL-5324 is pulled into
    master.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110055747
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetPrinter.java ---
    @@ -0,0 +1,97 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.io.PrintStream;
    +
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.test.rowSet.RowSet.RowSetReader;
    +import org.apache.drill.test.rowSet.RowSetSchema.AccessSchema;
    +
    +public class RowSetPrinter {
    +  private RowSet rowSet;
    +
    +  public RowSetPrinter(RowSet rowSet) {
    +    this.rowSet = rowSet;
    +  }
    +
    +  public void print() {
    +    print(System.out);
    +  }
    +
    +  public void print(PrintStream out) {
    +    SelectionVectorMode selectionMode = rowSet.getIndirectionType();
    +    RowSetReader reader = rowSet.reader();
    +    int colCount = reader.width();
    +    printSchema(out, selectionMode);
    +    while (reader.next()) {
    +      printHeader(out, reader, selectionMode);
    +      for (int i = 0; i < colCount; i++) {
    +        if (i > 0) {
    +          out.print(", ");
    +        }
    +        out.print(reader.getAsString(i));
    +      }
    +      out.println();
    +    }
    +  }
    +
    +  private void printSchema(PrintStream out, SelectionVectorMode selectionMode) {
    +    out.print("#");
    +    switch (selectionMode) {
    +    case FOUR_BYTE:
    +      out.print(" (batch #, row #)");
    +      break;
    +    case TWO_BYTE:
    +      out.print(" (row #)");
    +      break;
    +    default:
    +      break;
    +    }
    +    out.print(": ");
    +    AccessSchema schema = rowSet.schema().access();
    +    for (int i = 0; i < schema.count(); i++) {
    +      if (i > 0) {
    --- End diff --
    
    Access schema is a flattened view of the row set: the set of columns whose values can be set. If we have a schema like (a, b(c, d)) (where b is a map), the access schema is (a, b.c, b.d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111867385
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110788792
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Builder of a row set schema expressed as a list of materialized
    + * fields. Optimized for use when creating schemas by hand in tests.
    + * <p>
    + * Example usage to create the following schema: <br>
    + * <tt>(c: INT, a: MAP(c: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)</tt>
    --- End diff --
    
    Still same as previous version.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/785


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111867499
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    +      if (index == -1) {
    +        return null;
    +      }
    +      return get(index);
    +    }
    +
    +    public int getIndex(String key) {
    +      Integer index = nameSpace.get(key);
    +      if (index == null) {
    +        return -1;
    +      }
    +      return index;
    +    }
    +
    +    public int count() { return columns.size(); }
    +  }
    +
    +  private static class TupleSchemaImpl implements TupleSchema {
    +
    +    private NameSpace<LogicalColumn> columns;
    +
    +    public TupleSchemaImpl(NameSpace<LogicalColumn> ns) {
    +      this.columns = ns;
    --- End diff --
    
    Can't be. This is a private class, called only by this class when translating a batch schema.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110055928
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetUtilities.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.accessor.AccessorUtilities;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessor.ValueType;
    +import org.apache.drill.exec.vector.accessor.ColumnWriter;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.joda.time.Duration;
    +import org.joda.time.Period;
    +
    +public class RowSetUtilities {
    +
    +  private RowSetUtilities() { }
    +
    +  public static void reverse(SelectionVector2 sv2) {
    +    int count = sv2.getCount();
    +    for (int i = 0; i < count / 2; i++) {
    +      char temp = sv2.getIndex(i);
    +      int dest = count - 1 - i;
    +      sv2.setIndex(i, sv2.getIndex(dest));
    +      sv2.setIndex(dest, temp);
    +    }
    +  }
    +
    +  /**
    +   * Set a test data value from an int. Uses the type information of the
    +   * column to handle interval types. Else, uses the value type of the
    +   * accessor. The value set here is purely for testing; the mapping
    +   * from ints to intervals has no real meaning.
    +   *
    +   * @param rowWriter
    +   * @param index
    +   * @param value
    +   */
    +
    +  public static void setFromInt(RowSetWriter rowWriter, int index, int value) {
    +    ColumnWriter writer = rowWriter.column(index);
    +    if (writer.valueType() == ValueType.PERIOD) {
    +      setPeriodFromInt(writer, rowWriter.schema().column(index).getType().getMinorType(), value);
    +    } else {
    +      AccessorUtilities.setFromInt(writer, value);
    +    }
    +  }
    +
    +  public static void setPeriodFromInt(ColumnWriter writer, MinorType minorType,
    +      int value) {
    +    switch (minorType) {
    +    case INTERVAL:
    +      writer.setPeriod(Duration.millis(value).toPeriod());
    +      break;
    +    case INTERVALYEAR:
    +      writer.setPeriod(Period.years(value / 12).withMonths(value % 12));
    +      break;
    +    case INTERVALDAY:
    +      int sec = value % 60;
    +      value = value / 60;
    +      int min = value % 60;
    +      value = value / 60;
    +      writer.setPeriod(Period.days(value).withMinutes(min).withSeconds(sec));
    --- End diff --
    
    This is a data generator. The int has no real meaning, it is just a convenient way to populate a field. So, here we just slice off some values to put into each field. Not pretty, but convenient for testing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110801295
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    +      w.getValueVector().getMutator().setValueCount(rowCount);
    +    }
    +  }
    +
    +  @Override
    +  public RowSetWriter writer() {
    +    return writer(10);
    +  }
    +
    +  @Override
    +  public RowSetWriter writer(int initialRowCount) {
    +    if (container.hasRecordCount()) {
    +      throw new IllegalStateException("Row set already contains data");
    +    }
    +    allocate(initialRowCount);
    +    return buildWriter(new ExtendableRowIndex(Character.MAX_VALUE));
    +  }
    +
    +  /**
    +   * Build writer objects for each column based on the column type.
    +   *
    +   * @param rowIndex the index which points to each row
    +   * @return an array of writers
    +   */
    +
    +  protected RowSetWriter buildWriter(ExtendableRowIndex rowIndex) {
    +    ValueVector[] valueVectors = vectors();
    +    AbstractColumnWriter[] writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    +      writers[posn].bind(rowIndex, valueVectors[i]);
    --- End diff --
    
    Not sure if the during merge it ended up to older change. But still I see `posn` instead of `i` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108808598
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/AbstractSingleRowSet.java ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.expr.TypeHelper;
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.complex.MapVector;
    +import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
    +import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
    +import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
    +
    +public abstract class AbstractSingleRowSet extends AbstractRowSet implements SingleRowSet {
    +
    +  public abstract static class StructureBuilder {
    +    protected final PhysicalSchema schema;
    +    protected final BufferAllocator allocator;
    +    protected final ValueVector[] valueVectors;
    +    protected final MapVector[] mapVectors;
    +    protected int vectorIndex;
    +    protected int mapIndex;
    +
    +    public StructureBuilder(BufferAllocator allocator, RowSetSchema schema) {
    +      this.allocator = allocator;
    +      this.schema = schema.physical();
    +      valueVectors = new ValueVector[schema.access().count()];
    +      if (schema.access().mapCount() == 0) {
    +        mapVectors = null;
    +      } else {
    +        mapVectors = new MapVector[schema.access().mapCount()];
    +      }
    +    }
    +  }
    +
    +  public static class VectorBuilder extends StructureBuilder {
    +
    +    public VectorBuilder(BufferAllocator allocator, RowSetSchema schema) {
    +      super(allocator, schema);
    +    }
    +
    +    public ValueVector[] buildContainer(VectorContainer container) {
    +      for (int i = 0; i < schema.count(); i++) {
    +        LogicalColumn colSchema = schema.column(i);
    +        @SuppressWarnings("resource")
    +        ValueVector v = TypeHelper.getNewVector(colSchema.field, allocator, null);
    +        container.add(v);
    +        if (colSchema.field.getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv, colSchema.mapSchema);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +      container.buildSchema(SelectionVectorMode.NONE);
    +      return valueVectors;
    +    }
    +
    +    private void buildMap(MapVector mapVector, PhysicalSchema mapSchema) {
    +      for (int i = 0; i < mapSchema.count(); i++) {
    +        LogicalColumn colSchema = mapSchema.column(i);
    +        MajorType type = colSchema.field.getType();
    +        Class<? extends ValueVector> vectorClass = TypeHelper.getValueVectorClass(type.getMinorType(), type.getMode());
    +        @SuppressWarnings("resource")
    +        ValueVector v = mapVector.addOrGet(colSchema.field.getName(), type, vectorClass);
    +        if (type.getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv, colSchema.mapSchema);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +    }
    +  }
    +
    +  public static class VectorMapper extends StructureBuilder {
    +
    +    public VectorMapper(BufferAllocator allocator, RowSetSchema schema) {
    +      super(allocator, schema);
    +    }
    +
    +    public ValueVector[] mapContainer(VectorContainer container) {
    +      for (VectorWrapper<?> w : container) {
    +        @SuppressWarnings("resource")
    +        ValueVector v = w.getValueVector();
    +        if (v.getField().getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +      return valueVectors;
    +    }
    +
    +    private void buildMap(MapVector mapVector) {
    +      for (ValueVector v : mapVector) {
    +        if (v.getField().getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +    }
    +  }
    +
    +  protected final ValueVector[] valueVectors;
    +
    +  public AbstractSingleRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema, new VectorContainer());
    +    valueVectors = new VectorBuilder(allocator, super.schema).buildContainer(container);
    +  }
    +
    +  public AbstractSingleRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container.getSchema(), container);
    +    valueVectors = new VectorMapper(allocator, super.schema).mapContainer(container);
    +  }
    +
    +  public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
    +    super(rowSet.allocator, rowSet.schema.batch(), rowSet.container);
    +    valueVectors = rowSet.valueVectors;
    +  }
    +
    +  @Override
    +  public ValueVector[] vectors() { return valueVectors; }
    +
    +  @Override
    +  public int getSize() {
    +    RecordBatchSizer sizer = new RecordBatchSizer(container);
    +    return sizer.actualSize();
    --- End diff --
    
    This method internally access the container's recordCount but while building the container I don't see we are setting the RecordCount explicitly by calling setRecordCount on it. Neither it looks like while adding valueVector inside the container, that count is maintained. Shouldn't we set the recordCount ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108758552
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/HyperRowSetImpl.java ---
    @@ -0,0 +1,221 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.HyperVectorWrapper;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.AccessorUtilities;
    +import org.apache.drill.exec.vector.complex.AbstractMapVector;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
    +import org.apache.drill.test.rowSet.RowSet.HyperRowSet;
    +import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
    +import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
    +
    +public class HyperRowSetImpl extends AbstractRowSet implements HyperRowSet {
    +
    +  public static class HyperRowIndex extends BoundedRowIndex {
    +
    +    private final SelectionVector4 sv4;
    +
    +    public HyperRowIndex(SelectionVector4 sv4) {
    +      super(sv4.getCount());
    +      this.sv4 = sv4;
    +    }
    +
    +    @Override
    +    public int index() {
    +      return AccessorUtilities.sv4Index(sv4.get(rowIndex));
    +    }
    +
    +    @Override
    +    public int batch( ) {
    +      return AccessorUtilities.sv4Batch(sv4.get(rowIndex));
    +    }
    +  }
    +
    +  /**
    +   * Build a hyper row set by restructuring a hyper vector bundle into a uniform
    +   * shape. Consider this schema: <pre><code>
    +   * { a: 10, b: { c: 20, d: { e: 30 } } }</code></pre>
    +   * <p>
    +   * The hyper container, with two batches, has this structure:
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>b</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * </table>
    +   * <p>
    +   * The above table shows that top-level scalar vectors (such as the Int Vector for column
    +   * a) appear "end-to-end" as a hyper-vector. Maps also appear end-to-end. But, the
    +   * contents of the map (column c) do not appear end-to-end. Instead, they appear as
    +   * contents in the map vector. To get to c, one indexes into the map vector, steps inside
    +   * the map to find c and indexes to the right row.
    +   * <p>
    +   * Similarly, the maps for d do not appear end-to-end, one must step to the right batch
    +   * in b, then step to d.
    +   * <p>
    +   * Finally, to get to e, one must step
    +   * into the hyper vector for b, then steps to the proper batch, steps to d, step to e
    +   * and finally step to the row within e. This is a very complex, costly indexing scheme
    +   * that differs depending on map nesting depth.
    +   * <p>
    +   * To simplify access, this class restructures the maps to flatten the scalar vectors
    +   * into end-to-end hyper vectors. For example, for the above:
    +   * <p>
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>c</th><th>d</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * </table>
    +   *
    +   * The maps are still available as hyper vectors, but separated into map fields.
    +   * (Scalar access no longer needs to access the maps.) The result is a uniform
    +   * addressing scheme for both top-level and nested vectors.
    +   */
    +
    +  public static class HyperVectorBuilder {
    +
    +    protected final HyperVectorWrapper<?> valueVectors[];
    +    protected final HyperVectorWrapper<AbstractMapVector> mapVectors[];
    +    private final List<ValueVector> nestedScalars[];
    +    private int vectorIndex;
    +    private int mapIndex;
    +    private final PhysicalSchema physicalSchema;
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorBuilder(RowSetSchema schema) {
    +      physicalSchema = schema.physical();
    +      valueVectors = new HyperVectorWrapper<?>[schema.access().count()];
    +      if (schema.access().mapCount() == 0) {
    +        mapVectors = null;
    +        nestedScalars = null;
    +      } else {
    +        mapVectors = (HyperVectorWrapper<AbstractMapVector>[])
    +            new HyperVectorWrapper<?>[schema.access().mapCount()];
    +        nestedScalars = new ArrayList[schema.access().count()];
    +      }
    +    }
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorWrapper<ValueVector>[] mapContainer(VectorContainer container) {
    +      int i = 0;
    +      for (VectorWrapper<?> w : container) {
    +        HyperVectorWrapper<?> hvw = (HyperVectorWrapper<?>) w;
    +        if (w.getField().getType().getMinorType() == MinorType.MAP) {
    +          HyperVectorWrapper<AbstractMapVector> mw = (HyperVectorWrapper<AbstractMapVector>) hvw;
    +          mapVectors[mapIndex++] = mw;
    +          buildHyperMap(physicalSchema.column(i).mapSchema(), mw);
    --- End diff --
    
    we are not checking if physicalSchema has any column or not before accessing. While constructing RowSchema using batchSchema if there are no fields in batchSchema, then physicalSchema still has reference to valid object but it's count will be zero.
    
    Hence the access here _physicalSchema.column(i)_ can throw IndexOutOfBound exception ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111865581
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    +      w.getValueVector().getMutator().setValueCount(rowCount);
    +    }
    +  }
    +
    +  @Override
    +  public RowSetWriter writer() {
    +    return writer(10);
    +  }
    +
    +  @Override
    +  public RowSetWriter writer(int initialRowCount) {
    +    if (container.hasRecordCount()) {
    +      throw new IllegalStateException("Row set already contains data");
    +    }
    +    allocate(initialRowCount);
    +    return buildWriter(new ExtendableRowIndex(Character.MAX_VALUE));
    +  }
    +
    +  /**
    +   * Build writer objects for each column based on the column type.
    +   *
    +   * @param rowIndex the index which points to each row
    +   * @return an array of writers
    +   */
    +
    +  protected RowSetWriter buildWriter(ExtendableRowIndex rowIndex) {
    +    ValueVector[] valueVectors = vectors();
    +    AbstractColumnWriter[] writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    +      writers[posn].bind(rowIndex, valueVectors[i]);
    --- End diff --
    
    Left over from an earlier version. Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111005446
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    +      w.getValueVector().getMutator().setValueCount(rowCount);
    +    }
    +  }
    +
    +  @Override
    +  public RowSetWriter writer() {
    +    return writer(10);
    +  }
    +
    +  @Override
    +  public RowSetWriter writer(int initialRowCount) {
    +    if (container.hasRecordCount()) {
    +      throw new IllegalStateException("Row set already contains data");
    +    }
    +    allocate(initialRowCount);
    +    return buildWriter(new ExtendableRowIndex(Character.MAX_VALUE));
    +  }
    +
    +  /**
    +   * Build writer objects for each column based on the column type.
    +   *
    +   * @param rowIndex the index which points to each row
    +   * @return an array of writers
    +   */
    +
    +  protected RowSetWriter buildWriter(ExtendableRowIndex rowIndex) {
    +    ValueVector[] valueVectors = vectors();
    +    AbstractColumnWriter[] writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    +      writers[posn].bind(rowIndex, valueVectors[i]);
    +      posn++;
    +    }
    +    TupleSchema accessSchema = schema().hierarchicalAccess();
    +    return new RowSetWriterImpl(this, accessSchema, rowIndex, writers);
    +  }
    +
    +  @Override
    +  public RowSetReader reader() {
    +    return buildReader(new DirectRowIndex(rowCount()));
    +  }
    +
    +  @Override
    +  public boolean isExtendable() { return true; }
    +
    +  @Override
    +  public boolean isWritable() { return true; }
    +
    --- End diff --
    
    Shouldn't `isWritable` be based on if the `WriterIndex` is valid or not ? 
    
    Also am not sure what you mean by `isExtendable` here ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/785
  
    Rebased onto latest master and squashed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111003542
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    --- End diff --
    
    Why not use VectorAccessibleUtilities::setValueCount ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111867555
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    +      if (index == -1) {
    +        return null;
    +      }
    +      return get(index);
    +    }
    +
    +    public int getIndex(String key) {
    +      Integer index = nameSpace.get(key);
    +      if (index == null) {
    +        return -1;
    +      }
    +      return index;
    +    }
    +
    +    public int count() { return columns.size(); }
    +  }
    +
    +  private static class TupleSchemaImpl implements TupleSchema {
    +
    +    private NameSpace<LogicalColumn> columns;
    +
    +    public TupleSchemaImpl(NameSpace<LogicalColumn> ns) {
    +      this.columns = ns;
    +    }
    +
    +    @Override
    +    public MaterializedField column(int index) {
    +      return logicalColumn(index).field();
    --- End diff --
    
    No need. If index is invalid, the logicalColumn() method will throw an exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110056242
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Builder of a row set schema expressed as a list of materialized
    + * fields. Optimized for use when creating schemas by hand in tests.
    + * <p>
    + * Example usage to create the following schema: <br>
    + * <tt>(c: INT, a: MAP(c: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)</tt>
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on the issue:

    https://github.com/apache/drill/pull/785
  
    Thanks for the changes, LGTM.
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111868661
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Builder of a row set schema expressed as a list of materialized
    + * fields. Optimized for use when creating schemas by hand in tests.
    + * <p>
    + * Example usage to create the following schema: <br>
    + * <tt>(c: INT, a: MAP(c: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)</tt>
    --- End diff --
    
    Fixed again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110055632
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/HyperRowSetImpl.java ---
    @@ -0,0 +1,221 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.HyperVectorWrapper;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.AccessorUtilities;
    +import org.apache.drill.exec.vector.complex.AbstractMapVector;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
    +import org.apache.drill.test.rowSet.RowSet.HyperRowSet;
    +import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
    +import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
    +
    +public class HyperRowSetImpl extends AbstractRowSet implements HyperRowSet {
    +
    +  public static class HyperRowIndex extends BoundedRowIndex {
    +
    +    private final SelectionVector4 sv4;
    +
    +    public HyperRowIndex(SelectionVector4 sv4) {
    +      super(sv4.getCount());
    +      this.sv4 = sv4;
    +    }
    +
    +    @Override
    +    public int index() {
    +      return AccessorUtilities.sv4Index(sv4.get(rowIndex));
    +    }
    +
    +    @Override
    +    public int batch( ) {
    +      return AccessorUtilities.sv4Batch(sv4.get(rowIndex));
    +    }
    +  }
    +
    +  /**
    +   * Build a hyper row set by restructuring a hyper vector bundle into a uniform
    +   * shape. Consider this schema: <pre><code>
    +   * { a: 10, b: { c: 20, d: { e: 30 } } }</code></pre>
    +   * <p>
    +   * The hyper container, with two batches, has this structure:
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>b</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * </table>
    +   * <p>
    +   * The above table shows that top-level scalar vectors (such as the Int Vector for column
    +   * a) appear "end-to-end" as a hyper-vector. Maps also appear end-to-end. But, the
    +   * contents of the map (column c) do not appear end-to-end. Instead, they appear as
    +   * contents in the map vector. To get to c, one indexes into the map vector, steps inside
    +   * the map to find c and indexes to the right row.
    +   * <p>
    +   * Similarly, the maps for d do not appear end-to-end, one must step to the right batch
    +   * in b, then step to d.
    +   * <p>
    +   * Finally, to get to e, one must step
    +   * into the hyper vector for b, then steps to the proper batch, steps to d, step to e
    +   * and finally step to the row within e. This is a very complex, costly indexing scheme
    +   * that differs depending on map nesting depth.
    +   * <p>
    +   * To simplify access, this class restructures the maps to flatten the scalar vectors
    +   * into end-to-end hyper vectors. For example, for the above:
    +   * <p>
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>c</th><th>d</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * </table>
    +   *
    +   * The maps are still available as hyper vectors, but separated into map fields.
    +   * (Scalar access no longer needs to access the maps.) The result is a uniform
    +   * addressing scheme for both top-level and nested vectors.
    +   */
    +
    +  public static class HyperVectorBuilder {
    +
    +    protected final HyperVectorWrapper<?> valueVectors[];
    +    protected final HyperVectorWrapper<AbstractMapVector> mapVectors[];
    +    private final List<ValueVector> nestedScalars[];
    +    private int vectorIndex;
    +    private int mapIndex;
    +    private final PhysicalSchema physicalSchema;
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorBuilder(RowSetSchema schema) {
    +      physicalSchema = schema.physical();
    +      valueVectors = new HyperVectorWrapper<?>[schema.access().count()];
    +      if (schema.access().mapCount() == 0) {
    +        mapVectors = null;
    +        nestedScalars = null;
    +      } else {
    +        mapVectors = (HyperVectorWrapper<AbstractMapVector>[])
    +            new HyperVectorWrapper<?>[schema.access().mapCount()];
    +        nestedScalars = new ArrayList[schema.access().count()];
    +      }
    +    }
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorWrapper<ValueVector>[] mapContainer(VectorContainer container) {
    +      int i = 0;
    +      for (VectorWrapper<?> w : container) {
    +        HyperVectorWrapper<?> hvw = (HyperVectorWrapper<?>) w;
    +        if (w.getField().getType().getMinorType() == MinorType.MAP) {
    +          HyperVectorWrapper<AbstractMapVector> mw = (HyperVectorWrapper<AbstractMapVector>) hvw;
    +          mapVectors[mapIndex++] = mw;
    +          buildHyperMap(physicalSchema.column(i).mapSchema(), mw);
    --- End diff --
    
    The physical columns have a 1:1 correspondence to vector wrappers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110783017
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    --- End diff --
    
    Can throw IndexOutOfBoundsException. Check for index with columns.size()


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108768096
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetUtilities.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.accessor.AccessorUtilities;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessor.ValueType;
    +import org.apache.drill.exec.vector.accessor.ColumnWriter;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.joda.time.Duration;
    +import org.joda.time.Period;
    +
    +public class RowSetUtilities {
    +
    +  private RowSetUtilities() { }
    +
    +  public static void reverse(SelectionVector2 sv2) {
    +    int count = sv2.getCount();
    +    for (int i = 0; i < count / 2; i++) {
    +      char temp = sv2.getIndex(i);
    +      int dest = count - 1 - i;
    +      sv2.setIndex(i, sv2.getIndex(dest));
    +      sv2.setIndex(dest, temp);
    +    }
    +  }
    +
    +  /**
    +   * Set a test data value from an int. Uses the type information of the
    +   * column to handle interval types. Else, uses the value type of the
    +   * accessor. The value set here is purely for testing; the mapping
    +   * from ints to intervals has no real meaning.
    +   *
    +   * @param rowWriter
    +   * @param index
    +   * @param value
    +   */
    +
    +  public static void setFromInt(RowSetWriter rowWriter, int index, int value) {
    +    ColumnWriter writer = rowWriter.column(index);
    +    if (writer.valueType() == ValueType.PERIOD) {
    +      setPeriodFromInt(writer, rowWriter.schema().column(index).getType().getMinorType(), value);
    +    } else {
    +      AccessorUtilities.setFromInt(writer, value);
    +    }
    +  }
    +
    +  public static void setPeriodFromInt(ColumnWriter writer, MinorType minorType,
    +      int value) {
    +    switch (minorType) {
    +    case INTERVAL:
    +      writer.setPeriod(Duration.millis(value).toPeriod());
    +      break;
    +    case INTERVALYEAR:
    +      writer.setPeriod(Period.years(value / 12).withMonths(value % 12));
    +      break;
    +    case INTERVALDAY:
    +      int sec = value % 60;
    +      value = value / 60;
    +      int min = value % 60;
    +      value = value / 60;
    +      writer.setPeriod(Period.days(value).withMinutes(min).withSeconds(sec));
    --- End diff --
    
    Looks like it's missing calculation to hours and then days


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111865207
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,153 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.AbstractRowIndex;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    --- End diff --
    
    Technically that is true. Kept here to make clear that this is a single-batch index. (There are only three subclasses, so keeping it here means we have two copies.) Moving it to the parent says that even a hyper-batch defaults to a single row set.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111865502
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    --- End diff --
    
    Good catch. Changed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #785: DRILL-5323: Test tools for row sets

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/785
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111865822
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,240 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.impl.TupleWriterImpl;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +/**
    + * Implementation of a single row set with no indirection (selection)
    + * vector.
    + */
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  /**
    +   * Reader index that points directly to each row in the row set.
    +   * This index starts with pointing to the -1st row, so that the
    +   * reader can require a <tt>next()</tt> for every row, including
    +   * the first. (This is the JDBC RecordSet convention.)
    +   */
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Writer index that points to each row in the row set. The index starts at
    +   * the 0th row and advances one row on each increment. This allows writers to
    +   * start positioned at the first row. Writes happen in the current row.
    +   * Calling <tt>next()</tt> advances to the next position, effectively saving
    +   * the current row. The most recent row can be abandoned easily simply by not
    +   * calling <tt>next()</tt>. This means that the number of completed rows is
    +   * the same as the row index.
    +   */
    +
    +  private static class ExtendableRowIndex extends RowSetIndex {
    +
    +    private final int maxSize;
    +
    +    public ExtendableRowIndex(int maxSize) {
    +      this.maxSize = maxSize;
    +      rowIndex = 0;
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public boolean next() {
    +      if (++rowIndex <= maxSize ) {
    +        return true;
    +      } else {
    +        rowIndex--;
    +        return false;
    +      }
    +    }
    +
    +    @Override
    +    public int size() { return rowIndex; }
    +
    +    @Override
    +    public boolean valid() { return rowIndex < maxSize; }
    +
    +    @Override
    +    public int batch() { return 0; }
    +  }
    +
    +  /**
    +   * Implementation of a row set writer. Only available for newly-created,
    +   * empty, direct, single row sets. Rewriting is not allowed, nor is writing
    +   * to a hyper row set.
    +   */
    +
    +  public class RowSetWriterImpl extends TupleWriterImpl implements RowSetWriter {
    +
    +    private final ExtendableRowIndex index;
    +    private final ExtendableRowSet rowSet;
    +
    +    protected RowSetWriterImpl(ExtendableRowSet rowSet, TupleSchema schema, ExtendableRowIndex index, AbstractColumnWriter[] writers) {
    +      super(schema, writers);
    +      this.rowSet = rowSet;
    +      this.index = index;
    +      start();
    +    }
    +
    +    @Override
    +    public void setRow(Object...values) {
    +      if (! index.valid()) {
    +        throw new IndexOutOfBoundsException("Write past end of row set");
    +      }
    +      for (int i = 0; i < values.length;  i++) {
    +        set(i, values[i]);
    +      }
    +      save();
    +    }
    +
    +    @Override
    +    public boolean valid() { return index.valid(); }
    +
    +    @Override
    +    public int index() { return index.position(); }
    +
    +    @Override
    +    public void save() {
    +      index.next();
    +      start();
    +    }
    +
    +    @Override
    +    public void done() {
    +      rowSet.setRowCount(index.size());
    +    }
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container);
    +  }
    +
    +  public DirectRowSet(BufferAllocator allocator, VectorAccessible va) {
    +    super(allocator, toContainer(va, allocator));
    +  }
    +
    +  private static VectorContainer toContainer(VectorAccessible va, BufferAllocator allocator) {
    +    VectorContainer container = VectorContainer.getTransferClone(va, allocator);
    +    container.buildSchema(SelectionVectorMode.NONE);
    +    container.setRecordCount(va.getRecordCount());
    +    return container;
    +  }
    +
    +  @Override
    +  public void allocate(int recordCount) {
    +    for (final ValueVector v : valueVectors) {
    +      AllocationHelper.allocate(v, recordCount, 50, 10);
    +    }
    +  }
    +
    +  @Override
    +  public void setRowCount(int rowCount) {
    +    container.setRecordCount(rowCount);
    +    for (VectorWrapper<?> w : container) {
    +      w.getValueVector().getMutator().setValueCount(rowCount);
    +    }
    +  }
    +
    +  @Override
    +  public RowSetWriter writer() {
    +    return writer(10);
    +  }
    +
    +  @Override
    +  public RowSetWriter writer(int initialRowCount) {
    +    if (container.hasRecordCount()) {
    +      throw new IllegalStateException("Row set already contains data");
    +    }
    +    allocate(initialRowCount);
    +    return buildWriter(new ExtendableRowIndex(Character.MAX_VALUE));
    +  }
    +
    +  /**
    +   * Build writer objects for each column based on the column type.
    +   *
    +   * @param rowIndex the index which points to each row
    +   * @return an array of writers
    +   */
    +
    +  protected RowSetWriter buildWriter(ExtendableRowIndex rowIndex) {
    +    ValueVector[] valueVectors = vectors();
    +    AbstractColumnWriter[] writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    +      writers[posn].bind(rowIndex, valueVectors[i]);
    +      posn++;
    +    }
    +    TupleSchema accessSchema = schema().hierarchicalAccess();
    +    return new RowSetWriterImpl(this, accessSchema, rowIndex, writers);
    +  }
    +
    +  @Override
    +  public RowSetReader reader() {
    +    return buildReader(new DirectRowIndex(rowCount()));
    +  }
    +
    +  @Override
    +  public boolean isExtendable() { return true; }
    +
    +  @Override
    +  public boolean isWritable() { return true; }
    +
    --- End diff --
    
    Good point. I'm being a bit lazy. There really should be two classes: a DirectRowSet which is read-only, and an ExtendableRowSet which is writable (but not readable.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110056124
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetWriterImpl.java ---
    @@ -0,0 +1,110 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.math.BigDecimal;
    +
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.ColumnWriter;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.joda.time.Period;
    +
    +/**
    + * Implements a row set writer on top of a {@link RowSet}
    + * container.
    + */
    +
    +public class RowSetWriterImpl extends AbstractRowSetAccessor implements RowSetWriter {
    +
    +  private final AbstractColumnWriter writers[];
    +
    +  public RowSetWriterImpl(AbstractSingleRowSet recordSet, AbstractRowIndex rowIndex) {
    +    super(rowIndex, recordSet.schema().access());
    +    ValueVector[] valueVectors = recordSet.vectors();
    +    writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108823826
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetWriterImpl.java ---
    @@ -0,0 +1,110 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.math.BigDecimal;
    +
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessorFactory;
    +import org.apache.drill.exec.vector.accessor.ColumnWriter;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.joda.time.Period;
    +
    +/**
    + * Implements a row set writer on top of a {@link RowSet}
    + * container.
    + */
    +
    +public class RowSetWriterImpl extends AbstractRowSetAccessor implements RowSetWriter {
    +
    +  private final AbstractColumnWriter writers[];
    +
    +  public RowSetWriterImpl(AbstractSingleRowSet recordSet, AbstractRowIndex rowIndex) {
    +    super(rowIndex, recordSet.schema().access());
    +    ValueVector[] valueVectors = recordSet.vectors();
    +    writers = new AbstractColumnWriter[valueVectors.length];
    +    int posn = 0;
    +    for (int i = 0; i < writers.length; i++) {
    +      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
    --- End diff --
    
    we can use _"i"_ instead of _"posn"_


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r109563707
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/AbstractSingleRowSet.java ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.expr.TypeHelper;
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.complex.MapVector;
    +import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
    +import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
    +import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
    +
    +public abstract class AbstractSingleRowSet extends AbstractRowSet implements SingleRowSet {
    +
    +  public abstract static class StructureBuilder {
    +    protected final PhysicalSchema schema;
    +    protected final BufferAllocator allocator;
    +    protected final ValueVector[] valueVectors;
    +    protected final MapVector[] mapVectors;
    +    protected int vectorIndex;
    +    protected int mapIndex;
    +
    +    public StructureBuilder(BufferAllocator allocator, RowSetSchema schema) {
    +      this.allocator = allocator;
    +      this.schema = schema.physical();
    +      valueVectors = new ValueVector[schema.access().count()];
    +      if (schema.access().mapCount() == 0) {
    +        mapVectors = null;
    +      } else {
    +        mapVectors = new MapVector[schema.access().mapCount()];
    +      }
    +    }
    +  }
    +
    +  public static class VectorBuilder extends StructureBuilder {
    +
    +    public VectorBuilder(BufferAllocator allocator, RowSetSchema schema) {
    +      super(allocator, schema);
    +    }
    +
    +    public ValueVector[] buildContainer(VectorContainer container) {
    +      for (int i = 0; i < schema.count(); i++) {
    +        LogicalColumn colSchema = schema.column(i);
    +        @SuppressWarnings("resource")
    +        ValueVector v = TypeHelper.getNewVector(colSchema.field, allocator, null);
    +        container.add(v);
    +        if (colSchema.field.getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv, colSchema.mapSchema);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +      container.buildSchema(SelectionVectorMode.NONE);
    +      return valueVectors;
    +    }
    +
    +    private void buildMap(MapVector mapVector, PhysicalSchema mapSchema) {
    +      for (int i = 0; i < mapSchema.count(); i++) {
    +        LogicalColumn colSchema = mapSchema.column(i);
    +        MajorType type = colSchema.field.getType();
    +        Class<? extends ValueVector> vectorClass = TypeHelper.getValueVectorClass(type.getMinorType(), type.getMode());
    +        @SuppressWarnings("resource")
    +        ValueVector v = mapVector.addOrGet(colSchema.field.getName(), type, vectorClass);
    +        if (type.getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv, colSchema.mapSchema);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +    }
    +  }
    +
    +  public static class VectorMapper extends StructureBuilder {
    +
    +    public VectorMapper(BufferAllocator allocator, RowSetSchema schema) {
    +      super(allocator, schema);
    +    }
    +
    +    public ValueVector[] mapContainer(VectorContainer container) {
    +      for (VectorWrapper<?> w : container) {
    +        @SuppressWarnings("resource")
    +        ValueVector v = w.getValueVector();
    +        if (v.getField().getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +      return valueVectors;
    +    }
    +
    +    private void buildMap(MapVector mapVector) {
    +      for (ValueVector v : mapVector) {
    +        if (v.getField().getType().getMinorType() == MinorType.MAP) {
    +          MapVector mv = (MapVector) v;
    +          mapVectors[mapIndex++] = mv;
    +          buildMap(mv);
    +        } else {
    +          valueVectors[vectorIndex++] = v;
    +        }
    +      }
    +    }
    +  }
    +
    +  protected final ValueVector[] valueVectors;
    +
    +  public AbstractSingleRowSet(BufferAllocator allocator, BatchSchema schema) {
    +    super(allocator, schema, new VectorContainer());
    +    valueVectors = new VectorBuilder(allocator, super.schema).buildContainer(container);
    +  }
    +
    +  public AbstractSingleRowSet(BufferAllocator allocator, VectorContainer container) {
    +    super(allocator, container.getSchema(), container);
    +    valueVectors = new VectorMapper(allocator, super.schema).mapContainer(container);
    +  }
    +
    +  public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
    +    super(rowSet.allocator, rowSet.schema.batch(), rowSet.container);
    +    valueVectors = rowSet.valueVectors;
    +  }
    +
    +  @Override
    +  public ValueVector[] vectors() { return valueVectors; }
    +
    +  @Override
    +  public int getSize() {
    +    RecordBatchSizer sizer = new RecordBatchSizer(container);
    +    return sizer.actualSize();
    --- End diff --
    
    Good eyes! This bit is just a bit tricky. We create the batch (container). Then, we need to use a `RowSetWriterImpl` to populate the batch. The `done()` method of the writer sets the row count.
    
    Or, if we create a row set from an existing container, the container will already carry a row count.
    
    Still, it is a good idea to add a unit test to verify this case, which I will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110783153
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    +      if (index == -1) {
    +        return null;
    +      }
    +      return get(index);
    +    }
    +
    +    public int getIndex(String key) {
    +      Integer index = nameSpace.get(key);
    +      if (index == null) {
    +        return -1;
    +      }
    +      return index;
    +    }
    +
    +    public int count() { return columns.size(); }
    +  }
    +
    +  private static class TupleSchemaImpl implements TupleSchema {
    +
    +    private NameSpace<LogicalColumn> columns;
    +
    +    public TupleSchemaImpl(NameSpace<LogicalColumn> ns) {
    +      this.columns = ns;
    +    }
    +
    +    @Override
    +    public MaterializedField column(int index) {
    +      return logicalColumn(index).field();
    --- End diff --
    
    check for `logicalColumn` returning `null`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111867303
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    --- End diff --
    
    True. In all methods that take a column index, it is assumed that the caller knows the column count either because the caller created the schema, or asked for the column count. Since the list get method does bounds checking, saw no reason to repeat the checking in this method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108777101
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetPrinter.java ---
    @@ -0,0 +1,97 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.io.PrintStream;
    +
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.test.rowSet.RowSet.RowSetReader;
    +import org.apache.drill.test.rowSet.RowSetSchema.AccessSchema;
    +
    +public class RowSetPrinter {
    +  private RowSet rowSet;
    +
    +  public RowSetPrinter(RowSet rowSet) {
    +    this.rowSet = rowSet;
    +  }
    +
    +  public void print() {
    +    print(System.out);
    +  }
    +
    +  public void print(PrintStream out) {
    +    SelectionVectorMode selectionMode = rowSet.getIndirectionType();
    +    RowSetReader reader = rowSet.reader();
    +    int colCount = reader.width();
    +    printSchema(out, selectionMode);
    +    while (reader.next()) {
    +      printHeader(out, reader, selectionMode);
    +      for (int i = 0; i < colCount; i++) {
    +        if (i > 0) {
    +          out.print(", ");
    +        }
    +        out.print(reader.getAsString(i));
    +      }
    +      out.println();
    +    }
    +  }
    +
    +  private void printSchema(PrintStream out, SelectionVectorMode selectionMode) {
    +    out.print("#");
    +    switch (selectionMode) {
    +    case FOUR_BYTE:
    +      out.print(" (batch #, row #)");
    +      break;
    +    case TWO_BYTE:
    +      out.print(" (row #)");
    +      break;
    +    default:
    +      break;
    +    }
    +    out.print(": ");
    +    AccessSchema schema = rowSet.schema().access();
    +    for (int i = 0; i < schema.count(); i++) {
    +      if (i > 0) {
    --- End diff --
    
    How about _maps_ inside AccessSchema ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111867056
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetBuilder.java ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
    +
    +/**
    + * Fluent builder to quickly build up an row set (record batch)
    + * programmatically. Starting with an {@link OperatorFixture}:
    + * <pre></code>
    + * OperatorFixture fixture = ...
    + * RowSet rowSet = fixture.rowSetBuilder(batchSchema)
    + *   .addRow(10, "string", new int[] {10.3, 10.4})
    + *   ...
    + *   .build();</code></pre>
    + */
    +
    +public final class RowSetBuilder {
    +
    +  private DirectRowSet rowSet;
    +  private RowSetWriter writer;
    +  private boolean withSv2;
    +
    +  public RowSetBuilder(BufferAllocator allocator, BatchSchema schema) {
    +    this(allocator, schema, 10);
    +  }
    +
    +  public RowSetBuilder(BufferAllocator allocator, BatchSchema schema, int capacity) {
    +    rowSet = new DirectRowSet(allocator, schema);
    +    writer = rowSet.writer(capacity);
    +  }
    +
    +  public RowSetBuilder add(Object...values) {
    +    if (! writer.valid()) {
    +      throw new IllegalStateException( "Write past end of row set" );
    +    }
    +    for (int i = 0; i < values.length;  i++) {
    +      writer.set(i, values[i]);
    +    }
    +    writer.save();
    +    return this;
    --- End diff --
    
    Good catch. Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108814142
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
    @@ -0,0 +1,153 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.AbstractRowIndex;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
    +import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
    +
    +public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
    +
    +  private static class DirectRowIndex extends BoundedRowIndex {
    +
    +    public DirectRowIndex(int rowCount) {
    +      super(rowCount);
    +    }
    +
    +    @Override
    +    public int index() { return rowIndex; }
    +
    +    @Override
    +    public int batch() { return 0; }
    --- End diff --
    
    Can be moved to AbstractRowIndex class to return 0 by default. Override by derived class like HyperRowIndex to have different implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110794477
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetBuilder.java ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
    +import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
    +
    +/**
    + * Fluent builder to quickly build up an row set (record batch)
    + * programmatically. Starting with an {@link OperatorFixture}:
    + * <pre></code>
    + * OperatorFixture fixture = ...
    + * RowSet rowSet = fixture.rowSetBuilder(batchSchema)
    + *   .addRow(10, "string", new int[] {10.3, 10.4})
    + *   ...
    + *   .build();</code></pre>
    + */
    +
    +public final class RowSetBuilder {
    +
    +  private DirectRowSet rowSet;
    +  private RowSetWriter writer;
    +  private boolean withSv2;
    +
    +  public RowSetBuilder(BufferAllocator allocator, BatchSchema schema) {
    +    this(allocator, schema, 10);
    +  }
    +
    +  public RowSetBuilder(BufferAllocator allocator, BatchSchema schema, int capacity) {
    +    rowSet = new DirectRowSet(allocator, schema);
    +    writer = rowSet.writer(capacity);
    +  }
    +
    +  public RowSetBuilder add(Object...values) {
    +    if (! writer.valid()) {
    +      throw new IllegalStateException( "Write past end of row set" );
    +    }
    +    for (int i = 0; i < values.length;  i++) {
    +      writer.set(i, values[i]);
    +    }
    +    writer.save();
    +    return this;
    --- End diff --
    
    For line 52-58. You can just call `writer.setRow(values)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110781500
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    --- End diff --
    
    `int` rather than `Integer`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r110783456
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java ---
    @@ -0,0 +1,252 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.vector.accessor.TupleAccessor.TupleSchema;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Row set schema presented as a number of distinct "views" for various
    + * purposes:
    + * <ul>
    + * <li>Batch schema: the schema used by a VectorContainer.</li>
    + * <li>Physical schema: the schema expressed as a hierarchy of
    + * tuples with the top tuple representing the row, nested tuples
    + * representing maps.</li>
    + * <li>Access schema: a flattened schema with all scalar columns
    + * at the top level, and with map columns pulled out into a separate
    + * collection. The flattened-scalar view is the one used to write to,
    + * and read from, the row set.</li>
    + * </ul>
    + * Allows easy creation of multiple row sets from the same schema.
    + * Each schema is immutable, which is fine for tests in which we
    + * want known inputs and outputs.
    + */
    +
    +public class RowSetSchema {
    +
    +  /**
    +   * Logical description of a column. A logical column is a
    +   * materialized field. For maps, also includes a logical schema
    +   * of the map.
    +   */
    +
    +  public static class LogicalColumn {
    +    protected final String fullName;
    +    protected final int accessIndex;
    +    protected int flatIndex;
    +    protected final MaterializedField field;
    +
    +    /**
    +     * Schema of the map. Includes only those fields directly within
    +     * the map; does not include fields from nested tuples.
    +     */
    +
    +    protected PhysicalSchema mapSchema;
    +
    +    public LogicalColumn(String fullName, int accessIndex, MaterializedField field) {
    +      this.fullName = fullName;
    +      this.accessIndex = accessIndex;
    +      this.field = field;
    +    }
    +
    +    private void updateStructure(int index, PhysicalSchema children) {
    +      flatIndex = index;
    +      mapSchema = children;
    +    }
    +
    +    public int accessIndex() { return accessIndex; }
    +    public int flatIndex() { return flatIndex; }
    +    public boolean isMap() { return mapSchema != null; }
    +    public PhysicalSchema mapSchema() { return mapSchema; }
    +    public MaterializedField field() { return field; }
    +    public String fullName() { return fullName; }
    +  }
    +
    +  /**
    +   * Implementation of a tuple name space. Tuples allow both indexed and
    +   * named access to their members.
    +   *
    +   * @param <T> the type of object representing each column
    +   */
    +
    +  public static class NameSpace<T> {
    +    private final Map<String,Integer> nameSpace = new HashMap<>();
    +    private final List<T> columns = new ArrayList<>();
    +
    +    public int add(String key, T value) {
    +      int index = columns.size();
    +      nameSpace.put(key, index);
    +      columns.add(value);
    +      return index;
    +    }
    +
    +    public T get(int index) {
    +      return columns.get(index);
    +    }
    +
    +    public T get(String key) {
    +      Integer index = getIndex(key);
    +      if (index == -1) {
    +        return null;
    +      }
    +      return get(index);
    +    }
    +
    +    public int getIndex(String key) {
    +      Integer index = nameSpace.get(key);
    +      if (index == null) {
    +        return -1;
    +      }
    +      return index;
    +    }
    +
    +    public int count() { return columns.size(); }
    +  }
    +
    +  private static class TupleSchemaImpl implements TupleSchema {
    +
    +    private NameSpace<LogicalColumn> columns;
    +
    +    public TupleSchemaImpl(NameSpace<LogicalColumn> ns) {
    +      this.columns = ns;
    --- End diff --
    
    what if input `Namspace<LogicalColumn>` is `null` ? We are not checking anywhere in accessor functions below and can throw NPE.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r108761814
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.common.types.TypeProtos.MajorType;
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +
    +/**
    + * Builder of a row set schema expressed as a list of materialized
    + * fields. Optimized for use when creating schemas by hand in tests.
    + * <p>
    + * Example usage to create the following schema: <br>
    + * <tt>(c: INT, a: MAP(c: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)</tt>
    --- End diff --
    
    ... a: MAP(_b: VARCHAR_ .. instead of _c:VARCHAR_



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #785: DRILL-5323: Test tools for row sets

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/785#discussion_r111866420
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/HyperRowSetImpl.java ---
    @@ -0,0 +1,221 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.test.rowSet;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.MinorType;
    +import org.apache.drill.exec.memory.BufferAllocator;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.HyperVectorWrapper;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.accessor.AccessorUtilities;
    +import org.apache.drill.exec.vector.complex.AbstractMapVector;
    +import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
    +import org.apache.drill.test.rowSet.RowSet.HyperRowSet;
    +import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
    +import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
    +
    +public class HyperRowSetImpl extends AbstractRowSet implements HyperRowSet {
    +
    +  public static class HyperRowIndex extends BoundedRowIndex {
    +
    +    private final SelectionVector4 sv4;
    +
    +    public HyperRowIndex(SelectionVector4 sv4) {
    +      super(sv4.getCount());
    +      this.sv4 = sv4;
    +    }
    +
    +    @Override
    +    public int index() {
    +      return AccessorUtilities.sv4Index(sv4.get(rowIndex));
    +    }
    +
    +    @Override
    +    public int batch( ) {
    +      return AccessorUtilities.sv4Batch(sv4.get(rowIndex));
    +    }
    +  }
    +
    +  /**
    +   * Build a hyper row set by restructuring a hyper vector bundle into a uniform
    +   * shape. Consider this schema: <pre><code>
    +   * { a: 10, b: { c: 20, d: { e: 30 } } }</code></pre>
    +   * <p>
    +   * The hyper container, with two batches, has this structure:
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>b</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Map Vector(Int vector, Map Vector(Int vector))</td></th>
    +   * </table>
    +   * <p>
    +   * The above table shows that top-level scalar vectors (such as the Int Vector for column
    +   * a) appear "end-to-end" as a hyper-vector. Maps also appear end-to-end. But, the
    +   * contents of the map (column c) do not appear end-to-end. Instead, they appear as
    +   * contents in the map vector. To get to c, one indexes into the map vector, steps inside
    +   * the map to find c and indexes to the right row.
    +   * <p>
    +   * Similarly, the maps for d do not appear end-to-end, one must step to the right batch
    +   * in b, then step to d.
    +   * <p>
    +   * Finally, to get to e, one must step
    +   * into the hyper vector for b, then steps to the proper batch, steps to d, step to e
    +   * and finally step to the row within e. This is a very complex, costly indexing scheme
    +   * that differs depending on map nesting depth.
    +   * <p>
    +   * To simplify access, this class restructures the maps to flatten the scalar vectors
    +   * into end-to-end hyper vectors. For example, for the above:
    +   * <p>
    +   * <table border="1">
    +   * <tr><th>Batch</th><th>a</th><th>c</th><th>d</th></tr>
    +   * <tr><td>0</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * <tr><td>1</td><td>Int vector</td><td>Int vector</td><td>Int vector</td></th>
    +   * </table>
    +   *
    +   * The maps are still available as hyper vectors, but separated into map fields.
    +   * (Scalar access no longer needs to access the maps.) The result is a uniform
    +   * addressing scheme for both top-level and nested vectors.
    +   */
    +
    +  public static class HyperVectorBuilder {
    +
    +    protected final HyperVectorWrapper<?> valueVectors[];
    +    protected final HyperVectorWrapper<AbstractMapVector> mapVectors[];
    +    private final List<ValueVector> nestedScalars[];
    +    private int vectorIndex;
    +    private int mapIndex;
    +    private final PhysicalSchema physicalSchema;
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorBuilder(RowSetSchema schema) {
    +      physicalSchema = schema.physical();
    +      valueVectors = new HyperVectorWrapper<?>[schema.access().count()];
    +      if (schema.access().mapCount() == 0) {
    +        mapVectors = null;
    +        nestedScalars = null;
    +      } else {
    +        mapVectors = (HyperVectorWrapper<AbstractMapVector>[])
    +            new HyperVectorWrapper<?>[schema.access().mapCount()];
    +        nestedScalars = new ArrayList[schema.access().count()];
    +      }
    +    }
    +
    +    @SuppressWarnings("unchecked")
    +    public HyperVectorWrapper<ValueVector>[] mapContainer(VectorContainer container) {
    +      int i = 0;
    +      for (VectorWrapper<?> w : container) {
    +        HyperVectorWrapper<?> hvw = (HyperVectorWrapper<?>) w;
    +        if (w.getField().getType().getMinorType() == MinorType.MAP) {
    +          HyperVectorWrapper<AbstractMapVector> mw = (HyperVectorWrapper<AbstractMapVector>) hvw;
    +          mapVectors[mapIndex++] = mw;
    +          buildHyperMap(physicalSchema.column(i).mapSchema(), mw);
    --- End diff --
    
    Also, we assume that the caller either knows the number of columns (because the caller created the schema), or the caller checked the column count. Accessing a column out of range will throw an exception somewhere; there did not seem to be a burning need to add an extra check on top of those provided "naturally."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---