You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by ilooner <gi...@git.apache.org> on 2017/10/11 04:27:52 UTC

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

GitHub user ilooner opened a pull request:

    https://github.com/apache/drill/pull/984

    DRILL-5783 Made a unit test for generated Priority Queue. DRILL-5841 Fix tmp folder errors.

    ## DRILL-5783
    
    - A unit test is created for the priority queue in the TopN operator
    - The code generation classes passed around a completely unused function registry reference in some places so I removed it.
    - The priority queue had unused parameters for some of its methods so I removed them.
    - I added an add method to value vectors so that value vectors can easily be generated in unit tests.
    
    ## DRILL-5841
    
    - There were many many ways in which temporary folders were created in unit tests. I have unified the way these folders are created with the DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests have been updated to use these. The test watchers create temp directories in ./target/<fully qualified test class name>/<method name>. So all the files generated and used in the context of a test can easily be found in the same consistent location.
    - This change should fix the sporadic hashagg test failures, as well as failures caused by stray files in /tmp
    
    Misc
    
    - Added a TableBuilder so we don't have to becom human json parsers to generate test data.
    - A Pstore test could fail sporadically
    - General code cleanup. 
    - There are many places where String.format is used unnecessarily. The test builder methods already use String.format for you when you pass them args. I cleaned some of these up.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilooner/drill DRILL-5783

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #984
    
----
commit c9c554340a4a2ca8bab257a478d8201cd5f4c92a
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-18T23:21:44Z

     - Added property for codegen dump.
     - Removed unnecessary FunctionRegistry argument to the code generator.

commit 68f77a2e9a904bcd5a808fefd824f2f108134aec
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-18T23:31:38Z

     - Removed unused imports

commit eff908b4b18fe5ccb67480f6e0c36a61e27d12e2
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-18T23:34:33Z

     - Removed unnecessary modifiers from the priority queue interface.

commit 319fae58faaf78f807d8d6f9c0928f801ec7e352
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-19T00:28:58Z

     - Made creating a priority queue more modular by refactoring the createNewPriorityQueue method

commit e82e316a628e3c2f3a1f9ef7005299065eb4379d
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-19T21:08:19Z

     - Have some codegen dump test code done

commit e2057d6c9abc301889ab46a12848cee7506f45d5
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-21T22:07:52Z

     - Made fix

commit 6289cff376f37c8d4e2984b309d7f6eb13703f2f
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-28T23:24:38Z

     - Removed unused imports
     - Removed commented out code
     - Formatting fixes

commit cb80f885c86a00cda82e114b3aec212890ce6d90
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-29T23:03:49Z

     - Added the option to the DirTestWatcher to not delete the tmp dir at the end of the test
     - Added a TableFileBuilder to be used in testing
     - Changed the External sort test to use the TableFileBuilder

commit bb58b21ad16d6adbeb11c66d5a38d11c20a0b05d
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-29T23:06:15Z

     - Delete the directory at the beginning of a test in DirTestWatcher

commit f0d89abf2dc3422081bcd49183b8bf0ae3742996
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-09-30T01:51:13Z

     - Used existing test framework for the TopNBatchTest

commit 7f8d978c12d0e2b2efd3b3c64f5450ca76f0ef0e
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-05T06:27:55Z

     - Removed passing around contexts unnecessarily in the TopNBatch operator.
     - Fixed some javadocs

commit c539ebd20c421d8672440ad3bb863c446f39e775
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T03:22:04Z

    - Added a record batch builder, and added an add method to value vector mutators.

commit 89a5f8576bd2f50ce540b633a28ace55b31f08e6
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T05:46:48Z

     - Added Javadoc to the PriorityQueue.
     - Removed an unnecessary method
     - Got the unit test to work

commit 540715d250601590367d7bc71614ab52e891c6ec
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T10:27:46Z

     - Added utilities for comparing vector batch results
     - Removed redundant unit tests

commit 9b0c557ec0c6030969dae620c5954e5dfb2119e9
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T10:30:40Z

     - Reduced the amount of data generated for the topN unit test.

commit 7ed3dcf7bf0ce807b81e72a17d91d909ac788415
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T16:14:21Z

     - Moved comparator function in DrillTestWrapper into a comparator class
     - Made the TopN unit test work

commit 3e9d8905699a56c36cbdbf253daf374ae0983e65
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T16:20:03Z

     - Fixed missing license

commit 55e35ab8a84b9eb32fdf1b674a23a3f45e1e9e55
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-06T16:31:07Z

     - Fixed checkstyle errors

commit 0c310225d47a78a35c9cafd70869b7520edfa17c
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-07T04:00:00Z

     - Fixed javadoc errors in FixtureBuilder

commit 133a44b88355de031695cf8d9c3503618219e4fe
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-07T04:10:32Z

     - Cleaning up unused variables and methods in FixtureBuilder

commit a685b5718f2c25d1797d378bdf35be9a88559048
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-07T04:34:33Z

     - Fixed errors after rebase

commit b920eb3ed4e29cf72a74ceaf43bc4c418f4dcefa
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-07T09:55:10Z

     - Used DirTestWatcher to create temporary directories in the ClusterFixture
     - Fixed broken unit tests

commit 96defcfcee6b602df0051e7ec8516ef627a44388
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-08T00:57:32Z

     - Made the RecordBatchBuilder and BatchUtils use the same table format as the DrillTestWrapper. Also added a more detailed comparison function for tables.

commit 8812043bfed74c6f0f6db1a23c0d029b4fd29222
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-08T00:59:40Z

     - Moved files in testutils package to test package

commit 1b1dada9c712c9283b4765cb7142af967ba5dd24
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-08T01:02:10Z

     - Moved more testing utilities into the org.apache.drill.test package

commit 5878073cf0ebb4d28a6da6fb4594ea52576d5b1d
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-08T01:05:35Z

     - Fixed broken javadoc links

commit d9ac7d485b6e580c7becf6caa8dc96ebe20465ed
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-08T01:08:27Z

     - Moved unit tests out of org.apache.drill.test since that package should only contain testing utilities

commit fe33936818d1b170651e6ea36b560d06af0033e7
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-10T09:19:24Z

     - Made all the tests use a uniform tmp directory system
     - Fixed sporadic failures due to stale files being present on build machines
     - Use an isolated workspace for each test not global /tmp folder
     - Fixed test failures due to stray files in /tmp

commit dd8a2c230008739f20432389cb61515c6459eb05
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-10T11:07:41Z

     - Fixed broken tests

commit 99115492fd621403d0862e2ff8b650479a5935e5
Author: Timothy Farkas <ti...@apache.org>
Date:   2017-10-10T13:07:03Z

     - Fixed more testing bugs

----


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144194940
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final OptionSet optionManager)
       public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + ".disable_cache";
     
       /**
    +   * Enables saving generated code for debugging
    +   */
    +  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + ".codegen.dump";
    --- End diff --
    
    This is the name of the option which holds the true/false flag to enable saving code for debugging. The destination directory is governed by the ClassBuilder.CODE_DIR_OPTION property.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145516858
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    --- End diff --
    
    See `HyperRowSet`


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147255211
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -19,24 +19,33 @@
     
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.categories.SqlTest;
    +import org.apache.drill.test.BaseTestQuery;
    +import org.junit.BeforeClass;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
     
     @Category({SqlTest.class, OperatorTest.class})
    -public class TestAltSortQueries extends BaseTestQuery{
    +public class TestAltSortQueries extends BaseTestQuery {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
     
    +  @BeforeClass
    +  public static void setupTestFiles() {
    +    dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
    +    dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
    +    dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
    +  }
    +
       @Test
       public void testOrderBy() throws Exception{
         test("select R_REGIONKEY " +
    -         "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` " +
    +         "from dfs_test.`/sample-data/region.parquet` " +
    --- End diff --
    
    Updated package-info.java and ExampleTest.java



---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @arina-ielchiieva I can put **DRILL-5783** in a separate PR **DRILL-5841** and **DRILL-5894** are however tightly coupled and must be in the same PR. Unfortunately **DRILL-5841** and **DRILL-5894** touched many files because all the unit tests were updated so the PR will still remain large. 


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147008923
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
    @@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
         test("alter session set `planner.slice_target` = 1");
         test("alter session set `store.partition.hash_distribute` = true");
         test("use dfs_test.tmp");
    -    test(String.format("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", TEST_RES_PATH));
    +    test("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`/multilevel/parquet`");
         String query = "select * from orders_distribution where o_orderpriority = '1-URGENT'";
    -    testExcludeFilter(query, 1, "Filter", 24);
    +    testExcludeFilter(query, 1, "Filter\\(", 24);
    --- End diff --
    
    It is no longer sufficient to match "Filter" because the test class name contains "Filter" and the test class name is used to create the tmp directory. And the fully qualified path of a queried file is included in the plan. We want to only match the <b>Filter</b> steps generated in the plan, not the <b>Filters</b> in our file paths. In order to do this I tell it to match "Filter(" which corresponds to a filter step in the plan.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144194617
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    +   * @throws SchemaChangeException
    +   */
    +  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    +
    +  /**
    +   * This method must be called before fetching the final heap hyper batch and final Sv4 vector.
    --- End diff --
    
    Old habit, I'm used to PriorityQueues being called Heaps. I agree it's confusing in this context I will fix it.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145522320
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
    @@ -59,6 +59,9 @@
     @Ignore
     public class ExampleTest {
     
    +  @Rule
    +  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
    --- End diff --
    
    This is meant as an example, so perhaps include some Javadoc to explain what this is and how to use it. Maybe explain how to use the temp directories, and include a new example test below that illustrates the main points. People are busy, they don't often don't have time to read documentation, but they often will copy & paste a good example...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145294064
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -19,24 +19,33 @@
     
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.categories.SqlTest;
    +import org.apache.drill.test.BaseTestQuery;
    +import org.junit.BeforeClass;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
     
     @Category({SqlTest.class, OperatorTest.class})
    -public class TestAltSortQueries extends BaseTestQuery{
    +public class TestAltSortQueries extends BaseTestQuery {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
     
    +  @BeforeClass
    +  public static void setupTestFiles() {
    +    dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
    +    dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
    +    dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
    +  }
    +
       @Test
       public void testOrderBy() throws Exception{
         test("select R_REGIONKEY " +
    -         "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` " +
    +         "from dfs_test.`/sample-data/region.parquet` " +
    --- End diff --
    
    I like this improvement. Here we created files just for this one test. You are using a workspace to point to those files. Do so avoids the need to doctor up the strings.
    
    Is `dfs_test` setup to point to the per-query root directory? Seems to be.
    
    Since I have to ask these questions, would be good to explain in the `...drill.test` `package-info.java` file how to set up temporary files and the mapping of workspaces to these directories. That will allow others to readily use the mechanism. Otherwise, people will guess, and will go back to using what is familiar. See `ExampleTest.java` for an example of a "starter" file to help people get started.
    
    In fact, maybe we would add a new example test to that file that shows how to create a test-specific data file and query that file. (That would sure help me...)


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147261203
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) throws Exception {
     
           serviceSet = null;
           usesZk = true;
    -      isLocal = false;
         } else {
           // Embedded Drillbit.
     
           serviceSet = RemoteServiceSet.getLocalServiceSet();
    -      isLocal = true;
         }
       }
     
    -  private void startDrillbits(FixtureBuilder builder) throws Exception {
    -//    // Ensure that Drill uses the log directory determined here rather than
    -//    // it's hard-coded defaults. WIP: seems to be needed some times but
    -//    // not others.
    -//
    -//    String logDir = null;
    -//    if (builder.tempDir != null) {
    -//      logDir = builder.tempDir.getAbsolutePath();
    -//    }
    -//    if (logDir == null) {
    -//      logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
    -//      if (logDir != null) {
    -//        logDir += "/drill/log";
    -//      }
    -//    }
    -//    if (logDir == null) {
    -//      logDir = "/tmp/drill";
    -//    }
    -//    new File(logDir).mkdirs();
    -//    System.setProperty("drill.log-dir", logDir);
    -
    -    dfsTestTempDir = makeTempDir("dfs-test");
    -
    -    // Clean up any files that may have been left from the
    -    // last run.
    -
    -    preserveLocalFiles = builder.preserveLocalFiles;
    -    removeLocalFiles();
    +  private void startDrillbits() throws Exception {
    +    dfsTestTempDir = new File(getRootDir(), "dfs-test");
    +    dfsTestTempDir.mkdirs();
    --- End diff --
    
    This is possible through the BaseDirTestWatcher. It is configured through the BaseDireTestWatcher's constructor. If you pass it false it will not delete your directories at the end of each test.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145505524
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -76,43 +78,50 @@
       //        - detecting corrupt values must be deferred to actual data page reading
       //    - one from 1.4, where there is a proper created-by, but the corruption is present
       private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
    +      "/parquet/4203_corrupt_dates/mixed_drill_versions";
       // partitioned with 1.2.0, no certain metadata that these were written with Drill
       // the value will be checked to see that they look corrupt and they will be corrected
       // by default. Users can use the format plugin option autoCorrectCorruptDates to disable
       // this behavior if they have foreign parquet files with valid rare date values that are
       // in the similar range as Drill's corrupt values
    +  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
       private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_2_FOLDER).toString();
    --- End diff --
    
    Again, do we want an absolute-looking path when we actually have a relative path?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145293197
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
    @@ -61,10 +57,8 @@ public void testHashAggNullableColumns() throws Exception {
     
       @Test  // StreamingAgg on nullable columns
       public void testStreamAggNullableColumns() throws Exception {
    -    String query1 = String.format("select t2.b2 from dfs_test.`%s/jsoninput/nullable2.json` t2 " +
    -                    " group by t2.b2", TEST_RES_PATH);
    -    String query2 = String.format("select t2.a2, t2.b2 from dfs_test.`%s/jsoninput/nullable2.json` t2 " +
    -        " group by t2.a2, t2.b2", TEST_RES_PATH);
    +    String query1 = "select t2.b2 from cp.`/jsoninput/nullable2.json` t2 group by t2.b2";
    +    String query2 = "select t2.a2, t2.b2 from cp.`/jsoninput/nullable2.json` t2 group by t2.a2, t2.b2";
    --- End diff --
    
    Very nice improvement; this has been bugging me for months. Why doctor up the query string when we have perfectly fine workspace mechanism.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144945381
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    +   * @throws SchemaChangeException
    +   */
    +  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    +
    +  /**
    +   * This method must be called before fetching the final heap hyper batch and final Sv4 vector.
    +   * @throws SchemaChangeException
    +   */
    +  void generate() throws SchemaChangeException;
    +
    +  /**
    +   * Retrieves the final priority queue HyperBatch containing the results. <b>Note:</b> this should be called
    +   * after {@link #generate()}.
    +   * @return The final priority queue HyperBatch containing the results.
    +   */
    +  VectorContainer getHyperBatch();
    +
    +  SelectionVector4 getSv4();
    +
    +  /**
    +   * Retrieves the selection vector used to select the elements in the priority queue from the hyper batch
    +   * provided by the {@link #getHyperBatch()} method. <b>Note:</b> this should be called after {@link #generate()}.
    +   * @return The selection vector used to select the elements in the priority queue.
    +   */
    +  SelectionVector4 getFinalSv4();
    --- End diff --
    
    Taking a step back... Suppose I do a top 100. Also, suppose my batches are big, say 100 MB each. Also, suppose I have fiendishly organized by data so that the top 100 values are the first values of 100 batches.
    
    If the priority queue uses an SV4, then it means that the queue holds onto the entire batch that holds the set of top values. In my case, since the top 100 values occur across 100 batches, I'm holding onto 100 of batches of 100 MB each for a total of 10 GB of memory.
    
    Is this how it works?
    
    Do we need to think about compaction at some point? Once we have buffered, say, five batches, we copy the values to a new, much smaller, batch and discard the inputs. We repeat this over and over to keep the number of buffered batches under control.
    
    Once we control batch size (in bytes; we already control records), we'll have the means to set a memory budget for TopN, then use consolidation to stay within that budget.
    
    Or, is this how the code already works?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145827440
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
             "  nations.N_NAME,\n" +
             "  regions.R_NAME\n" +
             "FROM\n" +
    -        "  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` nations\n" +
    +        "  dfs.`/sample-data/nation.parquet` nations\n" +
    --- End diff --
    
    I accidentally changed it to dfs, will switch it back to dfs_test for consistency. <b>dfs</b> and <b>dfs_test>/b> are configured to point to the same temp directories so they are in fact interchangeable. This configuration happens in the openClient method of BaseTestQuery and in the configureStoragePlugins method of the ClusterFixture. 
    
    Throughout making this change I was scratching my head as to why <b>dfs_test</b> is used instead of <b>dfs<b>. After going through the code I don't see any reason as to why dfs_test was created in the first place. I think we should actually remove <b>dfs_test</b> and just use <b>dfs</b> everywhere for uniformity. I'll go ahead and do that now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145506779
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -377,21 +386,21 @@ public void testReadOldMetadataCacheFileOverrideCorrection() throws Exception {
     
       @Test
       public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws Exception {
    -    String table = format("dfs.`%s`", new Path(getDfsTestTmpSchemaLocation(), MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER));
    -    copyMetaDataCacheToTempReplacingInternalPaths(
    +    File meta = dirTestWatcher.copyResourceToRoot(
             "parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt",
    -        MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, Metadata.METADATA_FILENAME);
    +        Paths.get(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, Metadata.METADATA_FILENAME).toString());
    --- End diff --
    
    Seems awkward. We use a path to concatenate strings, then convert back to a string, then to a file. Further, for some crazy reason, Hadoop also has a `Path` class that many tests reference.
    
    Can we instead do:
    
    `File meta = ...toRoot(new File(new File(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER), Metadata.METADATA_FILENAME));
    ```
    
    Or, better, since seem to use this pattern over and over:
    
    ```
    File meta = ...toRoot(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, Metadata.METADATA_FILENAME);
    ```
    
    That is, the `copyResourceToRoot` has a form that takes a variable number of arguments to build up the target path. Each should be assumed to be relative "a/b", "c", "d.csv", say.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150096444
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java ---
    @@ -0,0 +1,159 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test.rowSet.file;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.ImmutableMap;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessor;
    +import org.apache.drill.exec.vector.accessor.ColumnReader;
    +import org.apache.drill.test.rowSet.RowSet;
    +
    +import java.io.BufferedOutputStream;
    +import java.io.File;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class JsonFileBuilder
    +{
    +  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
    +  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
    +  public static final String DEFAULT_LONG_FORMATTER = "%d";
    +  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
    +  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
    +  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
    +
    +  public static final Map<String, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder()
    +    .put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
    +    .put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
    +    .put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
    +    .put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
    +    .put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
    +    .put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
    +    .build();
    +
    +  private final RowSet rowSet;
    +  private final Map<String, String> customFormatters = Maps.newHashMap();
    +
    +  public JsonFileBuilder(RowSet rowSet) {
    +    this.rowSet = Preconditions.checkNotNull(rowSet);
    +    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
    +  }
    +
    +  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
    +    Preconditions.checkNotNull(columnName);
    +    Preconditions.checkNotNull(columnFormatter);
    +
    +    Iterator<MaterializedField> fields = rowSet
    +      .schema()
    +      .batch()
    +      .iterator();
    +
    +    boolean hasColumn = false;
    +
    +    while (!hasColumn && fields.hasNext()) {
    +      hasColumn = fields.next()
    +        .getName()
    +        .equals(columnName);
    +    }
    +
    +    final String message = String.format("(%s) is not a valid column", columnName);
    +    Preconditions.checkArgument(hasColumn, message);
    +
    +    customFormatters.put(columnName, columnFormatter);
    +
    +    return this;
    +  }
    +
    +  public void build(File tableFile) throws IOException {
    --- End diff --
    
    Sounds Good


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144935089
  
    --- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java ---
    @@ -17,15 +17,28 @@
      */
     package org.apache.drill.common.util;
     
    +import java.io.File;
    +import java.io.IOException;
     import java.nio.file.Paths;
     
    +import org.apache.commons.io.FileUtils;
     import org.junit.rules.TestName;
     import org.junit.rules.TestRule;
     import org.junit.rules.Timeout;
     
    +import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
    +import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
    +
     public class TestTools {
       // private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestTools.class);
     
    +  public enum DataType {
    --- End diff --
    
    `ResourceType`? `FileType`? `FileSource`? `ResourceScope`?
    
    This is a symbolic reference to a resource root, not really a kind of a data item...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144944018
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    +   * @throws SchemaChangeException
    +   */
    +  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    +
    +  /**
    +   * This method must be called before fetching the final heap hyper batch and final Sv4 vector.
    --- End diff --
    
    Ah, I see. Thanks.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147229627
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -76,43 +78,50 @@
       //        - detecting corrupt values must be deferred to actual data page reading
       //    - one from 1.4, where there is a proper created-by, but the corruption is present
       private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
    +      "/parquet/4203_corrupt_dates/mixed_drill_versions";
       // partitioned with 1.2.0, no certain metadata that these were written with Drill
       // the value will be checked to see that they look corrupt and they will be corrected
       // by default. Users can use the format plugin option autoCorrectCorruptDates to disable
       // this behavior if they have foreign parquet files with valid rare date values that are
       // in the similar range as Drill's corrupt values
    +  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
       private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_2_FOLDER).toString();
    --- End diff --
    
    Updated all paths to be relative in the tests


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148157140
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    --- End diff --
    
    Done


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    Resolved conflicts.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144131729
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final OptionSet optionManager)
       public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + ".disable_cache";
     
       /**
    +   * Enables saving generated code for debugging
    +   */
    +  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + ".codegen.dump";
    --- End diff --
    
    When saving generated code, I have to point to the directory from Eclipse. This means two things:
    
    1. It works best if the same directory is used from one run to the next. (That is, no randomly generated temp directory.)
    2. The directory has to be visible to the Eclipse UI. (That is, no hidden directories starting with dots.)
    
    Can we make the directory in a known location, and can we use a nice simple name like "codegen"?


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    After squashing the commits and rebasing I noticed the windows functional tests were failing. The issue was caused by replacing the '/' constant in FileUtils (now renamed to DrillFileUtils) in **ClassTransformer** and **FunctionInitializer** with File.separator. This broke the build because File.separator is '\' on windows and in the context of **ClassTransformer** and **FunctionInitializer** the separator is used to look things up in the classpath. Looking things up in the classpath requires '/' on both windows and linux, hence I added back the '/' constant to DrillFileUtils along with a comment explaining its purpose and used it in **ClassTransformer** and **FunctionInitializer**. *Note:* this was a minor fix that impacted only a few lines.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147007945
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
             "  nations.N_NAME,\n" +
             "  regions.R_NAME\n" +
             "FROM\n" +
    -        "  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` nations\n" +
    +        "  dfs.`/sample-data/nation.parquet` nations\n" +
    --- End diff --
    
    Just mentioned this above but will repeat here.I have now removed **dfs_test** completely. There was no reason for it to be added and it was inconsistently being mixed with **dfs**. If you want to query a file on the local filesystem that is not on the classpath just using **dfs** will be sufficient now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147257244
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
    @@ -0,0 +1,184 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Charsets;
    +import org.apache.commons.io.FileUtils;
    +import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.exec.util.TestUtilities;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.io.IOException;
    +
    +public class BaseDirTestWatcher extends DirTestWatcher {
    +  public enum DirType {
    +    ROOT,
    +    TEST_TMP;
    +  }
    +
    +  private File tmpDir;
    +  private File storeDir;
    +  private File dfsTestTmpParentDir;
    +  private File dfsTestTmpDir;
    +  private File rootDir;
    +
    +  public BaseDirTestWatcher() {
    +    super();
    +  }
    +
    +  @Override
    +  protected void starting(Description description) {
    +    super.starting(description);
    +
    +    rootDir = makeSubDir("root");
    +    tmpDir = new File(rootDir, "tmp");
    +    storeDir = new File(rootDir, "store");
    +    dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
    +
    +    tmpDir.mkdirs();
    +    storeDir.mkdirs();
    +    dfsTestTmpParentDir.mkdirs();
    +  }
    +
    +  public File getTmpDir() {
    +    return tmpDir;
    +  }
    +
    +  public File getStoreDir() {
    +    return storeDir;
    +  }
    +
    +  public File getDfsTestTmpParentDir() {
    +    return dfsTestTmpParentDir;
    +  }
    +
    +  public File getDfsTestTmpDir() {
    +    return dfsTestTmpDir;
    +  }
    +
    +  public String getDfsTestTmpDirPath() {
    +    return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public File getRootDir() {
    +    return rootDir;
    +  }
    +
    +  public String getRootDirPath() {
    +    return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public void newDfsTestTmpDir() {
    +    dfsTestTmpDir = TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
    +  }
    +
    +  private File getDir(DirType type) {
    +    switch (type) {
    +      case ROOT:
    +        return rootDir;
    +      case TEST_TMP:
    +        return dfsTestTmpDir;
    +      default:
    +        throw new IllegalArgumentException(String.format("Unsupported type %s", type));
    +    }
    +  }
    +
    +  public File makeRootSubDir(String relPath) {
    --- End diff --
    
    Fair enough. I would prefer to use File objects to represent concrete files or directories. Java.nio.Path objects are useful because they can represent relative paths, and provide a bunch of methods for manipulating paths. I'll update the methods I introduced to use jva.io.Files and java.nio.Paths


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145509549
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java ---
    @@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() throws Throwable {
       }
     
       private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws Throwable {
    -    FixtureBuilder builder = ClusterFixture.builder()
    +    FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
    --- End diff --
    
    Nice, but I might move this to a method:
    
    ```
    FixtureBuilder builder = ClusterFixture.builder()
        .withTempDir(dirTestWatcher)
        .configProperty(...) ...
    ```
    
    Reason: if this has to go into the constructor, then every constructor use must change, and tests must create a temp directory even when not needed. But, if passed in using a builder method, then we can use it regardless of how we create the builder object, and the directory is optional.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145574026
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    +   * @throws SchemaChangeException
    +   */
    +  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    +
    +  /**
    +   * This method must be called before fetching the final heap hyper batch and final Sv4 vector.
    +   * @throws SchemaChangeException
    +   */
    +  void generate() throws SchemaChangeException;
    +
    +  /**
    +   * Retrieves the final priority queue HyperBatch containing the results. <b>Note:</b> this should be called
    +   * after {@link #generate()}.
    +   * @return The final priority queue HyperBatch containing the results.
    +   */
    +  VectorContainer getHyperBatch();
    +
    +  SelectionVector4 getSv4();
    +
    +  /**
    +   * Retrieves the selection vector used to select the elements in the priority queue from the hyper batch
    +   * provided by the {@link #getHyperBatch()} method. <b>Note:</b> this should be called after {@link #generate()}.
    +   * @return The selection vector used to select the elements in the priority queue.
    +   */
    +  SelectionVector4 getFinalSv4();
    --- End diff --
    
    The code already works that way. There is a config option <b>drill.exec.sort.purge.threshold</b> which controls the maximum number of batches allowed in the hyper batch. Once the threshold is exceeded the top N values are consolidated into a single batch and the process is repeated. There is an issue in the case where the limit is large. Ex. 100,000,000 . In this case the operator will keep all the records in memory and die. There is a jira created to address this issue: [https://issues.apache.org/jira/browse/DRILL-5823]


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145516665
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    --- End diff --
    
    Do we really want to create yet another way to muck with data? Any reason we can't use the `RowSet` concepts here? I fear that if we have multiple ways to do the same thing, we won't achieve critical mass on any of them and we'll have a bunch of half-baked solutions. I'd rather have one fully-baked solution we use over and over. Allows new tests to get done easily because the required tools already exist. This, in turn, encourages testing.
    
    What are we doing here that `RowSet` can't (yet) do?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148144235
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
    @@ -0,0 +1,184 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Charsets;
    +import org.apache.commons.io.FileUtils;
    +import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.exec.util.TestUtilities;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.io.IOException;
    +
    +public class BaseDirTestWatcher extends DirTestWatcher {
    +  public enum DirType {
    +    ROOT,
    +    TEST_TMP;
    +  }
    +
    +  private File tmpDir;
    +  private File storeDir;
    +  private File dfsTestTmpParentDir;
    +  private File dfsTestTmpDir;
    +  private File rootDir;
    +
    +  public BaseDirTestWatcher() {
    +    super();
    +  }
    +
    +  @Override
    +  protected void starting(Description description) {
    +    super.starting(description);
    +
    +    rootDir = makeSubDir("root");
    +    tmpDir = new File(rootDir, "tmp");
    +    storeDir = new File(rootDir, "store");
    +    dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
    +
    +    tmpDir.mkdirs();
    +    storeDir.mkdirs();
    +    dfsTestTmpParentDir.mkdirs();
    +  }
    +
    +  public File getTmpDir() {
    +    return tmpDir;
    +  }
    +
    +  public File getStoreDir() {
    +    return storeDir;
    +  }
    +
    +  public File getDfsTestTmpParentDir() {
    +    return dfsTestTmpParentDir;
    +  }
    +
    +  public File getDfsTestTmpDir() {
    +    return dfsTestTmpDir;
    +  }
    +
    +  public String getDfsTestTmpDirPath() {
    +    return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public File getRootDir() {
    +    return rootDir;
    +  }
    +
    +  public String getRootDirPath() {
    +    return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public void newDfsTestTmpDir() {
    +    dfsTestTmpDir = TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
    +  }
    +
    +  private File getDir(DirType type) {
    +    switch (type) {
    +      case ROOT:
    +        return rootDir;
    +      case TEST_TMP:
    +        return dfsTestTmpDir;
    +      default:
    +        throw new IllegalArgumentException(String.format("Unsupported type %s", type));
    +    }
    +  }
    +
    +  public File makeRootSubDir(String relPath) {
    +    return makeSubDir(relPath, DirType.ROOT);
    +  }
    +
    +  public File makeTestTmpSubDir(String relPath) {
    +    return makeSubDir(relPath, DirType.TEST_TMP);
    +  }
    +
    +  private File makeSubDir(String relPath, DirType type) {
    +    File subDir = new File(getDir(type), relPath);
    +    subDir.mkdirs();
    +    return subDir;
    +  }
    +
    +  /**
    +   * This preserves the relative path of the directory in root
    +   * @param relPath
    +   * @return
    +   */
    +  public File copyResourceToRoot(String relPath) {
    +    return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, DirType.ROOT);
    +  }
    +
    +  public File copyFileToRoot(String relPath) {
    +    return copyTo(relPath, relPath, TestTools.DataType.PROJECT, DirType.ROOT);
    +  }
    +
    +  public File copyResourceToRoot(String relPath, String destPath) {
    +    return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, DirType.ROOT);
    +  }
    +
    +  public File copyResourceToTestTmp(String relPath, String destPath) {
    +    return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, DirType.TEST_TMP);
    +  }
    +
    +  private File copyTo(String relPath, String destPath, TestTools.DataType dataType, DirType dirType) {
    +    File file = TestTools.getFile(relPath, dataType);
    +
    +    if (file.isDirectory()) {
    +      File subDir = makeSubDir(destPath, dirType);
    +      TestTools.copyDirToDest(relPath, subDir, dataType);
    +      return subDir;
    +    } else {
    +      File baseDir = getDir(dirType);
    +
    +      destPath = destPath.startsWith("/") ? destPath.substring(1): destPath;
    +      baseDir.toPath()
    +        .resolve(destPath)
    +        .getParent()
    +        .toFile()
    +        .mkdirs();
    +
    +      File destFile = baseDir.toPath()
    +        .resolve(destPath)
    +        .toFile();
    +
    +      try {
    +        destFile.createNewFile();
    +        FileUtils.copyFile(file, destFile);
    +      } catch (IOException e) {
    +        throw new RuntimeException("This should not happen", e);
    +      }
    +
    +      return destFile;
    +    }
    +  }
    +
    +  public void replaceMetaDataContents(File metaDataFile, String replacePath, String customStringReplacement) {
    --- End diff --
    
    done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147229671
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -76,43 +78,50 @@
       //        - detecting corrupt values must be deferred to actual data page reading
       //    - one from 1.4, where there is a proper created-by, but the corruption is present
       private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
    +      "/parquet/4203_corrupt_dates/mixed_drill_versions";
       // partitioned with 1.2.0, no certain metadata that these were written with Drill
       // the value will be checked to see that they look corrupt and they will be corrected
       // by default. Users can use the format plugin option autoCorrectCorruptDates to disable
       // this behavior if they have foreign parquet files with valid rare date values that are
       // in the similar range as Drill's corrupt values
    +  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
       private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_2_FOLDER).toString();
       // partitioned with 1.4.0, no certain metadata regarding the date corruption status.
       // The same detection approach of the corrupt date values as for the files partitioned with 1.2.0
    +  private static final String PARTITIONED_1_4_FOLDER = "partitioned_with_corruption_4203";
       private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_4_FOLDER).toString();
       private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
    +      "/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
    +  private static final String PARTITIONED_1_9_FOLDER = "1_9_0_partitioned_no_corruption";
       private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_9_FOLDER).toString();
       private static final String VARCHAR_PARTITIONED =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
    +      "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
       private static final String DATE_PARTITIONED =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
    +      "/parquet/4203_corrupt_dates/fewtypes_datepartition";
       private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
    +      "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
       private static final String CORRECT_DATES_1_6_0_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
    -  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
    +      "/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
       private static final String MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
     
    -
       @BeforeClass
       public static void initFs() throws Exception {
    +    dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
    --- End diff --
    
    fixed


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145516134
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
    @@ -0,0 +1,184 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Charsets;
    +import org.apache.commons.io.FileUtils;
    +import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.exec.util.TestUtilities;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.io.IOException;
    +
    +public class BaseDirTestWatcher extends DirTestWatcher {
    +  public enum DirType {
    +    ROOT,
    +    TEST_TMP;
    +  }
    +
    +  private File tmpDir;
    +  private File storeDir;
    +  private File dfsTestTmpParentDir;
    +  private File dfsTestTmpDir;
    +  private File rootDir;
    +
    +  public BaseDirTestWatcher() {
    +    super();
    +  }
    +
    +  @Override
    +  protected void starting(Description description) {
    +    super.starting(description);
    +
    +    rootDir = makeSubDir("root");
    +    tmpDir = new File(rootDir, "tmp");
    +    storeDir = new File(rootDir, "store");
    +    dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
    +
    +    tmpDir.mkdirs();
    +    storeDir.mkdirs();
    +    dfsTestTmpParentDir.mkdirs();
    +  }
    +
    +  public File getTmpDir() {
    +    return tmpDir;
    +  }
    +
    +  public File getStoreDir() {
    +    return storeDir;
    +  }
    +
    +  public File getDfsTestTmpParentDir() {
    +    return dfsTestTmpParentDir;
    +  }
    +
    +  public File getDfsTestTmpDir() {
    +    return dfsTestTmpDir;
    +  }
    +
    +  public String getDfsTestTmpDirPath() {
    +    return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public File getRootDir() {
    +    return rootDir;
    +  }
    +
    +  public String getRootDirPath() {
    +    return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public void newDfsTestTmpDir() {
    +    dfsTestTmpDir = TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
    +  }
    +
    +  private File getDir(DirType type) {
    +    switch (type) {
    +      case ROOT:
    +        return rootDir;
    +      case TEST_TMP:
    +        return dfsTestTmpDir;
    +      default:
    +        throw new IllegalArgumentException(String.format("Unsupported type %s", type));
    +    }
    +  }
    +
    +  public File makeRootSubDir(String relPath) {
    +    return makeSubDir(relPath, DirType.ROOT);
    +  }
    +
    +  public File makeTestTmpSubDir(String relPath) {
    +    return makeSubDir(relPath, DirType.TEST_TMP);
    +  }
    +
    +  private File makeSubDir(String relPath, DirType type) {
    +    File subDir = new File(getDir(type), relPath);
    +    subDir.mkdirs();
    +    return subDir;
    +  }
    +
    +  /**
    +   * This preserves the relative path of the directory in root
    +   * @param relPath
    +   * @return
    +   */
    +  public File copyResourceToRoot(String relPath) {
    +    return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, DirType.ROOT);
    +  }
    +
    +  public File copyFileToRoot(String relPath) {
    +    return copyTo(relPath, relPath, TestTools.DataType.PROJECT, DirType.ROOT);
    +  }
    +
    +  public File copyResourceToRoot(String relPath, String destPath) {
    +    return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, DirType.ROOT);
    +  }
    +
    +  public File copyResourceToTestTmp(String relPath, String destPath) {
    +    return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, DirType.TEST_TMP);
    +  }
    +
    +  private File copyTo(String relPath, String destPath, TestTools.DataType dataType, DirType dirType) {
    +    File file = TestTools.getFile(relPath, dataType);
    +
    +    if (file.isDirectory()) {
    +      File subDir = makeSubDir(destPath, dirType);
    +      TestTools.copyDirToDest(relPath, subDir, dataType);
    +      return subDir;
    +    } else {
    +      File baseDir = getDir(dirType);
    +
    +      destPath = destPath.startsWith("/") ? destPath.substring(1): destPath;
    +      baseDir.toPath()
    +        .resolve(destPath)
    +        .getParent()
    +        .toFile()
    +        .mkdirs();
    +
    +      File destFile = baseDir.toPath()
    +        .resolve(destPath)
    +        .toFile();
    +
    +      try {
    +        destFile.createNewFile();
    +        FileUtils.copyFile(file, destFile);
    +      } catch (IOException e) {
    +        throw new RuntimeException("This should not happen", e);
    +      }
    +
    +      return destFile;
    +    }
    +  }
    +
    +  public void replaceMetaDataContents(File metaDataFile, String replacePath, String customStringReplacement) {
    --- End diff --
    
    See? Here we mix `File` and `String`. The madness continues...


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @paul-rogers Applied / responding to comments. Also removed RecordBatchBuilder and used RowSetBuilder.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148157850
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int numIndices = selectionVector4.getCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      VectorWrapper vectorWrapper = vectorContainer.getValueVector(columnIndex);
    +
    +      for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
    +        int sv4Index = selectionVector4.get(indexIndex);
    +        int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
    +        int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
    +
    +        ValueVector valueVector = vectorWrapper.getValueVectors()[batchIndex];
    +        Object columnValue = valueVector.getAccessor().getObject(recordIndex);
    +        data.add(columnValue);
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static String toString(Map<String, List<Object>> table) {
    +    if (table.isEmpty()) {
    +      return "[ empty table ]";
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(table.keySet());
    +    Collections.sort(columnNames);
    +    int numRecords = table.get(columnNames.get(0)).size();
    +
    +    StringBuilder sb = new StringBuilder();
    +
    +    {
    +      sb.append("[ ");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(columnName);
    +      }
    +
    +      sb.append(" ]\n");
    +    }
    +
    +    for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
    +      sb.append("{");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(table.get(columnName).get(recordIndex));
    +      }
    +
    +      sb.append("}\n");
    +    }
    +
    +    return sb.toString();
    +  }
    +
    +  public static void assertEqual(Map<String, List<Object>> expected, Map<String, List<Object>> actual) {
    +    if (expected.isEmpty() && actual.isEmpty()) {
    +      // both tables are empty
    +      return;
    +    }
    +
    +    if (!expected.keySet().equals(actual.keySet())) {
    +      List<String> columnNamesExpected = Lists.newArrayList(expected.keySet());
    +      List<String> columnNamesActual = Lists.newArrayList(actual.keySet());
    +
    +      Collections.sort(columnNamesExpected);
    +      Collections.sort(columnNamesActual);
    +
    +      String message = String.format("The columns in the expected table %s don't match the columns in actual table %s",
    +        columnNamesExpected.toString(), columnNamesActual.toString());
    +      Assert.fail(message);
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(expected.keySet());
    +    Collections.sort(columnNames);
    +
    +    int numRows = expected.get(columnNames.get(0)).size();
    +
    +    checkTableRowCount(expected, numRows, "expected");
    +    checkTableRowCount(actual, numRows, "actual");
    +
    +    if (numRows == 0) {
    +      // The tables are empty
    +      return;
    +    }
    +
    +    // The tables are non-empty
    +
    +    // Validate each row of the two tables are equal
    +    for (int rowIndex = 0; rowIndex < numRows; rowIndex++) {
    +      for (String columnName: columnNames) {
    +        Object expectedObject = expected.get(columnName).get(rowIndex);
    +        Object actualObject = actual.get(columnName).get(rowIndex);
    +        compareValuesErrorOnMismatch(expectedObject, actualObject, rowIndex, columnName);
    +      }
    +    }
    +  }
    +
    +  public static boolean compareValuesErrorOnMismatch(Object expected, Object actual, int counter, String column) {
    +    if (compareValues(expected, actual)) {
    +      return true;
    +    }
    +
    +    if (expected == null) {
    +      String message = String.format("at row %s column '%s' mismatched values, expected: null " +
    +        "but received %s (%s)", counter, column, actual, actual.getClass().getSimpleName());
    +      Assert.fail(message);
    +    }
    +
    +    if (actual == null) {
    +      String message = String.format("unexpected null at row %s column '%s' should have been: %s", counter, column, expected);
    +      Assert.fail(message);
    +    }
    +
    +    if (actual instanceof byte[]) {
    +      try {
    +        String message = String.format("at row %s column '%s' mismatched values, expected: %s but received %s",
    +          counter, column, new String((byte[])expected, "UTF-8"), new String((byte[])actual, "UTF-8"));
    +        Assert.fail(message);
    +      } catch (UnsupportedEncodingException e) {
    +        throw new RuntimeException("This should never happen", e);
    +      }
    +    }
    +
    +    if (!expected.equals(actual)) {
    +      String message = String.format("at row %s column '%s' mismatched values, expected: %s (%s) but received %s (%s)",
    +        counter, column, expected, expected.getClass().getSimpleName(), actual, actual.getClass().getSimpleName());
    +      Assert.fail(message);
    +    }
    +    return true;
    +  }
    +
    +  public static boolean compareValues(Object expected, Object actual) {
    +    if (expected == null) {
    +      if (actual == null) {
    +        return true;
    +      } else {
    +        return false;
    +      }
    +    }
    +
    +    if (actual == null) {
    +      return false;
    +    }
    +
    +    if (actual instanceof byte[]) {
    +      if ( ! Arrays.equals((byte[]) expected, (byte[]) actual)) {
    +        return false;
    +      } else {
    +        return true;
    +      }
    +    }
    +
    +    if (!expected.equals(actual)) {
    +      return false;
    +    }
    +
    +    return true;
    +  }
    +
    +  private static void checkTableRowCount(Map<String, List<Object>> table, int numRows, String tableName) {
    +    for (Map.Entry<String, List<Object>> entry: table.entrySet()) {
    +      List<Object> data = entry.getValue();
    +
    +      if (numRows != data.size()) {
    +        String message = String.format("Table %s does not have %s rows. Instead it has %s rows",
    +          tableName, numRows, data.size());
    +        Assert.fail(message);
    +      }
    +    }
    +  }
    +
    +  public static class ObjectComparator implements Comparator<Object> {
    --- End diff --
    
    I looked at RowSetComparison. It had code for testing equality, but not for ordering (less than, greater than) which is what I needed for my test. I've moved this class into RowSetComparison without any changes for now. We need to modify DrillTestWrapper to use the RowSet classes first, then we can make ObjectComparator to be something native to RowSets.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144936994
  
    --- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
    @@ -32,23 +32,50 @@
     public class DirTestWatcher extends TestWatcher {
       private String dirPath;
       private File dir;
    +  private boolean deleteDirAtEnd = true;
    --- End diff --
    
    Thanks for the Javadoc added previously. Perhaps add a section on how to use this class. If I need a test directory, do I create an instance of this class? Or, does JUnit do it for me? If done automagically, how do I get a copy of the directory?
    
    If this is explained in JUnit, perhaps just reference that information. Even better would be a short example:
    
    > To get a test directory:
    ```
        // Do something here
        File myDir = // do something
    ```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145517411
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int numIndices = selectionVector4.getCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      VectorWrapper vectorWrapper = vectorContainer.getValueVector(columnIndex);
    +
    +      for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
    +        int sv4Index = selectionVector4.get(indexIndex);
    +        int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
    +        int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
    +
    +        ValueVector valueVector = vectorWrapper.getValueVectors()[batchIndex];
    +        Object columnValue = valueVector.getAccessor().getObject(recordIndex);
    +        data.add(columnValue);
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static String toString(Map<String, List<Object>> table) {
    +    if (table.isEmpty()) {
    +      return "[ empty table ]";
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(table.keySet());
    +    Collections.sort(columnNames);
    +    int numRecords = table.get(columnNames.get(0)).size();
    +
    +    StringBuilder sb = new StringBuilder();
    +
    +    {
    +      sb.append("[ ");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(columnName);
    +      }
    +
    +      sb.append(" ]\n");
    +    }
    +
    +    for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
    +      sb.append("{");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(table.get(columnName).get(recordIndex));
    +      }
    +
    +      sb.append("}\n");
    +    }
    +
    +    return sb.toString();
    +  }
    +
    +  public static void assertEqual(Map<String, List<Object>> expected, Map<String, List<Object>> actual) {
    --- End diff --
    
    See `RowSetComparison`
    
    Yes, the name can be improved. And, since it only ever gets used one way, the interface an be simplified. But, it does the job: for all data types and (if DRILL-5657 is ever approved) handles maps and repeated maps.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150098888
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java ---
    @@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea,
           }
         }
       }
    +
    +  // TODO make a native RowSetComparison comparator
    +  public static class ObjectComparator implements Comparator<Object> {
    --- End diff --
    
    This is used in the DrillTestWrapper to verify the ordering of results. I agree this is not suitable for equality tests, but it's intended to be used only for ordering tests. I didn't add support for all the supported RowSet types because we would first have to move DrillTestWrapper to use RowSets (currently it uses Maps and Lists to represent data). Currently it is not used by RowSets, but the intention is to move DrillTestWrapper to use RowSets and then make this comparator operate on RowSets, but that will be an incremental process.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147007663
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -19,24 +19,33 @@
     
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.categories.SqlTest;
    +import org.apache.drill.test.BaseTestQuery;
    +import org.junit.BeforeClass;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
     
     @Category({SqlTest.class, OperatorTest.class})
    -public class TestAltSortQueries extends BaseTestQuery{
    +public class TestAltSortQueries extends BaseTestQuery {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
     
    +  @BeforeClass
    +  public static void setupTestFiles() {
    +    dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
    +    dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
    +    dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
    +  }
    +
       @Test
       public void testOrderBy() throws Exception{
         test("select R_REGIONKEY " +
    -         "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` " +
    +         "from dfs_test.`/sample-data/region.parquet` " +
    --- End diff --
    
    I have now removed **dfs_test** completely. There was no reason for it to be added and it was inconsistently being mixed with **dfs**. The **dfs** workspaces are automatically mapped to the correct temp directories for you provided that you use **BaseTestQuery** or the **ClusterFixture**. I will update **org.apache.drill.test.package-info.java** with the theory of how this works and will add a simple example to **ExampleTest.java**


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner-mapr <gi...@git.apache.org>.
Github user ilooner-mapr commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145561373
  
    --- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java ---
    @@ -17,15 +17,28 @@
      */
     package org.apache.drill.common.util;
     
    +import java.io.File;
    +import java.io.IOException;
     import java.nio.file.Paths;
     
    +import org.apache.commons.io.FileUtils;
     import org.junit.rules.TestName;
     import org.junit.rules.TestRule;
     import org.junit.rules.Timeout;
     
    +import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
    +import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
    +
     public class TestTools {
       // private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestTools.class);
     
    +  public enum DataType {
    --- End diff --
    
    Done changed to FileSource


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145575310
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(
    +      mainMapping, leftMapping, rightMapping, context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static MappingSet createMainMappingSet() {
    +    return new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createLeftMappingSet() {
    +    return new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createRightMappingSet() {
    +    return new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping,
    --- End diff --
    
    Agreed. Pritesh has asked me to look at Codegen improvements in the long term, so I will work on this down the line.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150073673
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java ---
    @@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea,
           }
         }
       }
    +
    +  // TODO make a native RowSetComparison comparator
    +  public static class ObjectComparator implements Comparator<Object> {
    --- End diff --
    
    Defined here, but not used in this file. Does not include all types that Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not allow for ranges for floats & doubles as does JUnit. (Two floats are seldom exactly equal.)


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150073945
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
    @@ -85,8 +85,7 @@
        * new row set with the updated columns, then merge the new
        * and old row sets to create a new immutable row set.
        */
    -
    -  public interface RowSetWriter extends TupleWriter {
    +  interface RowSetWriter extends TupleWriter {
    --- End diff --
    
    Aren't nested interfaces `protected` by default? Just had to change one from default to `public` so I could use it in another package...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148395420
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -377,21 +386,21 @@ public void testReadOldMetadataCacheFileOverrideCorrection() throws Exception {
     
       @Test
       public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws Exception {
    -    String table = format("dfs.`%s`", new Path(getDfsTestTmpSchemaLocation(), MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER));
    -    copyMetaDataCacheToTempReplacingInternalPaths(
    +    File meta = dirTestWatcher.copyResourceToRoot(
             "parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt",
    -        MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, Metadata.METADATA_FILENAME);
    +        Paths.get(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, Metadata.METADATA_FILENAME).toString());
    --- End diff --
    
    I've cleaned this up a bit. Of the two patterns for building paths and files I prefer pattern **A** because it is cleaner.
    
    **Pattern A:**
    ```
    myPath.resolve("subDir1")
      .resolve("subDir2")
      .toFile();
    ```
    
    **Pattern B:**
    ```
    new File(new File(myFile, "subDir1"), "subDir2")
    ```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145556819
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -908,6 +908,15 @@ public void generateTestData(int count) {
         }
     
       <#else> <#-- type.width <= 8 -->
    +    @Override
    +    public void add(Object value) {
    +      int index = accessor.getValueCount();
    +      int valueCount = index + 1;
    +      setValueCount(valueCount);
    +
    +      set(index, (${type.javaType}) value);
    --- End diff --
    
    I don't think this will work as you hope it will -- for obscure reasons that we've all learned the hard way...
    
    * The accessor does not keep an ongoing count of values. Rather, the value count comes from the writer index which is set only at the end of a write.
    * This method sets the value count, but doing so is, in general, expensive.
    * Vectors in a batch must be kept in sync. This adds a value to one vector, but does not coordinate across a set of vectors.
    * Not all values can be set by casting. Works for some numeric values, but does not handle, say, `add(10)` when the item is a byte or a short because of the need for a two-step cast: `(byte)(int) value`.
    * Does not handle conversions for date or decimal.
    
    Yes, this is all a colossal pain in the neck. But, it is the reason that the `RowSet` classes were created.
    
    We can talk in person about how to move this idea forward effectively.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145294641
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java ---
    @@ -33,8 +35,11 @@
     @Category(UnlikelyTest.class)
     public class TestBugFixes extends BaseTestQuery {
       private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
    -  private static final String WORKING_PATH = TestTools.getWorkingPath();
    -  private static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
    +
    +  @BeforeClass
    +  public static void setupTestFiles() {
    +    dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
    --- End diff --
    
    Small nit. Presumably `/bugs/DRILL-4192` is a relative path. Should we express it in the normal Unix fashion as `bugs/DRILL-4192`? That way, in time-honored fashion, a leading slash tells us the path is absolute.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144194241
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
    @@ -90,12 +89,11 @@
       private String generatedCode;
       private String generifiedCode;
     
    -  CodeGenerator(TemplateClassDefinition<T> definition, FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
    -    this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, optionManager);
    +  CodeGenerator(TemplateClassDefinition<T> definition, OptionSet optionManager) {
    +    this(ClassGenerator.getDefaultMapping(), definition, optionManager);
    --- End diff --
    
    I traced through the code and asked IntelliJ where the variable is used. This is what I found:
    
    1. CodeGenerator accepts it as an argument to its constructor
    1. It is then passed to the constructor of the EvaluationVisitor
    1. A private field is set with the value in the constructor of EvaluationVisitor
    1. This private field of EvaluationVisitor is then unused.
    
    If your not convinced let's walk through the code when I'm in the office next week.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145292776
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
    @@ -21,29 +21,25 @@
     
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.test.BaseTestQuery;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
     
     @Category(OperatorTest.class)
    -public class TestAggNullable extends BaseTestQuery{
    +public class TestAggNullable extends BaseTestQuery {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
     
    -  static final String WORKING_PATH = TestTools.getWorkingPath();
    -  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
    -
       private static void enableAggr(boolean ha, boolean sa) throws Exception {
     
    -    test(String.format("alter session set `planner.enable_hashagg` = %s", ha ? "true":"false"));
    -    test(String.format("alter session set `planner.enable_streamagg` = %s", sa ? "true":"false"));
    +    test("alter session set `planner.enable_hashagg` = %s", ha);
    +    test("alter session set `planner.enable_streamagg` = %s", sa);
    --- End diff --
    
    I was about to suggest using the functions created for this purpose. But, then I realized those changes are in a PR that has not yet been reviewed...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145523267
  
    --- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
    @@ -12,11 +12,11 @@
                 pop:"mock-scan",
                 url: "http://apache.org",
                 entries:[
    -                {records: 10000000, types: [
    +                {records: 100000, types: [
    --- End diff --
    
    This change reduces the record count by two orders of magnitude. Do we know if this results in testing the same functionality as the original test?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145518566
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int numIndices = selectionVector4.getCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      VectorWrapper vectorWrapper = vectorContainer.getValueVector(columnIndex);
    +
    +      for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
    +        int sv4Index = selectionVector4.get(indexIndex);
    +        int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
    +        int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
    +
    +        ValueVector valueVector = vectorWrapper.getValueVectors()[batchIndex];
    +        Object columnValue = valueVector.getAccessor().getObject(recordIndex);
    +        data.add(columnValue);
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static String toString(Map<String, List<Object>> table) {
    +    if (table.isEmpty()) {
    +      return "[ empty table ]";
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(table.keySet());
    +    Collections.sort(columnNames);
    +    int numRecords = table.get(columnNames.get(0)).size();
    +
    +    StringBuilder sb = new StringBuilder();
    +
    +    {
    +      sb.append("[ ");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(columnName);
    +      }
    +
    +      sb.append(" ]\n");
    +    }
    +
    +    for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
    +      sb.append("{");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(table.get(columnName).get(recordIndex));
    +      }
    +
    +      sb.append("}\n");
    +    }
    +
    +    return sb.toString();
    +  }
    +
    +  public static void assertEqual(Map<String, List<Object>> expected, Map<String, List<Object>> actual) {
    +    if (expected.isEmpty() && actual.isEmpty()) {
    +      // both tables are empty
    +      return;
    +    }
    +
    +    if (!expected.keySet().equals(actual.keySet())) {
    +      List<String> columnNamesExpected = Lists.newArrayList(expected.keySet());
    +      List<String> columnNamesActual = Lists.newArrayList(actual.keySet());
    +
    +      Collections.sort(columnNamesExpected);
    +      Collections.sort(columnNamesActual);
    +
    +      String message = String.format("The columns in the expected table %s don't match the columns in actual table %s",
    +        columnNamesExpected.toString(), columnNamesActual.toString());
    +      Assert.fail(message);
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(expected.keySet());
    +    Collections.sort(columnNames);
    +
    +    int numRows = expected.get(columnNames.get(0)).size();
    +
    +    checkTableRowCount(expected, numRows, "expected");
    +    checkTableRowCount(actual, numRows, "actual");
    +
    +    if (numRows == 0) {
    +      // The tables are empty
    +      return;
    +    }
    +
    +    // The tables are non-empty
    +
    +    // Validate each row of the two tables are equal
    +    for (int rowIndex = 0; rowIndex < numRows; rowIndex++) {
    +      for (String columnName: columnNames) {
    +        Object expectedObject = expected.get(columnName).get(rowIndex);
    +        Object actualObject = actual.get(columnName).get(rowIndex);
    +        compareValuesErrorOnMismatch(expectedObject, actualObject, rowIndex, columnName);
    +      }
    +    }
    +  }
    +
    +  public static boolean compareValuesErrorOnMismatch(Object expected, Object actual, int counter, String column) {
    +    if (compareValues(expected, actual)) {
    +      return true;
    +    }
    +
    +    if (expected == null) {
    +      String message = String.format("at row %s column '%s' mismatched values, expected: null " +
    +        "but received %s (%s)", counter, column, actual, actual.getClass().getSimpleName());
    +      Assert.fail(message);
    +    }
    +
    +    if (actual == null) {
    +      String message = String.format("unexpected null at row %s column '%s' should have been: %s", counter, column, expected);
    +      Assert.fail(message);
    +    }
    +
    +    if (actual instanceof byte[]) {
    +      try {
    +        String message = String.format("at row %s column '%s' mismatched values, expected: %s but received %s",
    +          counter, column, new String((byte[])expected, "UTF-8"), new String((byte[])actual, "UTF-8"));
    +        Assert.fail(message);
    +      } catch (UnsupportedEncodingException e) {
    +        throw new RuntimeException("This should never happen", e);
    +      }
    +    }
    +
    +    if (!expected.equals(actual)) {
    +      String message = String.format("at row %s column '%s' mismatched values, expected: %s (%s) but received %s (%s)",
    +        counter, column, expected, expected.getClass().getSimpleName(), actual, actual.getClass().getSimpleName());
    +      Assert.fail(message);
    +    }
    +    return true;
    +  }
    +
    +  public static boolean compareValues(Object expected, Object actual) {
    +    if (expected == null) {
    +      if (actual == null) {
    +        return true;
    +      } else {
    +        return false;
    +      }
    +    }
    +
    +    if (actual == null) {
    +      return false;
    +    }
    +
    +    if (actual instanceof byte[]) {
    +      if ( ! Arrays.equals((byte[]) expected, (byte[]) actual)) {
    +        return false;
    +      } else {
    +        return true;
    +      }
    +    }
    +
    +    if (!expected.equals(actual)) {
    +      return false;
    +    }
    +
    +    return true;
    +  }
    +
    +  private static void checkTableRowCount(Map<String, List<Object>> table, int numRows, String tableName) {
    +    for (Map.Entry<String, List<Object>> entry: table.entrySet()) {
    +      List<Object> data = entry.getValue();
    +
    +      if (numRows != data.size()) {
    +        String message = String.format("Table %s does not have %s rows. Instead it has %s rows",
    +          tableName, numRows, data.size());
    +        Assert.fail(message);
    +      }
    +    }
    +  }
    +
    +  public static class ObjectComparator implements Comparator<Object> {
    --- End diff --
    
    See similar code in the `RowSet` classes; though that code handles a larger set of types, handles maps, and handles arrays.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145790506
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -908,6 +908,15 @@ public void generateTestData(int count) {
         }
     
       <#else> <#-- type.width <= 8 -->
    +    @Override
    +    public void add(Object value) {
    +      int index = accessor.getValueCount();
    +      int valueCount = index + 1;
    +      setValueCount(valueCount);
    +
    +      set(index, (${type.javaType}) value);
    --- End diff --
    
    All the issues with this make sense. I should have deleted this when I switched to using the RowSet classes, but forgot too :). Removing this now.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144147623
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    OptionSet optionSet, FunctionLookupContext functionLookupContext, CodeCompiler codeCompiler,
    +    List<Ordering> orderings, VectorAccessible batch, boolean unionTypeEnabled, boolean codegenDump,
    +    int limit, BufferAllocator allocator, SelectionVectorMode mode)
    +          throws ClassTransformationException, IOException, SchemaChangeException {
    +    final MappingSet mainMapping = new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +    final MappingSet leftMapping = new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +    final MappingSet rightMapping = new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    --- End diff --
    
    Not sure these should be inside the method. Can we still grab hold of the variable for testing or debugging? Might we want to leave this in the class name space?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147229235
  
    --- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.Lists;
    +import org.junit.rules.TestWatcher;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.util.List;
    +
    +public class SubDirTestWatcher extends TestWatcher {
    --- End diff --
    
    Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145508960
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java ---
    @@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
         testNewColumns(false);
       }
     
    -
       @Test
       public void testNewColumnsLegacy() throws Exception {
         testNewColumns(true);
       }
     
       private void testNewColumns(boolean testLegacy) throws Exception {
         final int record_count = 10000;
    -    String dfs_temp = getDfsTestTmpSchemaLocation();
    -    System.out.println(dfs_temp);
    -    File table_dir = new File(dfs_temp, "newColumns");
    -    table_dir.mkdir();
    -    try (BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(new File(table_dir, "a.json")))) {
    -      String format = "{ a : %d, b : %d }%n";
    -      for (int i = 0; i <= record_count; i += 2) {
    -        os.write(String.format(format, i, i).getBytes());
    -      }
    +    final String tableDirName = "newColumns";
    +
    +    TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", "b"), Lists.newArrayList("%d", "%d"));
    --- End diff --
    
    Clever -- but how do we handle types? The original code created JSON of the form:
    
    ```
    { a : 10, b : 20 }
    ```
    
    Aside from the fact that the labels are not true JSON (not quoted), the mechanism does not work for strings, which need to be quoted. The mechanism here assumes numbers. But, of course, if we assume numbers, we don't need the second argument, the "%d".
    
    If you look in the `ClusterFixture` code, you'll see a method `ClusterFixture.stringify()` that converts an Object to a SQL-compatible string. We could create a similar one for JSON.
    
    But, if we take a step back, we are creating JSON. Perfectly fine JSON builder classes exist that can be used to build up type-aware JSON and render the result as a string. The one limitation is that these classes are for single documents; they can't handle the non-standard list-of-objects format that Drill users. Still, we can use the per-object classes to build up the list of objects.
    
    Yet another solution is to use the `RowSet` classes which are type aware. Yes, they build value vectors, but that is not important here. What is important is that they use the full schema power of Drill: including data type, repeated, nullable, etc. Then, we just create a RowSet-to-JSON converter. In fact, I may have something like that in the "Jig" project I did earlier; I'll rummage around and see if I can find it. 


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144945872
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    --- End diff --
    
    TopN has a priority queue. So does the sort. Are there others? Should we have a single, shared implementation that we can optimize and test once, but benefit from multiple times? Or, is the TopN version different enough from the sort version that we need two versions? If we need two, should improvements here (especially for code gen) be applied to the sort version?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144132366
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
    @@ -90,12 +89,11 @@
       private String generatedCode;
       private String generifiedCode;
     
    -  CodeGenerator(TemplateClassDefinition<T> definition, FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
    -    this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, optionManager);
    +  CodeGenerator(TemplateClassDefinition<T> definition, OptionSet optionManager) {
    +    this(ClassGenerator.getDefaultMapping(), definition, optionManager);
    --- End diff --
    
    Does the code actually work without this item? Isn't the registry where we store UDF functions? And, doesn't the expression materializer make use of this info to copy UDFs inline into the generated code?
    
    If we can remove this, that is a big convenience. But, I'm skeptical that it is, in fact, not needed.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145292655
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -247,7 +248,7 @@ drill.exec: {
         }
       },
       sort: {
    -    purge.threshold : 10,
    +    purge.threshold : 1000,
    --- End diff --
    
    Are you reverting your own change here?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner-mapr <gi...@git.apache.org>.
Github user ilooner-mapr commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145571158
  
    --- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
    @@ -32,23 +32,50 @@
     public class DirTestWatcher extends TestWatcher {
       private String dirPath;
       private File dir;
    +  private boolean deleteDirAtEnd = true;
    --- End diff --
    
    Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147262655
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
    @@ -59,6 +59,9 @@
     @Ignore
     public class ExampleTest {
     
    +  @Rule
    +  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
    --- End diff --
    
    I added java doc, and added example of how it's used in secondTest


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148393893
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    --- End diff --
    
    Removed BatchUtils


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144946834
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(
    +      mainMapping, leftMapping, rightMapping, context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static MappingSet createMainMappingSet() {
    +    return new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createLeftMappingSet() {
    +    return new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createRightMappingSet() {
    +    return new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping,
    --- End diff --
    
    Do we ever use more than one mapping set? If not, can we just refer to the constants inside this function?
    
    I see that a simpler form exists. Just curious why we also need the complex form. For better testing?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144195098
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, RecordBatch incoming)
         super(popConfig, context);
         this.incoming = incoming;
         this.config = popConfig;
    -    batchPurgeThreshold = context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
    +    DrillConfig drillConfig = context.getConfig();
    +    batchPurgeThreshold = drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
    +    codegenDump = drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
    --- End diff --
    
    I will rename the option to drill.debug.codegen.TopN


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/984


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145515947
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
    @@ -0,0 +1,184 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Charsets;
    +import org.apache.commons.io.FileUtils;
    +import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.exec.util.TestUtilities;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.io.IOException;
    +
    +public class BaseDirTestWatcher extends DirTestWatcher {
    +  public enum DirType {
    +    ROOT,
    +    TEST_TMP;
    +  }
    +
    +  private File tmpDir;
    +  private File storeDir;
    +  private File dfsTestTmpParentDir;
    +  private File dfsTestTmpDir;
    +  private File rootDir;
    +
    +  public BaseDirTestWatcher() {
    +    super();
    +  }
    +
    +  @Override
    +  protected void starting(Description description) {
    +    super.starting(description);
    +
    +    rootDir = makeSubDir("root");
    +    tmpDir = new File(rootDir, "tmp");
    +    storeDir = new File(rootDir, "store");
    +    dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
    +
    +    tmpDir.mkdirs();
    +    storeDir.mkdirs();
    +    dfsTestTmpParentDir.mkdirs();
    +  }
    +
    +  public File getTmpDir() {
    +    return tmpDir;
    +  }
    +
    +  public File getStoreDir() {
    +    return storeDir;
    +  }
    +
    +  public File getDfsTestTmpParentDir() {
    +    return dfsTestTmpParentDir;
    +  }
    +
    +  public File getDfsTestTmpDir() {
    +    return dfsTestTmpDir;
    +  }
    +
    +  public String getDfsTestTmpDirPath() {
    +    return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public File getRootDir() {
    +    return rootDir;
    +  }
    +
    +  public String getRootDirPath() {
    +    return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
    +  }
    +
    +  public void newDfsTestTmpDir() {
    +    dfsTestTmpDir = TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
    +  }
    +
    +  private File getDir(DirType type) {
    +    switch (type) {
    +      case ROOT:
    +        return rootDir;
    +      case TEST_TMP:
    +        return dfsTestTmpDir;
    +      default:
    +        throw new IllegalArgumentException(String.format("Unsupported type %s", type));
    +    }
    +  }
    +
    +  public File makeRootSubDir(String relPath) {
    --- End diff --
    
    Here is where we might have a bit of a discussion. Should we represent directory paths as:
    
    * Strings (as done here)
    * Java NIO `Path` objects (which Java seems to want to be the new standard)
    * Hadoop `Path` objects (since that is what the Drill file system uses)
    * Good old fashioned Java `File` objects (which, while old, works well in common cases)
    
    I would suggest we use `File` for references to file system paths:
    
    * The Java `Path` has the same name as the Hadoop `Path`, causing endless confusion.
    * Hadoop `Path` is overkill for simple uses in tests.
    * `String` encourages people to write their own code to join paths by concatenating.
    
    The use of `File` keeps things simple and standard. Concatenation is simple.
    
    Other solutions are, of course, possible. We could use `String` and replicate the methods on `File` or either of the `Path` classes, say. The point is, let's pick a standard and use it everywhere.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147009092
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.TopN;
    +
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +import java.util.Random;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.drill.test.TestBuilder;
    +import org.apache.drill.categories.OperatorTest;
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.common.expression.FieldReference;
    +import org.apache.drill.common.logical.data.Order;
    +import org.apache.drill.common.types.TypeProtos;
    +import org.apache.drill.common.types.Types;
    +import org.apache.drill.exec.compile.ClassBuilder;
    +import org.apache.drill.exec.compile.CodeCompiler;
    +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
    +import org.apache.drill.exec.memory.RootAllocator;
    +import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
    +import org.apache.drill.exec.pop.PopUnitTestBase;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.ExpandableHyperContainer;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.server.options.OptionSet;
    +import org.apache.drill.test.ClientFixture;
    +import org.apache.drill.test.ClusterFixture;
    +import org.apache.drill.test.FixtureBuilder;
    +import org.apache.drill.test.OperatorFixture;
    +import org.apache.drill.test.BatchUtils;
    +import org.apache.drill.test.DirTestWatcher;
    +import org.apache.drill.test.rowSet.RowSetBuilder;
    +import org.junit.Rule;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +@Category(OperatorTest.class)
    +public class TopNBatchTest extends PopUnitTestBase {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
    +
    +  // Allows you to look at generated code after tests execute
    +  @Rule
    +  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
    +
    +  /**
    +   * Priority queue unit test.
    +   * @throws Exception
    +   */
    +  @Test
    +  public void priorityQueueOrderingTest() throws Exception {
    +    Properties properties = new Properties();
    +    properties.setProperty(ClassBuilder.CODE_DIR_OPTION, dirTestWatcher.getDirPath());
    +
    +    DrillConfig drillConfig = DrillConfig.create(properties);
    +    OptionSet optionSet = new OperatorFixture.TestOptionSet();
    +
    +    FieldReference expr = FieldReference.getWithQuotedRef("colA");
    +    Order.Ordering ordering = new Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
    +    List<Order.Ordering> orderings = Lists.newArrayList(ordering);
    +
    +    MaterializedField colA = MaterializedField.create("colA", Types.required(TypeProtos.MinorType.INT));
    +    MaterializedField colB = MaterializedField.create("colB", Types.required(TypeProtos.MinorType.INT));
    +
    +    List<MaterializedField> cols = Lists.newArrayList(colA, colB);
    +    BatchSchema batchSchema = new BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
    +
    +    try (RootAllocator allocator = new RootAllocator(100_000_000)) {
    +      VectorContainer expectedVectors = new RowSetBuilder(allocator, batchSchema)
    +        .add(110, 10)
    +        .add(109, 9)
    +        .add(108, 8)
    +        .add(107, 7)
    +        .add(106, 6)
    +        .add(105, 5)
    +        .add(104, 4)
    +        .add(103, 3)
    +        .add(102, 2)
    +        .add(101, 1)
    +        .build()
    +        .container();
    +
    +      Map<String, List<Object>> expectedTable = BatchUtils.containerToObjects(expectedVectors);
    +      expectedVectors.clear();
    +
    +      PriorityQueue queue;
    +      ExpandableHyperContainer hyperContainer;
    +
    +      {
    +        VectorContainer container = new RowSetBuilder(allocator, batchSchema)
    +          .build()
    +          .container();
    +        hyperContainer = new ExpandableHyperContainer(container);
    +
    +        queue = TopNBatch.createNewPriorityQueue(
    --- End diff --
    
    Agreed. Sounds good.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148394268
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java ---
    @@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
         testNewColumns(false);
       }
     
    -
       @Test
       public void testNewColumnsLegacy() throws Exception {
         testNewColumns(true);
       }
     
       private void testNewColumns(boolean testLegacy) throws Exception {
         final int record_count = 10000;
    -    String dfs_temp = getDfsTestTmpSchemaLocation();
    -    System.out.println(dfs_temp);
    -    File table_dir = new File(dfs_temp, "newColumns");
    -    table_dir.mkdir();
    -    try (BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(new File(table_dir, "a.json")))) {
    -      String format = "{ a : %d, b : %d }%n";
    -      for (int i = 0; i <= record_count; i += 2) {
    -        os.write(String.format(format, i, i).getBytes());
    -      }
    +    final String tableDirName = "newColumns";
    +
    +    TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", "b"), Lists.newArrayList("%d", "%d"));
    --- End diff --
    
    I've made the JsonFileBuilder, which takes a RowSet and writes it out to a file as json.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner-mapr <gi...@git.apache.org>.
Github user ilooner-mapr commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145572262
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    --- End diff --
    
    I think the priority queue is generic and not specific to TopN. I agree we should make an effort to consolidate the code generated data structures.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145294835
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
    @@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
         test("alter session set `planner.slice_target` = 1");
         test("alter session set `store.partition.hash_distribute` = true");
         test("use dfs_test.tmp");
    -    test(String.format("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", TEST_RES_PATH));
    +    test("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`/multilevel/parquet`");
         String query = "select * from orders_distribution where o_orderpriority = '1-URGENT'";
    -    testExcludeFilter(query, 1, "Filter", 24);
    +    testExcludeFilter(query, 1, "Filter\\(", 24);
    --- End diff --
    
    Why the extra characters?


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @paul-rogers


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148157170
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
    @@ -0,0 +1,280 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.record.selection.SelectionVector4;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.junit.Assert;
    +
    +import java.io.UnsupportedEncodingException;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class BatchUtils {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
    +
    +  public static Map<String, List<Object>> containerToObjects(VectorContainer vectorContainer) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int recordCount = vectorContainer.getRecordCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      ValueVector.Accessor valueVectorAccessor = vectorContainer
    +        .getValueVector(columnIndex)
    +        .getValueVector()
    +        .getAccessor();
    +
    +      for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
    +        data.add(valueVectorAccessor.getObject(recordIndex));
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static Map<String, List<Object>> hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, SelectionVector4 selectionVector4) {
    +    Map<String, List<Object>> rows = Maps.newHashMap();
    +    int numCols = vectorContainer.getNumberOfColumns();
    +    int numIndices = selectionVector4.getCount();
    +
    +    for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
    +      String columnName = vectorContainer.getSchema().getColumn(columnIndex).getName();
    +      List<Object> data = Lists.newArrayList();
    +
    +      VectorWrapper vectorWrapper = vectorContainer.getValueVector(columnIndex);
    +
    +      for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
    +        int sv4Index = selectionVector4.get(indexIndex);
    +        int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
    +        int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
    +
    +        ValueVector valueVector = vectorWrapper.getValueVectors()[batchIndex];
    +        Object columnValue = valueVector.getAccessor().getObject(recordIndex);
    +        data.add(columnValue);
    +      }
    +
    +      rows.put(columnName, data);
    +    }
    +
    +    return rows;
    +  }
    +
    +  public static String toString(Map<String, List<Object>> table) {
    +    if (table.isEmpty()) {
    +      return "[ empty table ]";
    +    }
    +
    +    List<String> columnNames = Lists.newArrayList(table.keySet());
    +    Collections.sort(columnNames);
    +    int numRecords = table.get(columnNames.get(0)).size();
    +
    +    StringBuilder sb = new StringBuilder();
    +
    +    {
    +      sb.append("[ ");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(columnName);
    +      }
    +
    +      sb.append(" ]\n");
    +    }
    +
    +    for (int recordIndex = 0; recordIndex < numRecords; recordIndex++) {
    +      sb.append("{");
    +      String separator = "";
    +
    +      for (String columnName : columnNames) {
    +        sb.append(separator);
    +        separator = ", ";
    +        sb.append(table.get(columnName).get(recordIndex));
    +      }
    +
    +      sb.append("}\n");
    +    }
    +
    +    return sb.toString();
    +  }
    +
    +  public static void assertEqual(Map<String, List<Object>> expected, Map<String, List<Object>> actual) {
    --- End diff --
    
    Done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145575975
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
    @@ -21,29 +21,25 @@
     
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.common.util.TestTools;
    +import org.apache.drill.test.BaseTestQuery;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
     
     @Category(OperatorTest.class)
    -public class TestAggNullable extends BaseTestQuery{
    +public class TestAggNullable extends BaseTestQuery {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
     
    -  static final String WORKING_PATH = TestTools.getWorkingPath();
    -  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
    -
       private static void enableAggr(boolean ha, boolean sa) throws Exception {
     
    -    test(String.format("alter session set `planner.enable_hashagg` = %s", ha ? "true":"false"));
    -    test(String.format("alter session set `planner.enable_streamagg` = %s", sa ? "true":"false"));
    +    test("alter session set `planner.enable_hashagg` = %s", ha);
    +    test("alter session set `planner.enable_streamagg` = %s", sa);
    --- End diff --
    
    We can update these after the other PR goes in then.


---

Re: Excessive review comments

Posted by Parth Chandra <pa...@apache.org>.
I would prefer a solution that does not require subscribing to individual
PRs or JIRAs. How would one know what is interesting if one doesn't get
notified?

On Thu, Oct 19, 2017 at 10:13 AM, Paul Rogers <pr...@mapr.com> wrote:

> So, rather than copying all messages to dev, simply ask interested folks
> subscribe to the PRs of interest. (Or, subscribe to the corresponding JIRA
> tickets, since JIRA echos all comments in another spray of e-mail…)
>
> This way, semi-active folks on the dev list can see substantive
> discussions without being bombarded with day-to-day minutia.
>
> Thanks,
>
> - Paul
>
> > On Oct 19, 2017, at 10:07 AM, Parth Chandra <pa...@apache.org> wrote:
> >
> > I generally keep track of dev activity only through the emails sent. I
> > don't mind getting duplicate emails either - it's not too hard to write a
> > filter to take out the stuff you're not interested in.
> >
> > Interestingly, I've never noticed whether 'start a review' sends one
> email
> > or many; mostly because gmail does a nice job of grouping the emails
> > together. I have seen a single email with many comments being sent out so
> > the feature did work as advertised at one point.
> >
> > That said, if there is a way I can stay updated via email (repo watching,
> > jira updates, etc.) then I am fine with turning the feature off.
> >
> >
> >
> > On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com>
> wrote:
> >
> >> +1 for turning off the feature. If someone really needs to be emailed
> with
> >> comment updates they can become a watcher of the repo on Github.
> >>
> >> ________________________________
> >> From: Paul Rogers <pr...@mapr.com>
> >> Sent: Thursday, October 19, 2017 9:43:26 AM
> >> To: dev@drill.apache.org
> >> Subject: Re: Excessive review comments
> >>
> >> Can we simply turn off the feature? I never, ever read the e-mails
> coming
> >> from this source; I always follow the link back to the PR. Can we
> reduce it
> >> to “Hey, just wanted to let you know that a new comment was posted.
> Click
> >> _here_ to read it.”
> >>
> >> The only other solution is to give few review comments; not sure if we
> >> want to go that route...
> >>
> >> - Paul
> >>
> >>> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
> >> arina.yelchiyeva@gmail.com> wrote:
> >>>
> >>> Agree, I am not sure I saw this feature working as well.
> >>> All it did it was sending all the emails at once, rather in the process
> >> of
> >>> comments emergence.
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >>> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>
> >>>> I don’t know whether anything is broken. I believed that the GitHub
> >> “start
> >>>> a review” feature would cause all review comments to be sent in a
> single
> >>>> email. But now I think of it, I’m not sure I ever saw it working. I
> >> wonder
> >>>> whether Github-ASF integration is at fault.
> >>>>
> >>>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
> >>>> People tend to unsubscribe from lists if the volume is too high.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
> >>>>>
> >>>>> With all due respect, I did start a review. Is something broken?
> >>>>>
> >>>>> - Paul
> >>>>>
> >>>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
> >>>>>>
> >>>>>> Github user julianhyde commented on a diff in the pull request:
> >>>>>>
> >>>>>> https://github.com/apache/drill/pull/984#discussion_r145561518
> >>>>>>
> >>>>>> --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
> >> ClusterFixture.java
> >>>> ---
> >>>>>> @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
> >>>> drillbit, String pluginName,
> >>>>>>    public static final String EXPLAIN_PLAN_TEXT = "text";
> >>>>>>    public static final String EXPLAIN_PLAN_JSON = "json";
> >>>>>>
> >>>>>> -  public static FixtureBuilder builder() {
> >>>>>> -    FixtureBuilder builder = new FixtureBuilder()
> >>>>>> +  public static FixtureBuilder builder(DirTestWatcher
> >>>> dirTestWatcher) {
> >>>>>> --- End diff --
> >>>>>>
> >>>>>> Jeez Paul, please start a review rather than making single review
> >>>> comments. I just got 39 emails from you, and so did everyone else on
> >>>> dev@drill.
> >>>>>>
> >>>>>>
> >>>>>> ---
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: Excessive review comments

Posted by Arina Yelchiyeva <ar...@gmail.com>.
Ideally, there should be an option if you want to receive all notifications
from github or not. All these notifications don't really bother me. As
Parth mentioned this is the way to track dev activity.
Subscribing to the pull request maybe not always be the best solution,
since sometimes the comments in pull request is the point of interest and
gets you engaged into the discussion.

Regarding the code review comments, I think we should not point to the code
reviewers how they should write comments until they provide good code
review and raise good questions.
Especially taking into account limited number of volunteers to do the code
review...


Kind regards
Arina

On Thu, Oct 19, 2017 at 9:55 PM, Aman Sinha <am...@apache.org> wrote:

> Going back to one comment by Paul : “The only other solution is to give few
> review comments; not sure if we want to go that route...”
> IMO the individual code review comments should be concise such that the
> main idea is expressed.  This is more palatable for the original developer
> and he/she can act on it.  If there are several comments that are somewhat
> lengthier, it amounts to a design discussion.  These would be better
> combined and could either be (a) added to the JIRA or (b) added to the
> ‘conversation’ section of the PR to minimize notifications.
>
> -Aman
>
> On Thu, Oct 19, 2017 at 10:59 AM, Padma Penumarthy <pp...@mapr.com>
> wrote:
>
> > To subscribe to PRs and JIRAs of interest, we need to know about them
> > first.
> > Is it possible to get notified when a new PR or JIRA is created and not
> for
> > further updates unless you subscribe to them ?
> >
> > Thanks
> > Padma
> >
> >
> > > On Oct 19, 2017, at 10:13 AM, Paul Rogers <pr...@mapr.com> wrote:
> > >
> > > So, rather than copying all messages to dev, simply ask interested
> folks
> > subscribe to the PRs of interest. (Or, subscribe to the corresponding
> JIRA
> > tickets, since JIRA echos all comments in another spray of e-mail…)
> > >
> > > This way, semi-active folks on the dev list can see substantive
> > discussions without being bombarded with day-to-day minutia.
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > >> On Oct 19, 2017, at 10:07 AM, Parth Chandra <pa...@apache.org>
> wrote:
> > >>
> > >> I generally keep track of dev activity only through the emails sent. I
> > >> don't mind getting duplicate emails either - it's not too hard to
> write
> > a
> > >> filter to take out the stuff you're not interested in.
> > >>
> > >> Interestingly, I've never noticed whether 'start a review' sends one
> > email
> > >> or many; mostly because gmail does a nice job of grouping the emails
> > >> together. I have seen a single email with many comments being sent out
> > so
> > >> the feature did work as advertised at one point.
> > >>
> > >> That said, if there is a way I can stay updated via email (repo
> > watching,
> > >> jira updates, etc.) then I am fine with turning the feature off.
> > >>
> > >>
> > >>
> > >> On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com>
> > wrote:
> > >>
> > >>> +1 for turning off the feature. If someone really needs to be emailed
> > with
> > >>> comment updates they can become a watcher of the repo on Github.
> > >>>
> > >>> ________________________________
> > >>> From: Paul Rogers <pr...@mapr.com>
> > >>> Sent: Thursday, October 19, 2017 9:43:26 AM
> > >>> To: dev@drill.apache.org
> > >>> Subject: Re: Excessive review comments
> > >>>
> > >>> Can we simply turn off the feature? I never, ever read the e-mails
> > coming
> > >>> from this source; I always follow the link back to the PR. Can we
> > reduce it
> > >>> to “Hey, just wanted to let you know that a new comment was posted.
> > Click
> > >>> _here_ to read it.”
> > >>>
> > >>> The only other solution is to give few review comments; not sure if
> we
> > >>> want to go that route...
> > >>>
> > >>> - Paul
> > >>>
> > >>>> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
> > >>> arina.yelchiyeva@gmail.com> wrote:
> > >>>>
> > >>>> Agree, I am not sure I saw this feature working as well.
> > >>>> All it did it was sending all the emails at once, rather in the
> > process
> > >>> of
> > >>>> comments emergence.
> > >>>>
> > >>>> Kind regards
> > >>>> Arina
> > >>>>
> > >>>> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org>
> > wrote:
> > >>>>
> > >>>>> I don’t know whether anything is broken. I believed that the GitHub
> > >>> “start
> > >>>>> a review” feature would cause all review comments to be sent in a
> > single
> > >>>>> email. But now I think of it, I’m not sure I ever saw it working. I
> > >>> wonder
> > >>>>> whether Github-ASF integration is at fault.
> > >>>>>
> > >>>>> Whatever the reasons for it, 39 emails to dev list is quite a
> blast.
> > >>>>> People tend to unsubscribe from lists if the volume is too high.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>>
> > >>>>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com>
> wrote:
> > >>>>>>
> > >>>>>> With all due respect, I did start a review. Is something broken?
> > >>>>>>
> > >>>>>> - Paul
> > >>>>>>
> > >>>>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org>
> > wrote:
> > >>>>>>>
> > >>>>>>> Github user julianhyde commented on a diff in the pull request:
> > >>>>>>>
> > >>>>>>> https://github.com/apache/drill/pull/984#discussion_r145561518
> > >>>>>>>
> > >>>>>>> --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
> > >>> ClusterFixture.java
> > >>>>> ---
> > >>>>>>> @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
> > >>>>> drillbit, String pluginName,
> > >>>>>>>   public static final String EXPLAIN_PLAN_TEXT = "text";
> > >>>>>>>   public static final String EXPLAIN_PLAN_JSON = "json";
> > >>>>>>>
> > >>>>>>> -  public static FixtureBuilder builder() {
> > >>>>>>> -    FixtureBuilder builder = new FixtureBuilder()
> > >>>>>>> +  public static FixtureBuilder builder(DirTestWatcher
> > >>>>> dirTestWatcher) {
> > >>>>>>> --- End diff --
> > >>>>>>>
> > >>>>>>> Jeez Paul, please start a review rather than making single review
> > >>>>> comments. I just got 39 emails from you, and so did everyone else
> on
> > >>>>> dev@drill.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ---
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> >
> >
>

Re: Excessive review comments

Posted by Aman Sinha <am...@apache.org>.
Going back to one comment by Paul : “The only other solution is to give few
review comments; not sure if we want to go that route...”
IMO the individual code review comments should be concise such that the
main idea is expressed.  This is more palatable for the original developer
and he/she can act on it.  If there are several comments that are somewhat
lengthier, it amounts to a design discussion.  These would be better
combined and could either be (a) added to the JIRA or (b) added to the
‘conversation’ section of the PR to minimize notifications.

-Aman

On Thu, Oct 19, 2017 at 10:59 AM, Padma Penumarthy <pp...@mapr.com>
wrote:

> To subscribe to PRs and JIRAs of interest, we need to know about them
> first.
> Is it possible to get notified when a new PR or JIRA is created and not for
> further updates unless you subscribe to them ?
>
> Thanks
> Padma
>
>
> > On Oct 19, 2017, at 10:13 AM, Paul Rogers <pr...@mapr.com> wrote:
> >
> > So, rather than copying all messages to dev, simply ask interested folks
> subscribe to the PRs of interest. (Or, subscribe to the corresponding JIRA
> tickets, since JIRA echos all comments in another spray of e-mail…)
> >
> > This way, semi-active folks on the dev list can see substantive
> discussions without being bombarded with day-to-day minutia.
> >
> > Thanks,
> >
> > - Paul
> >
> >> On Oct 19, 2017, at 10:07 AM, Parth Chandra <pa...@apache.org> wrote:
> >>
> >> I generally keep track of dev activity only through the emails sent. I
> >> don't mind getting duplicate emails either - it's not too hard to write
> a
> >> filter to take out the stuff you're not interested in.
> >>
> >> Interestingly, I've never noticed whether 'start a review' sends one
> email
> >> or many; mostly because gmail does a nice job of grouping the emails
> >> together. I have seen a single email with many comments being sent out
> so
> >> the feature did work as advertised at one point.
> >>
> >> That said, if there is a way I can stay updated via email (repo
> watching,
> >> jira updates, etc.) then I am fine with turning the feature off.
> >>
> >>
> >>
> >> On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com>
> wrote:
> >>
> >>> +1 for turning off the feature. If someone really needs to be emailed
> with
> >>> comment updates they can become a watcher of the repo on Github.
> >>>
> >>> ________________________________
> >>> From: Paul Rogers <pr...@mapr.com>
> >>> Sent: Thursday, October 19, 2017 9:43:26 AM
> >>> To: dev@drill.apache.org
> >>> Subject: Re: Excessive review comments
> >>>
> >>> Can we simply turn off the feature? I never, ever read the e-mails
> coming
> >>> from this source; I always follow the link back to the PR. Can we
> reduce it
> >>> to “Hey, just wanted to let you know that a new comment was posted.
> Click
> >>> _here_ to read it.”
> >>>
> >>> The only other solution is to give few review comments; not sure if we
> >>> want to go that route...
> >>>
> >>> - Paul
> >>>
> >>>> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
> >>> arina.yelchiyeva@gmail.com> wrote:
> >>>>
> >>>> Agree, I am not sure I saw this feature working as well.
> >>>> All it did it was sending all the emails at once, rather in the
> process
> >>> of
> >>>> comments emergence.
> >>>>
> >>>> Kind regards
> >>>> Arina
> >>>>
> >>>> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>>
> >>>>> I don’t know whether anything is broken. I believed that the GitHub
> >>> “start
> >>>>> a review” feature would cause all review comments to be sent in a
> single
> >>>>> email. But now I think of it, I’m not sure I ever saw it working. I
> >>> wonder
> >>>>> whether Github-ASF integration is at fault.
> >>>>>
> >>>>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
> >>>>> People tend to unsubscribe from lists if the volume is too high.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
> >>>>>>
> >>>>>> With all due respect, I did start a review. Is something broken?
> >>>>>>
> >>>>>> - Paul
> >>>>>>
> >>>>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org>
> wrote:
> >>>>>>>
> >>>>>>> Github user julianhyde commented on a diff in the pull request:
> >>>>>>>
> >>>>>>> https://github.com/apache/drill/pull/984#discussion_r145561518
> >>>>>>>
> >>>>>>> --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
> >>> ClusterFixture.java
> >>>>> ---
> >>>>>>> @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
> >>>>> drillbit, String pluginName,
> >>>>>>>   public static final String EXPLAIN_PLAN_TEXT = "text";
> >>>>>>>   public static final String EXPLAIN_PLAN_JSON = "json";
> >>>>>>>
> >>>>>>> -  public static FixtureBuilder builder() {
> >>>>>>> -    FixtureBuilder builder = new FixtureBuilder()
> >>>>>>> +  public static FixtureBuilder builder(DirTestWatcher
> >>>>> dirTestWatcher) {
> >>>>>>> --- End diff --
> >>>>>>>
> >>>>>>> Jeez Paul, please start a review rather than making single review
> >>>>> comments. I just got 39 emails from you, and so did everyone else on
> >>>>> dev@drill.
> >>>>>>>
> >>>>>>>
> >>>>>>> ---
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
>

Re: Excessive review comments

Posted by Padma Penumarthy <pp...@mapr.com>.
To subscribe to PRs and JIRAs of interest, we need to know about them first.
Is it possible to get notified when a new PR or JIRA is created and not for
further updates unless you subscribe to them ?

Thanks
Padma


> On Oct 19, 2017, at 10:13 AM, Paul Rogers <pr...@mapr.com> wrote:
> 
> So, rather than copying all messages to dev, simply ask interested folks subscribe to the PRs of interest. (Or, subscribe to the corresponding JIRA tickets, since JIRA echos all comments in another spray of e-mail…)
> 
> This way, semi-active folks on the dev list can see substantive discussions without being bombarded with day-to-day minutia.
> 
> Thanks,
> 
> - Paul
> 
>> On Oct 19, 2017, at 10:07 AM, Parth Chandra <pa...@apache.org> wrote:
>> 
>> I generally keep track of dev activity only through the emails sent. I
>> don't mind getting duplicate emails either - it's not too hard to write a
>> filter to take out the stuff you're not interested in.
>> 
>> Interestingly, I've never noticed whether 'start a review' sends one email
>> or many; mostly because gmail does a nice job of grouping the emails
>> together. I have seen a single email with many comments being sent out so
>> the feature did work as advertised at one point.
>> 
>> That said, if there is a way I can stay updated via email (repo watching,
>> jira updates, etc.) then I am fine with turning the feature off.
>> 
>> 
>> 
>> On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com> wrote:
>> 
>>> +1 for turning off the feature. If someone really needs to be emailed with
>>> comment updates they can become a watcher of the repo on Github.
>>> 
>>> ________________________________
>>> From: Paul Rogers <pr...@mapr.com>
>>> Sent: Thursday, October 19, 2017 9:43:26 AM
>>> To: dev@drill.apache.org
>>> Subject: Re: Excessive review comments
>>> 
>>> Can we simply turn off the feature? I never, ever read the e-mails coming
>>> from this source; I always follow the link back to the PR. Can we reduce it
>>> to “Hey, just wanted to let you know that a new comment was posted. Click
>>> _here_ to read it.”
>>> 
>>> The only other solution is to give few review comments; not sure if we
>>> want to go that route...
>>> 
>>> - Paul
>>> 
>>>> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
>>> arina.yelchiyeva@gmail.com> wrote:
>>>> 
>>>> Agree, I am not sure I saw this feature working as well.
>>>> All it did it was sending all the emails at once, rather in the process
>>> of
>>>> comments emergence.
>>>> 
>>>> Kind regards
>>>> Arina
>>>> 
>>>> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I don’t know whether anything is broken. I believed that the GitHub
>>> “start
>>>>> a review” feature would cause all review comments to be sent in a single
>>>>> email. But now I think of it, I’m not sure I ever saw it working. I
>>> wonder
>>>>> whether Github-ASF integration is at fault.
>>>>> 
>>>>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
>>>>> People tend to unsubscribe from lists if the volume is too high.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
>>>>>> 
>>>>>> With all due respect, I did start a review. Is something broken?
>>>>>> 
>>>>>> - Paul
>>>>>> 
>>>>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
>>>>>>> 
>>>>>>> Github user julianhyde commented on a diff in the pull request:
>>>>>>> 
>>>>>>> https://github.com/apache/drill/pull/984#discussion_r145561518
>>>>>>> 
>>>>>>> --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
>>> ClusterFixture.java
>>>>> ---
>>>>>>> @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
>>>>> drillbit, String pluginName,
>>>>>>>   public static final String EXPLAIN_PLAN_TEXT = "text";
>>>>>>>   public static final String EXPLAIN_PLAN_JSON = "json";
>>>>>>> 
>>>>>>> -  public static FixtureBuilder builder() {
>>>>>>> -    FixtureBuilder builder = new FixtureBuilder()
>>>>>>> +  public static FixtureBuilder builder(DirTestWatcher
>>>>> dirTestWatcher) {
>>>>>>> --- End diff --
>>>>>>> 
>>>>>>> Jeez Paul, please start a review rather than making single review
>>>>> comments. I just got 39 emails from you, and so did everyone else on
>>>>> dev@drill.
>>>>>>> 
>>>>>>> 
>>>>>>> ---
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 


Re: Excessive review comments

Posted by Paul Rogers <pr...@mapr.com>.
So, rather than copying all messages to dev, simply ask interested folks subscribe to the PRs of interest. (Or, subscribe to the corresponding JIRA tickets, since JIRA echos all comments in another spray of e-mail…)

This way, semi-active folks on the dev list can see substantive discussions without being bombarded with day-to-day minutia.

Thanks,

- Paul

> On Oct 19, 2017, at 10:07 AM, Parth Chandra <pa...@apache.org> wrote:
> 
> I generally keep track of dev activity only through the emails sent. I
> don't mind getting duplicate emails either - it's not too hard to write a
> filter to take out the stuff you're not interested in.
> 
> Interestingly, I've never noticed whether 'start a review' sends one email
> or many; mostly because gmail does a nice job of grouping the emails
> together. I have seen a single email with many comments being sent out so
> the feature did work as advertised at one point.
> 
> That said, if there is a way I can stay updated via email (repo watching,
> jira updates, etc.) then I am fine with turning the feature off.
> 
> 
> 
> On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com> wrote:
> 
>> +1 for turning off the feature. If someone really needs to be emailed with
>> comment updates they can become a watcher of the repo on Github.
>> 
>> ________________________________
>> From: Paul Rogers <pr...@mapr.com>
>> Sent: Thursday, October 19, 2017 9:43:26 AM
>> To: dev@drill.apache.org
>> Subject: Re: Excessive review comments
>> 
>> Can we simply turn off the feature? I never, ever read the e-mails coming
>> from this source; I always follow the link back to the PR. Can we reduce it
>> to “Hey, just wanted to let you know that a new comment was posted. Click
>> _here_ to read it.”
>> 
>> The only other solution is to give few review comments; not sure if we
>> want to go that route...
>> 
>> - Paul
>> 
>>> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
>> arina.yelchiyeva@gmail.com> wrote:
>>> 
>>> Agree, I am not sure I saw this feature working as well.
>>> All it did it was sending all the emails at once, rather in the process
>> of
>>> comments emergence.
>>> 
>>> Kind regards
>>> Arina
>>> 
>>> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> I don’t know whether anything is broken. I believed that the GitHub
>> “start
>>>> a review” feature would cause all review comments to be sent in a single
>>>> email. But now I think of it, I’m not sure I ever saw it working. I
>> wonder
>>>> whether Github-ASF integration is at fault.
>>>> 
>>>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
>>>> People tend to unsubscribe from lists if the volume is too high.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
>>>>> 
>>>>> With all due respect, I did start a review. Is something broken?
>>>>> 
>>>>> - Paul
>>>>> 
>>>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
>>>>>> 
>>>>>> Github user julianhyde commented on a diff in the pull request:
>>>>>> 
>>>>>> https://github.com/apache/drill/pull/984#discussion_r145561518
>>>>>> 
>>>>>> --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
>> ClusterFixture.java
>>>> ---
>>>>>> @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
>>>> drillbit, String pluginName,
>>>>>>    public static final String EXPLAIN_PLAN_TEXT = "text";
>>>>>>    public static final String EXPLAIN_PLAN_JSON = "json";
>>>>>> 
>>>>>> -  public static FixtureBuilder builder() {
>>>>>> -    FixtureBuilder builder = new FixtureBuilder()
>>>>>> +  public static FixtureBuilder builder(DirTestWatcher
>>>> dirTestWatcher) {
>>>>>> --- End diff --
>>>>>> 
>>>>>> Jeez Paul, please start a review rather than making single review
>>>> comments. I just got 39 emails from you, and so did everyone else on
>>>> dev@drill.
>>>>>> 
>>>>>> 
>>>>>> ---
>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: Excessive review comments

Posted by Parth Chandra <pa...@apache.org>.
I generally keep track of dev activity only through the emails sent. I
don't mind getting duplicate emails either - it's not too hard to write a
filter to take out the stuff you're not interested in.

Interestingly, I've never noticed whether 'start a review' sends one email
or many; mostly because gmail does a nice job of grouping the emails
together. I have seen a single email with many comments being sent out so
the feature did work as advertised at one point.

That said, if there is a way I can stay updated via email (repo watching,
jira updates, etc.) then I am fine with turning the feature off.



On Thu, Oct 19, 2017 at 9:51 AM, Timothy Farkas <tf...@mapr.com> wrote:

> +1 for turning off the feature. If someone really needs to be emailed with
> comment updates they can become a watcher of the repo on Github.
>
> ________________________________
> From: Paul Rogers <pr...@mapr.com>
> Sent: Thursday, October 19, 2017 9:43:26 AM
> To: dev@drill.apache.org
> Subject: Re: Excessive review comments
>
> Can we simply turn off the feature? I never, ever read the e-mails coming
> from this source; I always follow the link back to the PR. Can we reduce it
> to “Hey, just wanted to let you know that a new comment was posted. Click
> _here_ to read it.”
>
> The only other solution is to give few review comments; not sure if we
> want to go that route...
>
> - Paul
>
> > On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <
> arina.yelchiyeva@gmail.com> wrote:
> >
> > Agree, I am not sure I saw this feature working as well.
> > All it did it was sending all the emails at once, rather in the process
> of
> > comments emergence.
> >
> > Kind regards
> > Arina
> >
> > On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> I don’t know whether anything is broken. I believed that the GitHub
> “start
> >> a review” feature would cause all review comments to be sent in a single
> >> email. But now I think of it, I’m not sure I ever saw it working. I
> wonder
> >> whether Github-ASF integration is at fault.
> >>
> >> Whatever the reasons for it, 39 emails to dev list is quite a blast.
> >> People tend to unsubscribe from lists if the volume is too high.
> >>
> >> Julian
> >>
> >>
> >>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
> >>>
> >>> With all due respect, I did start a review. Is something broken?
> >>>
> >>> - Paul
> >>>
> >>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
> >>>>
> >>>> Github user julianhyde commented on a diff in the pull request:
> >>>>
> >>>>  https://github.com/apache/drill/pull/984#discussion_r145561518
> >>>>
> >>>>  --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/
> ClusterFixture.java
> >> ---
> >>>>  @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
> >> drillbit, String pluginName,
> >>>>     public static final String EXPLAIN_PLAN_TEXT = "text";
> >>>>     public static final String EXPLAIN_PLAN_JSON = "json";
> >>>>
> >>>>  -  public static FixtureBuilder builder() {
> >>>>  -    FixtureBuilder builder = new FixtureBuilder()
> >>>>  +  public static FixtureBuilder builder(DirTestWatcher
> >> dirTestWatcher) {
> >>>>  --- End diff --
> >>>>
> >>>>  Jeez Paul, please start a review rather than making single review
> >> comments. I just got 39 emails from you, and so did everyone else on
> >> dev@drill.
> >>>>
> >>>>
> >>>> ---
> >>>
> >>
> >>
>
>

Re: Excessive review comments

Posted by Timothy Farkas <tf...@mapr.com>.
+1 for turning off the feature. If someone really needs to be emailed with comment updates they can become a watcher of the repo on Github.

________________________________
From: Paul Rogers <pr...@mapr.com>
Sent: Thursday, October 19, 2017 9:43:26 AM
To: dev@drill.apache.org
Subject: Re: Excessive review comments

Can we simply turn off the feature? I never, ever read the e-mails coming from this source; I always follow the link back to the PR. Can we reduce it to “Hey, just wanted to let you know that a new comment was posted. Click _here_ to read it.”

The only other solution is to give few review comments; not sure if we want to go that route...

- Paul

> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <ar...@gmail.com> wrote:
>
> Agree, I am not sure I saw this feature working as well.
> All it did it was sending all the emails at once, rather in the process of
> comments emergence.
>
> Kind regards
> Arina
>
> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> I don’t know whether anything is broken. I believed that the GitHub “start
>> a review” feature would cause all review comments to be sent in a single
>> email. But now I think of it, I’m not sure I ever saw it working. I wonder
>> whether Github-ASF integration is at fault.
>>
>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
>> People tend to unsubscribe from lists if the volume is too high.
>>
>> Julian
>>
>>
>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
>>>
>>> With all due respect, I did start a review. Is something broken?
>>>
>>> - Paul
>>>
>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
>>>>
>>>> Github user julianhyde commented on a diff in the pull request:
>>>>
>>>>  https://github.com/apache/drill/pull/984#discussion_r145561518
>>>>
>>>>  --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java
>> ---
>>>>  @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
>> drillbit, String pluginName,
>>>>     public static final String EXPLAIN_PLAN_TEXT = "text";
>>>>     public static final String EXPLAIN_PLAN_JSON = "json";
>>>>
>>>>  -  public static FixtureBuilder builder() {
>>>>  -    FixtureBuilder builder = new FixtureBuilder()
>>>>  +  public static FixtureBuilder builder(DirTestWatcher
>> dirTestWatcher) {
>>>>  --- End diff --
>>>>
>>>>  Jeez Paul, please start a review rather than making single review
>> comments. I just got 39 emails from you, and so did everyone else on
>> dev@drill.
>>>>
>>>>
>>>> ---
>>>
>>
>>


Re: Excessive review comments

Posted by Paul Rogers <pr...@mapr.com>.
Can we simply turn off the feature? I never, ever read the e-mails coming from this source; I always follow the link back to the PR. Can we reduce it to “Hey, just wanted to let you know that a new comment was posted. Click _here_ to read it.”

The only other solution is to give few review comments; not sure if we want to go that route...

- Paul

> On Oct 19, 2017, at 8:35 AM, Arina Yelchiyeva <ar...@gmail.com> wrote:
> 
> Agree, I am not sure I saw this feature working as well.
> All it did it was sending all the emails at once, rather in the process of
> comments emergence.
> 
> Kind regards
> Arina
> 
> On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> I don’t know whether anything is broken. I believed that the GitHub “start
>> a review” feature would cause all review comments to be sent in a single
>> email. But now I think of it, I’m not sure I ever saw it working. I wonder
>> whether Github-ASF integration is at fault.
>> 
>> Whatever the reasons for it, 39 emails to dev list is quite a blast.
>> People tend to unsubscribe from lists if the volume is too high.
>> 
>> Julian
>> 
>> 
>>> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
>>> 
>>> With all due respect, I did start a review. Is something broken?
>>> 
>>> - Paul
>>> 
>>>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
>>>> 
>>>> Github user julianhyde commented on a diff in the pull request:
>>>> 
>>>>  https://github.com/apache/drill/pull/984#discussion_r145561518
>>>> 
>>>>  --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java
>> ---
>>>>  @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
>> drillbit, String pluginName,
>>>>     public static final String EXPLAIN_PLAN_TEXT = "text";
>>>>     public static final String EXPLAIN_PLAN_JSON = "json";
>>>> 
>>>>  -  public static FixtureBuilder builder() {
>>>>  -    FixtureBuilder builder = new FixtureBuilder()
>>>>  +  public static FixtureBuilder builder(DirTestWatcher
>> dirTestWatcher) {
>>>>  --- End diff --
>>>> 
>>>>  Jeez Paul, please start a review rather than making single review
>> comments. I just got 39 emails from you, and so did everyone else on
>> dev@drill.
>>>> 
>>>> 
>>>> ---
>>> 
>> 
>> 


Re: Excessive review comments

Posted by Arina Yelchiyeva <ar...@gmail.com>.
Agree, I am not sure I saw this feature working as well.
All it did it was sending all the emails at once, rather in the process of
comments emergence.

Kind regards
Arina

On Thu, Oct 19, 2017 at 6:27 PM, Julian Hyde <jh...@apache.org> wrote:

> I don’t know whether anything is broken. I believed that the GitHub “start
> a review” feature would cause all review comments to be sent in a single
> email. But now I think of it, I’m not sure I ever saw it working. I wonder
> whether Github-ASF integration is at fault.
>
> Whatever the reasons for it, 39 emails to dev list is quite a blast.
> People tend to unsubscribe from lists if the volume is too high.
>
> Julian
>
>
> > On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
> >
> > With all due respect, I did start a review. Is something broken?
> >
> > - Paul
> >
> >> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
> >>
> >> Github user julianhyde commented on a diff in the pull request:
> >>
> >>   https://github.com/apache/drill/pull/984#discussion_r145561518
> >>
> >>   --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java
> ---
> >>   @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit
> drillbit, String pluginName,
> >>      public static final String EXPLAIN_PLAN_TEXT = "text";
> >>      public static final String EXPLAIN_PLAN_JSON = "json";
> >>
> >>   -  public static FixtureBuilder builder() {
> >>   -    FixtureBuilder builder = new FixtureBuilder()
> >>   +  public static FixtureBuilder builder(DirTestWatcher
> dirTestWatcher) {
> >>   --- End diff --
> >>
> >>   Jeez Paul, please start a review rather than making single review
> comments. I just got 39 emails from you, and so did everyone else on
> dev@drill.
> >>
> >>
> >> ---
> >
>
>

Re: Excessive review comments

Posted by Julian Hyde <jh...@apache.org>.
I don’t know whether anything is broken. I believed that the GitHub “start a review” feature would cause all review comments to be sent in a single email. But now I think of it, I’m not sure I ever saw it working. I wonder whether Github-ASF integration is at fault.

Whatever the reasons for it, 39 emails to dev list is quite a blast. People tend to unsubscribe from lists if the volume is too high.

Julian


> On Oct 18, 2017, at 5:54 PM, Paul Rogers <pr...@mapr.com> wrote:
> 
> With all due respect, I did start a review. Is something broken?
> 
> - Paul
> 
>> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
>> 
>> Github user julianhyde commented on a diff in the pull request:
>> 
>>   https://github.com/apache/drill/pull/984#discussion_r145561518
>> 
>>   --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
>>   @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, String pluginName,
>>      public static final String EXPLAIN_PLAN_TEXT = "text";
>>      public static final String EXPLAIN_PLAN_JSON = "json";
>> 
>>   -  public static FixtureBuilder builder() {
>>   -    FixtureBuilder builder = new FixtureBuilder()
>>   +  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
>>   --- End diff --
>> 
>>   Jeez Paul, please start a review rather than making single review comments. I just got 39 emails from you, and so did everyone else on dev@drill.
>> 
>> 
>> ---
> 


Re: Excessive review comments

Posted by Paul Rogers <pr...@mapr.com>.
With all due respect, I did start a review. Is something broken?

- Paul

> On Oct 18, 2017, at 3:36 PM, julianhyde <gi...@git.apache.org> wrote:
> 
> Github user julianhyde commented on a diff in the pull request:
> 
>    https://github.com/apache/drill/pull/984#discussion_r145561518
> 
>    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
>    @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, String pluginName,
>       public static final String EXPLAIN_PLAN_TEXT = "text";
>       public static final String EXPLAIN_PLAN_JSON = "json";
> 
>    -  public static FixtureBuilder builder() {
>    -    FixtureBuilder builder = new FixtureBuilder()
>    +  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
>    --- End diff --
> 
>    Jeez Paul, please start a review rather than making single review comments. I just got 39 emails from you, and so did everyone else on dev@drill.
> 
> 
> ---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by julianhyde <gi...@git.apache.org>.
Github user julianhyde commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145561518
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, String pluginName,
       public static final String EXPLAIN_PLAN_TEXT = "text";
       public static final String EXPLAIN_PLAN_JSON = "json";
     
    -  public static FixtureBuilder builder() {
    -    FixtureBuilder builder = new FixtureBuilder()
    +  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
    --- End diff --
    
    Jeez Paul, please start a review rather than making single review comments. I just got 39 emails from you, and so did everyone else on dev@drill.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145575781
  
    --- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
    @@ -247,7 +248,7 @@ drill.exec: {
         }
       },
       sort: {
    -    purge.threshold : 10,
    +    purge.threshold : 1000,
    --- End diff --
    
    Oh my. Thanks for catching that!


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144142548
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    +   * @throws SchemaChangeException
    +   */
    +  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    +
    +  /**
    +   * This method must be called before fetching the final heap hyper batch and final Sv4 vector.
    --- End diff --
    
    "heap hyper batch" seems an oxymoron. All batches, including hyper-batches, exist in direct memory. What is meant by "heap"?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144131909
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final OptionSet optionManager)
       public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + ".disable_cache";
     
       /**
    +   * Enables saving generated code for debugging
    +   */
    +  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + ".codegen.dump";
    --- End diff --
    
    Also, how does a static string enable code generation? Isn't this just the destination folder once saving code is enabled?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145292505
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
    @@ -116,6 +116,14 @@ public void clear() {
         }
       }
     
    +  public static int getBatchIndex(int sv4Index) {
    +    return (sv4Index >> 16) & 0xFFFF;
    +  }
    +
    +  public static int getRecordIndex(int sv4Index) {
    +    return (sv4Index) & 0xFFFF;
    +  }
    --- End diff --
    
    I see you gave into the temptation to wrap these magic expressions in methods rather than repeating them over and over in generated code. The worry probably was that the overhead of a method call per value would be expensive. Did you do a bit of performance testing to demonstrate that Java inlines these methods so we get the same performance with this code as the original?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150096261
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
    @@ -85,8 +85,7 @@
        * new row set with the updated columns, then merge the new
        * and old row sets to create a new immutable row set.
        */
    -
    -  public interface RowSetWriter extends TupleWriter {
    +  interface RowSetWriter extends TupleWriter {
    --- End diff --
    
    IntelliJ gave a warning that the modifier is redundant. Also an interface nested inside another interface is public by default.
    
    https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145519174
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
         }
       }
     
    -  private void createConfig(FixtureBuilder builder) throws Exception {
    +  private void createConfig() throws Exception {
    --- End diff --
    
    There are actually three ZK cases:
    
    * Does not use ZK at all.
    * Uses ZK, but it is local (mini ZK started within test JDK)
    * Uses ZK, but ZK is remote. (Seldom used in tests, needed for an embedded Drillbit.)
    
    Are these semantics preserved with the present changes?


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @paul-rogers This is ready for another round of review. I have also responded to / addressed your comments.



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147256621
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java ---
    @@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() throws Throwable {
       }
     
       private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws Throwable {
    -    FixtureBuilder builder = ClusterFixture.builder()
    +    FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
    --- End diff --
    
    Is there a case where the tmp directories are optional? The Drillbit will always have to be configured with a <b>drill.tmp-dir</b> at minimum, so I think it should be required to use a BaseDirTestWatcher with a ClusterFixture. I could move passing the dirTestWatcher to the build method instead of the constructor. Let me know what you think.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144943882
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    --- End diff --
    
    Actutally, the name "sv2" refers to a 2 -byte Selection vector. Welcome to Drill's cryptic naming...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145520129
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) throws Exception {
     
           serviceSet = null;
           usesZk = true;
    -      isLocal = false;
         } else {
           // Embedded Drillbit.
     
           serviceSet = RemoteServiceSet.getLocalServiceSet();
    -      isLocal = true;
         }
       }
     
    -  private void startDrillbits(FixtureBuilder builder) throws Exception {
    -//    // Ensure that Drill uses the log directory determined here rather than
    -//    // it's hard-coded defaults. WIP: seems to be needed some times but
    -//    // not others.
    -//
    -//    String logDir = null;
    -//    if (builder.tempDir != null) {
    -//      logDir = builder.tempDir.getAbsolutePath();
    -//    }
    -//    if (logDir == null) {
    -//      logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
    -//      if (logDir != null) {
    -//        logDir += "/drill/log";
    -//      }
    -//    }
    -//    if (logDir == null) {
    -//      logDir = "/tmp/drill";
    -//    }
    -//    new File(logDir).mkdirs();
    -//    System.setProperty("drill.log-dir", logDir);
    -
    -    dfsTestTempDir = makeTempDir("dfs-test");
    -
    -    // Clean up any files that may have been left from the
    -    // last run.
    -
    -    preserveLocalFiles = builder.preserveLocalFiles;
    -    removeLocalFiles();
    +  private void startDrillbits() throws Exception {
    +    dfsTestTempDir = new File(getRootDir(), "dfs-test");
    +    dfsTestTempDir.mkdirs();
    --- End diff --
    
    Is the "preserve local files" behavior maintained? Let me explain.
    
    In automated tests, files are created, used, and discarded.
    
    But, we also use this framework for ad-hoc tests when debugging. In this case, we create files, end the test, and examine the files by hand. Examples: logs, output files, query profiles, etc.
    
    So, for this test fixture (but not for `BaseTestQuery`), we want two modes:
    
    * Normal mode: as you have implemented.
    * "Preserve local files" mode: files are written to `/tmp/drill` (or some other fixed, well-known location, can be within the `target` folder) and preserved past test exit.
    
    A method on the `FixtureBuilder` identifies which mode is desired. We turn on the "preserve" mode for ad-hoc tests.
    
    For more info on this framework, see [this description](https://github.com/paul-rogers/drill/wiki/Cluster-Fixture-Framework).


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145521268
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, String pluginName,
       public static final String EXPLAIN_PLAN_TEXT = "text";
       public static final String EXPLAIN_PLAN_JSON = "json";
     
    -  public static FixtureBuilder builder() {
    -    FixtureBuilder builder = new FixtureBuilder()
    +  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
    --- End diff --
    
    See note above about passing in the test watcher via a method instead of the constructor.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145505747
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java ---
    @@ -76,43 +78,50 @@
       //        - detecting corrupt values must be deferred to actual data page reading
       //    - one from 1.4, where there is a proper created-by, but the corruption is present
       private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
    +      "/parquet/4203_corrupt_dates/mixed_drill_versions";
       // partitioned with 1.2.0, no certain metadata that these were written with Drill
       // the value will be checked to see that they look corrupt and they will be corrected
       // by default. Users can use the format plugin option autoCorrectCorruptDates to disable
       // this behavior if they have foreign parquet files with valid rare date values that are
       // in the similar range as Drill's corrupt values
    +  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
       private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_2_FOLDER).toString();
       // partitioned with 1.4.0, no certain metadata regarding the date corruption status.
       // The same detection approach of the corrupt date values as for the files partitioned with 1.2.0
    +  private static final String PARTITIONED_1_4_FOLDER = "partitioned_with_corruption_4203";
       private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_4_FOLDER).toString();
       private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
    +      "/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
    +  private static final String PARTITIONED_1_9_FOLDER = "1_9_0_partitioned_no_corruption";
       private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
    +      Paths.get("/parquet/4203_corrupt_dates", PARTITIONED_1_9_FOLDER).toString();
       private static final String VARCHAR_PARTITIONED =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
    +      "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
       private static final String DATE_PARTITIONED =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
    +      "/parquet/4203_corrupt_dates/fewtypes_datepartition";
       private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
    +      "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
       private static final String CORRECT_DATES_1_6_0_PATH =
    -      "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
    -  private static final String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2";
    +      "/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
       private static final String MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
     
    -
       @BeforeClass
       public static void initFs() throws Exception {
    +    dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
    --- End diff --
    
    Each time I see this I have to remember that this is an absolute path relative to some unstated root. That is, although it looks absolute, it is actually relative...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150097140
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
    @@ -85,8 +85,7 @@
        * new row set with the updated columns, then merge the new
        * and old row sets to create a new immutable row set.
        */
    -
    -  public interface RowSetWriter extends TupleWriter {
    +  interface RowSetWriter extends TupleWriter {
    --- End diff --
    
    Ah, forgot that the file defines an interface, not a class. (The situation I described was an interface nested inside a class.) So, you're good.


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/984
  
    Discussed in person. Conflicts are resolved. We have a plan for merging with other in-flight PRs.
    +1 (again)


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145297931
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.TopN;
    +
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +import java.util.Random;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.drill.test.TestBuilder;
    +import org.apache.drill.categories.OperatorTest;
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.common.expression.FieldReference;
    +import org.apache.drill.common.logical.data.Order;
    +import org.apache.drill.common.types.TypeProtos;
    +import org.apache.drill.common.types.Types;
    +import org.apache.drill.exec.compile.ClassBuilder;
    +import org.apache.drill.exec.compile.CodeCompiler;
    +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
    +import org.apache.drill.exec.memory.RootAllocator;
    +import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
    +import org.apache.drill.exec.pop.PopUnitTestBase;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.ExpandableHyperContainer;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.record.VectorContainer;
    +import org.apache.drill.exec.server.options.OptionSet;
    +import org.apache.drill.test.ClientFixture;
    +import org.apache.drill.test.ClusterFixture;
    +import org.apache.drill.test.FixtureBuilder;
    +import org.apache.drill.test.OperatorFixture;
    +import org.apache.drill.test.BatchUtils;
    +import org.apache.drill.test.DirTestWatcher;
    +import org.apache.drill.test.rowSet.RowSetBuilder;
    +import org.junit.Rule;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +@Category(OperatorTest.class)
    +public class TopNBatchTest extends PopUnitTestBase {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
    +
    +  // Allows you to look at generated code after tests execute
    +  @Rule
    +  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
    +
    +  /**
    +   * Priority queue unit test.
    +   * @throws Exception
    +   */
    +  @Test
    +  public void priorityQueueOrderingTest() throws Exception {
    +    Properties properties = new Properties();
    +    properties.setProperty(ClassBuilder.CODE_DIR_OPTION, dirTestWatcher.getDirPath());
    +
    +    DrillConfig drillConfig = DrillConfig.create(properties);
    +    OptionSet optionSet = new OperatorFixture.TestOptionSet();
    +
    +    FieldReference expr = FieldReference.getWithQuotedRef("colA");
    +    Order.Ordering ordering = new Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
    +    List<Order.Ordering> orderings = Lists.newArrayList(ordering);
    +
    +    MaterializedField colA = MaterializedField.create("colA", Types.required(TypeProtos.MinorType.INT));
    +    MaterializedField colB = MaterializedField.create("colB", Types.required(TypeProtos.MinorType.INT));
    +
    +    List<MaterializedField> cols = Lists.newArrayList(colA, colB);
    +    BatchSchema batchSchema = new BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
    +
    +    try (RootAllocator allocator = new RootAllocator(100_000_000)) {
    +      VectorContainer expectedVectors = new RowSetBuilder(allocator, batchSchema)
    +        .add(110, 10)
    +        .add(109, 9)
    +        .add(108, 8)
    +        .add(107, 7)
    +        .add(106, 6)
    +        .add(105, 5)
    +        .add(104, 4)
    +        .add(103, 3)
    +        .add(102, 2)
    +        .add(101, 1)
    +        .build()
    +        .container();
    +
    +      Map<String, List<Object>> expectedTable = BatchUtils.containerToObjects(expectedVectors);
    +      expectedVectors.clear();
    +
    +      PriorityQueue queue;
    +      ExpandableHyperContainer hyperContainer;
    +
    +      {
    +        VectorContainer container = new RowSetBuilder(allocator, batchSchema)
    +          .build()
    +          .container();
    +        hyperContainer = new ExpandableHyperContainer(container);
    +
    +        queue = TopNBatch.createNewPriorityQueue(
    --- End diff --
    
    Nice! Good to see you found a way to do true unit testing of the internal component without having to wrap it in the entire Drill server. The setup is a bit more clunky that we'd like, but this is a great start.
    
    I say "more clunky than we'd like" because we'd like to make these tests so simple that people go this route (and do much more thorough testing) than going the "test the whole drillbit" route.
    
    That is not a change to make here; rather it is simply a note that we want to keep looking for ways to make things simpler based on our experience with each new test.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144936263
  
    --- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.Lists;
    +import org.junit.rules.TestWatcher;
    +import org.junit.runner.Description;
    +
    +import java.io.File;
    +import java.util.List;
    +
    +public class SubDirTestWatcher extends TestWatcher {
    --- End diff --
    
    Maybe a Javadoc comment to explain what this does? From the name, it must have something to do with a temporary sub dir... But, how does this relate to the DirTestWatcher?


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @ilooner as far as know you are doing presentation today connected with this pull request. If overall approach will be accepted, then we'll try to merge this pull request. Please resolve the conflicts as well.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145572887
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java ---
    @@ -20,22 +20,58 @@
     import org.apache.drill.exec.compile.TemplateClassDefinition;
     import org.apache.drill.exec.exception.SchemaChangeException;
     import org.apache.drill.exec.memory.BufferAllocator;
    -import org.apache.drill.exec.ops.FragmentContext;
     import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
     import org.apache.drill.exec.record.VectorContainer;
     import org.apache.drill.exec.record.selection.SelectionVector4;
     
     public interface PriorityQueue {
    -  public void add(FragmentContext context, RecordBatchData batch) throws SchemaChangeException;
    -  public void init(int limit, FragmentContext context, BufferAllocator allocator, boolean hasSv2) throws SchemaChangeException;
    -  public void generate() throws SchemaChangeException;
    -  public VectorContainer getHyperBatch();
    -  public SelectionVector4 getHeapSv4();
    -  public SelectionVector4 getFinalSv4();
    -  public boolean validate();
    -  public void resetQueue(VectorContainer container, SelectionVector4 vector4) throws SchemaChangeException;
    -  public void cleanup();
    -
    -  public static TemplateClassDefinition<PriorityQueue> TEMPLATE_DEFINITION = new TemplateClassDefinition<PriorityQueue>(PriorityQueue.class, PriorityQueueTemplate.class);
    +  /**
    +   * The elements in the given batch are added to the priority queue. Note that the priority queue
    +   * only retains the top elements that fit within the size specified by the {@link #init(int, BufferAllocator, boolean)}
    +   * method.
    +   * @param batch The batch containing elements we want to add.
    +   * @throws SchemaChangeException
    +   */
    +  void add(RecordBatchData batch) throws SchemaChangeException;
     
    +  /**
    +   * Initializes the priority queue. This method must be called before any other methods on the priority
    +   * queue are called.
    +   * @param limit The size of the priority queue.
    +   * @param allocator The {@link BufferAllocator} to use when creating the priority queue.
    +   * @param hasSv2 True when incoming batches have v2 selection vectors. False otherwise.
    --- End diff --
    
    fixed


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144207089
  
    --- Diff: common/src/test/java/org/apache/drill/testutils/SubDirTestWatcher.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.testutils;
    --- End diff --
    
    done


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145297229
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java ---
    @@ -559,7 +563,7 @@ public void testRankWithGroupBy() throws Exception {
     
       @Test // DRILL-3404
       public void testWindowSumAggIsNotNull() throws Exception {
    -    String query = String.format("select count(*) cnt from (select sum ( c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from dfs.`%s/window/table_with_nulls.parquet` ) sub_query where w_sum is not null", TEST_RES_PATH);
    +    String query = "select count(*) cnt from (select sum ( c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from dfs.`/window/table_with_nulls.parquet` ) sub_query where w_sum is not null";
    --- End diff --
    
    Right about now I have to give you a huge THANKS! This is a huge amount of clean-up you've done. These are really good improvements. Really appreciate the tedious work to get this all done.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145575588
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
    @@ -116,6 +116,14 @@ public void clear() {
         }
       }
     
    +  public static int getBatchIndex(int sv4Index) {
    +    return (sv4Index >> 16) & 0xFFFF;
    +  }
    +
    +  public static int getRecordIndex(int sv4Index) {
    +    return (sv4Index) & 0xFFFF;
    +  }
    --- End diff --
    
    Didn't do any performance testing. I mainly intended to use these methods for testing purposes. For example I use them in org.apache.drill.test.BatchUtils .


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145292056
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(
    +      mainMapping, leftMapping, rightMapping, context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static MappingSet createMainMappingSet() {
    +    return new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createLeftMappingSet() {
    +    return new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createRightMappingSet() {
    +    return new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping,
    --- End diff --
    
    One thought worth sharing here. Today, our code cache uses source code as the key to look for an existing compiled version of a class. But, this is inefficient: we have to generate the source code, transform it, and store two copies to act as keys.
    
    Better would be to have a "class definition" object that holds all parameters needed to generate the code. Store these in the cache. Likely, these will be much smaller than the generated code. Using these avoids the need to generate the code to see if we already have a copy of that code.
    
    The class definition idea works as long as the definition is a closed set: everything needed to generate the code is in that one definition object.
    
    Your function here is close: it passes in a closed set of items. The open bits are the expression and system/session options. For options, we can copy the few values we need into our definition. The expression might require a bit more thought to capture.
    
    So, if we can evolve your idea further, we can get a potential large per-schema performance improvement by avoiding code generation when doing a repeat query.
    
    Too much to fix here; but something to think about moving forward.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145294152
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
    @@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
             "  nations.N_NAME,\n" +
             "  regions.R_NAME\n" +
             "FROM\n" +
    -        "  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` nations\n" +
    +        "  dfs.`/sample-data/nation.parquet` nations\n" +
    --- End diff --
    
    Does this work? Did we remap `dfs` to point the per-query temp directory? Do we want to do that? Or, should we use `dfs_test` here as well?


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147229478
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java ---
    @@ -33,8 +35,11 @@
     @Category(UnlikelyTest.class)
     public class TestBugFixes extends BaseTestQuery {
       private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
    -  private static final String WORKING_PATH = TestTools.getWorkingPath();
    -  private static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
    +
    +  @BeforeClass
    +  public static void setupTestFiles() {
    +    dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
    --- End diff --
    
    I have made all the path references relative in the tests now



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144205146
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    OptionSet optionSet, FunctionLookupContext functionLookupContext, CodeCompiler codeCompiler,
    +    List<Ordering> orderings, VectorAccessible batch, boolean unionTypeEnabled, boolean codegenDump,
    +    int limit, BufferAllocator allocator, SelectionVectorMode mode)
    +          throws ClassTransformationException, IOException, SchemaChangeException {
    +    final MappingSet mainMapping = new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +    final MappingSet leftMapping = new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +    final MappingSet rightMapping = new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    --- End diff --
    
    Reverted it back to the way it was


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r147258230
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
    @@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
         }
       }
     
    -  private void createConfig(FixtureBuilder builder) throws Exception {
    +  private void createConfig() throws Exception {
    --- End diff --
    
    They should be. The only change I made is that I didn't require passing the FixtureBuilder as an arg, because the class is already holding a reference to it.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r148396417
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/TableFileBuilder.java ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.Lists;
    +
    +import java.io.BufferedOutputStream;
    +import java.io.File;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +import java.util.List;
    +
    +public class TableFileBuilder
    +{
    +  private final List<String> columnNames;
    +  private final List<String> columnFormatters;
    +  private final List<Object[]> rows = Lists.newArrayList();
    +  private final String rowString;
    +
    +  public TableFileBuilder(List<String> columnNames, List<String> columnFormatters) {
    --- End diff --
    
    I've switched this around a bit. The general flow is now the following:
    
      1. A **RowSet** is constructed with a **RowSetBuilder**
      1. A **JsonFileBuilder** is created and configured to use the **RowSet** that was just created.
      1. The **JsonFileBuilder** reads the **RowSet** and writes it out to a file in a json format.



---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144943191
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
    @@ -90,12 +89,11 @@
       private String generatedCode;
       private String generifiedCode;
     
    -  CodeGenerator(TemplateClassDefinition<T> definition, FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
    -    this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, optionManager);
    +  CodeGenerator(TemplateClassDefinition<T> definition, OptionSet optionManager) {
    +    this(ClassGenerator.getDefaultMapping(), definition, optionManager);
    --- End diff --
    
    Good analysis. It was a real pain to lug the function registry around. Presumably the registry is used elsewhere in code gen; perhaps in the expression materializer.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144193392
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
    @@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final OptionSet optionManager)
       public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + ".disable_cache";
     
       /**
    +   * Enables saving generated code for debugging
    +   */
    +  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + ".codegen.dump";
    --- End diff --
    
    The directory that generated code is dumped into is determined by the ClassBuilder.CODE_DIR_OPTION property. The DirTestWatcher createw a directory of the following form:
    
    ```
    /target/<Fully qualified test class name>/<test method name>
    ```
    
    So if the ClassBuilder.CODE_DIR_OPTION property is configured to use the directory generated by the DirTestWatcher, the code is dumped into the same well defined location every test run. Also the generated class file has names like:
    
    ```
    <interface name>Gen<random num>.java
    ```


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r150072992
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java ---
    @@ -0,0 +1,159 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test.rowSet.file;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.ImmutableMap;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Maps;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.vector.accessor.ColumnAccessor;
    +import org.apache.drill.exec.vector.accessor.ColumnReader;
    +import org.apache.drill.test.rowSet.RowSet;
    +
    +import java.io.BufferedOutputStream;
    +import java.io.File;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class JsonFileBuilder
    +{
    +  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
    +  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
    +  public static final String DEFAULT_LONG_FORMATTER = "%d";
    +  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
    +  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
    +  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
    +
    +  public static final Map<String, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder()
    +    .put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
    +    .put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
    +    .put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
    +    .put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
    +    .put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
    +    .put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
    +    .build();
    +
    +  private final RowSet rowSet;
    +  private final Map<String, String> customFormatters = Maps.newHashMap();
    +
    +  public JsonFileBuilder(RowSet rowSet) {
    +    this.rowSet = Preconditions.checkNotNull(rowSet);
    +    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
    +  }
    +
    +  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
    +    Preconditions.checkNotNull(columnName);
    +    Preconditions.checkNotNull(columnFormatter);
    +
    +    Iterator<MaterializedField> fields = rowSet
    +      .schema()
    +      .batch()
    +      .iterator();
    +
    +    boolean hasColumn = false;
    +
    +    while (!hasColumn && fields.hasNext()) {
    +      hasColumn = fields.next()
    +        .getName()
    +        .equals(columnName);
    +    }
    +
    +    final String message = String.format("(%s) is not a valid column", columnName);
    +    Preconditions.checkArgument(hasColumn, message);
    +
    +    customFormatters.put(columnName, columnFormatter);
    +
    +    return this;
    +  }
    +
    +  public void build(File tableFile) throws IOException {
    --- End diff --
    
    Great! This does not yet handle nested tuples or arrays; in part because the row set work for that is still sitting in PR #914. You can update this to be aware of maps and map arrays once that PR is committed.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145522737
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/TableFileBuilder.java ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.test;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.collect.Lists;
    +
    +import java.io.BufferedOutputStream;
    +import java.io.File;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +import java.util.List;
    +
    +public class TableFileBuilder
    +{
    +  private final List<String> columnNames;
    +  private final List<String> columnFormatters;
    +  private final List<Object[]> rows = Lists.newArrayList();
    +  private final String rowString;
    +
    +  public TableFileBuilder(List<String> columnNames, List<String> columnFormatters) {
    --- End diff --
    
    See prior notes. We now have three ways to build a collection of rows...


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145298677
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggrSpill.java ---
    @@ -43,146 +44,122 @@
     import static org.junit.Assert.assertTrue;
     
     /**
    - *  Test spilling for the Hash Aggr operator (using the mock reader)
    + * Test spilling for the Hash Aggr operator (using the mock reader)
      */
     @Category({SlowTest.class, OperatorTest.class})
     public class TestHashAggrSpill {
     
    -    private void runAndDump(ClientFixture client, String sql, long expectedRows, long spillCycle, long fromSpilledPartitions, long toSpilledPartitions) throws Exception {
    -        String plan = client.queryBuilder().sql(sql).explainJson();
    -
    -        QueryBuilder.QuerySummary summary = client.queryBuilder().sql(sql).run();
    -        if ( expectedRows > 0 ) {
    -            assertEquals(expectedRows, summary.recordCount());
    -        }
    -        // System.out.println(String.format("======== \n Results: %,d records, %d batches, %,d ms\n ========", summary.recordCount(), summary.batchCount(), summary.runTimeMs() ) );
    -
    -        //System.out.println("Query ID: " + summary.queryIdString());
    -        ProfileParser profile = client.parseProfile(summary.queryIdString());
    -        //profile.print();
    -        List<ProfileParser.OperatorProfile> ops = profile.getOpsOfType(UserBitShared.CoreOperatorType.HASH_AGGREGATE_VALUE);
    -
    -        assertTrue( ! ops.isEmpty() );
    -        // check for the first op only
    -        ProfileParser.OperatorProfile hag0 = ops.get(0);
    -        long opCycle = hag0.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
    -        assertEquals(spillCycle, opCycle);
    -        long op_spilled_partitions = hag0.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
    -        assertTrue( op_spilled_partitions >= fromSpilledPartitions && op_spilled_partitions <= toSpilledPartitions );
    -        /* assertEquals(3, ops.size());
    -        for ( int i = 0; i < ops.size(); i++ ) {
    -            ProfileParser.OperatorProfile hag = ops.get(i);
    -            long cycle = hag.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
    -            long num_partitions = hag.getMetric(HashAggTemplate.Metric.NUM_PARTITIONS.ordinal());
    -            long spilled_partitions = hag.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
    -            long mb_spilled = hag.getMetric(HashAggTemplate.Metric.SPILL_MB.ordinal());
    -            System.out.println(String.format("(%d) Spill cycle: %d, num partitions: %d, spilled partitions: %d, MB spilled: %d", i,cycle, num_partitions, spilled_partitions,
    -                mb_spilled));
    -        } */
    -    }
    +  @Rule
    +  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
     
    -    /**
    -     *  A template for Hash Aggr spilling tests
    -     *
    -     * @throws Exception
    -     */
    -    private void testSpill(long maxMem, long numPartitions, long minBatches, int maxParallel, boolean fallback ,boolean predict,
    -                           String sql, long expectedRows, int cycle, int fromPart, int toPart) throws Exception {
    -        LogFixture.LogFixtureBuilder logBuilder = LogFixture.builder()
    -          .toConsole()
    -          //.logger("org.apache.drill.exec.physical.impl.aggregate", Level.INFO)
    -          .logger("org.apache.drill", Level.WARN)
    -          ;
    -
    -        FixtureBuilder builder = ClusterFixture.builder()
    -          .sessionOption(ExecConstants.HASHAGG_MAX_MEMORY_KEY,maxMem)
    -          .sessionOption(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY,numPartitions)
    -          .sessionOption(ExecConstants.HASHAGG_MIN_BATCHES_PER_PARTITION_KEY,minBatches)
    -          .configProperty(ExecConstants.SYS_STORE_PROVIDER_LOCAL_ENABLE_WRITE, false)
    -          .sessionOption(PlannerSettings.FORCE_2PHASE_AGGR_KEY,true)
    -          // .sessionOption(PlannerSettings.EXCHANGE.getOptionName(), true)
    -          .sessionOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY, fallback)
    -          .sessionOption(ExecConstants.HASHAGG_USE_MEMORY_PREDICTION_KEY,predict)
    -
    -          .maxParallelization(maxParallel)
    -          .saveProfiles()
    -          //.keepLocalFiles()
    -          ;
    -        String sqlStr = sql != null ? sql :  // if null then use this default query
    -          "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) FROM `mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
    -
    -        try (LogFixture logs = logBuilder.build();
    -             ClusterFixture cluster = builder.build();
    -             ClientFixture client = cluster.clientFixture()) {
    -            runAndDump(client, sqlStr, expectedRows, cycle, fromPart,toPart);
    -        }
    -    }
    -    /**
    -     * Test "normal" spilling: Only 2 (or 3) partitions (out of 4) would require spilling
    -     * ("normal spill" means spill-cycle = 1 )
    -     *
    -     * @throws Exception
    -     */
    -    @Test
    -    public void testSimpleHashAggrSpill() throws Exception {
    -        testSpill(68_000_000, 16, 2, 2, false, true, null,
    -          1_200_000, 1,2, 3
    -          );
    -    }
    -    /**
    -     * Test with "needed memory" prediction turned off
    -     * (i.e., do exercise code paths that catch OOMs from the Hash Table and recover)
    -     *
    -     * @throws Exception
    -     */
    -    @Test
    -    public void testNoPredictHashAggrSpill() throws Exception {
    -        testSpill(58_000_000, 16, 2, 2, false,false /* no prediction */,
    -             null,1_200_000, 1,1, 1
    -        );
    +  private void runAndDump(ClientFixture client, String sql, long expectedRows, long spillCycle, long fromSpilledPartitions, long toSpilledPartitions) throws Exception {
    --- End diff --
    
    Boaz inherited this from one of my tests; but I use the "run and dump" pattern only in ad-hoc tests when I need lots of detail for debugging. Would be nice to turn off the "dump" part when running the tests as part of the suite.
    
    Also, when running in a suite, Boaz is not there to verify results. This test needs a way to ensure that the results are correct.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145574915
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
         logger.debug("Took {} us to purge", watch.elapsed(TimeUnit.MICROSECONDS));
       }
     
    -  public PriorityQueue createNewPriorityQueue(FragmentContext context, List<Ordering> orderings,
    -                                                     VectorAccessible batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
    -          throws ClassTransformationException, IOException, SchemaChangeException{
    -    CodeGenerator<PriorityQueue> cg = CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int limit)
    +    throws SchemaChangeException, ClassTransformationException, IOException {
    +    return createNewPriorityQueue(
    +      mainMapping, leftMapping, rightMapping, context.getOptionSet(), context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
    +      config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, oContext.getAllocator(), schema.getSelectionVectorMode());
    +  }
    +
    +  public static MappingSet createMainMappingSet() {
    +    return new MappingSet((String) null, null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createLeftMappingSet() {
    +    return new MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static MappingSet createRightMappingSet() {
    +    return new MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
    +  }
    +
    +  public static PriorityQueue createNewPriorityQueue(
    +    MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping,
    --- End diff --
    
    We cannot declare the MappingSets as constants because they contain some internal state (the mappingIndex field). If they were constants the state could become corrupt in the case where multiple queries run and create PriorityQueues on the same node simultaneously, so we have to create new MappingSets each time we create a new priority queue.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144147334
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java ---
    @@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, RecordBatch incoming)
         super(popConfig, context);
         this.incoming = incoming;
         this.config = popConfig;
    -    batchPurgeThreshold = context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
    +    DrillConfig drillConfig = context.getConfig();
    +    batchPurgeThreshold = drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
    +    codegenDump = drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
    --- End diff --
    
    Not sure we want to enable this in this way. This would globally enable all code to be dumped. In practice, we want to enable module by module depending on what we are debugging.
    
    I've been doing this the crude-but-effective way: just uncommenting the magic line of code. This usually works because I have to be in the debugger anyway to step through the code. But, if we wanted to be fancier, we could have config settings for each package. So, here, maybe you would enable `drill.debug.codegen.org.apache.drill.exec.physical.impl.TopN`. Or, we could be a bit lighter weight and just assign names: `drill.debug.codegen.TopN`.
    
    Because a typical run will generate zillions of files, reducing the number of saved files to those of interest will help to keep things tidy.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r144147991
  
    --- Diff: common/src/test/java/org/apache/drill/testutils/SubDirTestWatcher.java ---
    @@ -0,0 +1,108 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.testutils;
    --- End diff --
    
    You were a brave soul and moved the `DrillTest` and `BaseTestQuery` classes into `org.apache.drill.test`. Do we want to move `testutils` under `test` as well?


---

[GitHub] drill issue #984: DRILL-5783 Made a unit test for generated Priority Queue. ...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/984
  
    @ilooner changes in 361 files seems to be too much :) I suggest you split this PR at least at three according to the addressed issues. So we try to merge smaller chunks.


---

[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/984#discussion_r145815670
  
    --- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
    @@ -12,11 +12,11 @@
                 pop:"mock-scan",
                 url: "http://apache.org",
                 entries:[
    -                {records: 10000000, types: [
    +                {records: 100000, types: [
    --- End diff --
    
    The original test was checking to make sure the the Top N results returned are ordered. It was never really necessary to generate that much data. The benefit of reducing the amount of data generated is that the test runs faster.


---