You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by HanumathRao <gi...@git.apache.org> on 2017/12/01 01:16:02 UTC

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

GitHub user HanumathRao opened a pull request:

    https://github.com/apache/drill/pull/1059

    DRILL-5851: Empty table during a join operation with a non empty tabl…

    …e produces cast exception.
    
    These code changes handle the cases where either of the table in a join is empty and we do not need to process the join further. @paul-rogers @amansinha100 Please review these changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HanumathRao/drill DRILL-5851-lat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1059
    
----
commit dbdffa51b802ae6c799264836ad86cc39d34fee2
Author: Hanumath Maduri <hm...@laptop-npjh4dhd.corp.maprtech.com>
Date:   2017-10-09T20:08:13Z

    DRILL-5851: Empty table during a join operation with a non empty table produces cast exception.

----


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939459
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +342,96 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    test(DISABLE_JOIN_OPTIMIZATION);
    +    test(DISABLE_NLJ_SCALAR);
    +    test(DISABLE_HJ);
    +    test(DISABLE_MJ);
    --- End diff --
    
    We now have `sessionOption("option.name", someValue)` to do this work.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162289830
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/JoinTestBase.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.join;
    +
    +import org.apache.drill.categories.OperatorTest;
    +import org.apache.drill.PlanTestBase;
    +import org.apache.drill.common.exceptions.UserRemoteException;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +import java.io.File;
    +import java.io.FileWriter;
    +import java.io.IOException;
    +import java.io.PrintWriter;
    +import java.nio.file.Paths;
    +import org.apache.drill.exec.planner.physical.PlannerSettings;
    +import static org.hamcrest.CoreMatchers.containsString;
    +import static org.junit.Assert.assertThat;
    +
    +
    +@Category(OperatorTest.class)
    +public class JoinTestBase extends PlanTestBase {
    +
    +  private static final String testEmptyJoin = "select count(*) as cnt from cp.`employee.json` emp %s join dfs.`dept.json` " +
    +          "as dept on dept.manager = emp.`last_name`";
    +
    +  /**
    +   * This function runs a join query with one of the table generated as an
    +   * empty json file.
    +   * @param testDir in which the empty json file is generated.
    +   * @param joinType to be executed.
    +   * @param joinPattern to look for the pattern in the successful run.
    +   * @param result number of the output rows.
    +   */
    +  public void testJoinWithEmptyFile(File testDir, String joinType,
    +                         String joinPattern, long result) throws Exception {
    +    buildFile("dept.json", new String[0], testDir);
    +    String query = String.format(testEmptyJoin, joinType);
    +    testPlanMatchingPatterns(query, new String[]{joinPattern}, new String[]{});
    +    testBuilder()
    +            .sqlQuery(query)
    +            .unOrdered()
    +            .baselineColumns("cnt")
    +            .baselineValues(result)
    +            .build().run();
    +  }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +}
    --- End diff --
    
    Put a new line in the end of file


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162292571
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -19,19 +19,17 @@
     package org.apache.drill.exec.physical.impl.join;
     
     import org.apache.drill.categories.OperatorTest;
    -import org.apache.drill.PlanTestBase;
     import org.apache.drill.common.exceptions.UserRemoteException;
     import org.junit.BeforeClass;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
    -
     import java.nio.file.Paths;
    -
    +import org.apache.drill.exec.planner.physical.PlannerSettings;
     import static org.hamcrest.CoreMatchers.containsString;
     import static org.junit.Assert.assertThat;
     
     @Category(OperatorTest.class)
    -public class TestNestedLoopJoin extends PlanTestBase {
    +public class TestNestedLoopJoin extends JoinTestBase {
     
       private static String nlpattern = "NestedLoopJoin";
    --- End diff --
    
    Can you edit this as well? --> final, NLJ_PATTERN


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939311
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -535,4 +537,25 @@ public void close() {
         }
         super.close();
       }
    +
    +  @Override
    +  protected boolean checkForEarlyFinish() {
    +    if (joinType == JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE || rightUpstream == IterOutcome.NONE) ||
    +        joinType != JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE)) {
    +      return true;
    +    }
    +    return false;
    --- End diff --
    
    An explanation of the logic (in the form of a comment) would be very helpful.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939350
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
    @@ -418,4 +418,15 @@ protected void killIncoming(boolean sendUpstream) {
       public int getRecordCount() {
         return outputRecords;
       }
    +
    +  @Override
    +  protected boolean checkForEarlyFinish() {
    +    if (popConfig.getJoinType() == JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE || rightUpstream == IterOutcome.NONE) ||
    +        popConfig.getJoinType() != JoinRelType.INNER &&
    +            (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE)) {
    +      return true;
    +    }
    +    return false;
    --- End diff --
    
    Maybe eliminate if as described above.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939344
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java ---
    @@ -152,7 +152,10 @@ public void buildSchema() {
           return;
         }
     
    -    if (leftOutcome == IterOutcome.NONE && rightOutcome == IterOutcome.NONE) {
    +    if (joinType == JoinRelType.INNER && (leftOutcome == IterOutcome.NONE || rightOutcome == IterOutcome.NONE) ||
    +        joinType != JoinRelType.INNER && (leftOutcome == IterOutcome.NONE && rightOutcome == IterOutcome.NONE)) {
    +      drainStream(leftOutcome, 0, left);
    +      drainStream(rightOutcome, 1, right);
    --- End diff --
    
    Does this do what it sounds like it does? Read values until there are no more to read? I wonder if this has been fully tested, or if it will end up running the subquery to completion unnecessarily?
    
    Also, here we are checking for NONE. Above we checked for error codes. Should we check for the error codes here?
    
    Or, better, when we receive an error code, should we simply throw an exception and end it all? (That is, the code does not currently do anything useful with STOP, OUT_OF_MEMORY or NOTYET.)


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939354
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java ---
    @@ -97,6 +97,14 @@ protected void buildSchema() throws SchemaChangeException {
         VectorAccessibleUtilities.setValueCount(container,0);
       }
     
    +  @Override
    +  protected boolean checkForEarlyFinish() {
    +    if (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE) {
    +      return true;
    +    }
    +    return false;
    --- End diff --
    
    Convert to return of boolean expression?


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593845
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +341,74 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    setSessionOption("planner.enable_join_optimization", false);
    +    setSessionOption("planner.enable_nljoin_for_scalar_only", false);
    +    setSessionOption("planner.enable_hashjoin", false);
    +    setSessionOption("planner.enable_mergejoin", false);
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    --- End diff --
    
    The above certainly does not do what one would expect. I suspect there is a bit of confusion. Not surprising, we don't document our test structure so we learn by trial and error.
    
    This test class derives from `PlanTestBase` which derives from `BaseTestQuery`. `BaseTestQuery` starts a server before running the tests.
    
    The `setSessionOption()` methods here sets options on the server created by `BaseTestQuery`. So far so good.
    
    But, the code in this tests then uses `ClusterFixture` to start a new server. This means:
    
    * Two servers are running.
    * The session options were set in the first.
    * The test was run against the second.
    
    In general, there are two equally valid choices.
    
    1. Run the tests against the server started by the test class.
    2. Change the test superclass to ClusterTest (to use a shared server) or DrillTest (to start the server in each test.)
    
    Here, we are adding to existing tests, so we should use the existing structure and use the server started by the test class.
    
    In another scenario, you might want to start a server for each test. If so, then you can pass the session options to the cluster builder so they are set when the server starts. (Not needed here, but handy to remember for other tests.)


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593808
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java ---
    @@ -253,4 +259,52 @@ public void testDrill4196() throws Exception {
           .baselineValues(6000*800L)
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    --- End diff --
    
    The above duplicates code in the hash join test. In general, duplicate code is a "bad thing." Suggestion: move this code into a utilities class; maybe one that already exists.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593830
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +341,74 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    --- End diff --
    
    Oh my! Another copy!


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593922
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +341,74 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    setSessionOption("planner.enable_join_optimization", false);
    --- End diff --
    
    Generally a better idea is to use the constants defined in `ExecConstants`. Doing so avoids any possibility of introducing a typo in the option name.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1059


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939449
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java ---
    @@ -253,4 +259,75 @@ public void testDrill4196() throws Exception {
           .baselineValues(6000*800L)
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testMergeLeftJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      throw ex;
    +    } finally {
    +      if (testDir != null) {
    +        testDir.delete();
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testMergeInnerJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp inner join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 0);
    --- End diff --
    
    How could this test ensure that we did, indeed, stop the readers from reading extra records? (That is, that the `kill()` calls worked?)


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r161510606
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestNullInputMiniPlan.java ---
    @@ -354,9 +354,9 @@ public void testHashJoinRightEmpty() throws Exception {
             .build();
     
         BatchSchema expectedSchema = new SchemaBuilder()
    -        .addNullable("a", TypeProtos.MinorType.BIGINT)
    -        .addNullable("b", TypeProtos.MinorType.BIGINT)
    -        .withSVMode(BatchSchema.SelectionVectorMode.NONE)
    +            .addNullable("a", TypeProtos.MinorType.BIGINT)
    +            .addNullable("b", TypeProtos.MinorType.BIGINT)
    +            .withSVMode(BatchSchema.SelectionVectorMode.NONE)
    --- End diff --
    
    Please, return original indentation.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593956
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -305,11 +307,15 @@ public void executeBuildPhase() throws SchemaChangeException, ClassTransformatio
         //Setup the underlying hash table
     
         // skip first batch if count is zero, as it may be an empty schema batch
    -    if (right.getRecordCount() == 0) {
    +    if (isFurtherProcessingRequired(rightUpstream) && right.getRecordCount() == 0) {
           for (final VectorWrapper<?> w : right) {
             w.clear();
           }
           rightUpstream = next(right);
    +      if (isFurtherProcessingRequired(rightUpstream) &&
    +          right.getRecordCount() > 0 && hashTable == null) {
    +        setupHashTable();
    --- End diff --
    
    This handles an empty batch followed by a non-empty batch. Can we be sure that there will only ever be a sequence of 0 or 1 empty batches? Might there be a pathological scan that reads 20 (say) empty files, producing a series of 20 empty batches? In short, should the logic here be in a loop?
    
    Did we create a test that checks for this case?


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162293231
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -535,4 +541,8 @@ public void close() {
         }
         super.close();
       }
    +
    +  private boolean isFurtherProcessingRequired(IterOutcome upStream) {
    --- End diff --
    
    It will be good to have here a small java-doc


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162293578
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatch.java ---
    @@ -228,4 +228,20 @@ public WritableBatch getWritableBatch() {
       public VectorContainer getOutgoingContainer() {
         throw new UnsupportedOperationException(String.format(" You should not call getOutgoingContainer() for class %s", this.getClass().getCanonicalName()));
       }
    +
    +  public void drainStream(IterOutcome stream, int input, RecordBatch batch) {
    --- End diff --
    
    Please add java-doc for this method


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939417
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,75 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    --- End diff --
    
    Please see ExampleTest for how the cluster and client should be created in a try-with-resources block to ensure that things are closed.
    
    Also, each cluster is started with the same parameters. The use of `ClusterTest` will allow you to start the cluster once, then reuse it for multiple test; which will speed up test cycle times.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r163122636
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java ---
    @@ -136,7 +136,9 @@ public void executeProbePhase() throws SchemaChangeException {
               case OK_NEW_SCHEMA:
                 if (probeBatch.getSchema().equals(probeSchema)) {
                   doSetup(outgoingJoinBatch.getContext(), buildBatch, probeBatch, outgoingJoinBatch);
    -              hashTable.updateBatches();
    +              if (hashTable != null) {
    +                hashTable.updateBatches();
    +              }
    --- End diff --
    
    @Ben-Zvi. Thanks Boaz for the quick review. I made the required changes and had a clean run precommit tests.
    
    Just to keep track of these failures. Here are the clean runs on the current commit.
    PASS (5.431 s) /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/maprdb/binary_maprdb/emptyMaprDBLeftJoin.sql (connection: 1289554899) (queryID: 25997883-c4c3-afee-bd18-4907353109cd)
    PASS (6.403 s) /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/hbase/emptyHbaseLeftJoin.sql (connection: 1037784189) (queryID: 2599792c-e9c0-304e-a82e-ff9438ed3f5b)
    PASS (5.392 s) /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/hbase/emptyHbaseRightJoin.sql (connection: 1697676429) (queryID: 259979de-0f7a-f69e-9fcb-3b70f8eca3df)
    PASS (5.515 s) /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/maprdb/binary_hbase/emptyMaprDBRightJoin.sql (connection: 1143786034) (queryID: 2599790d-5a7f-d737-a3dd-83baf15dcf94)


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939424
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,75 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      throw ex;
    --- End diff --
    
    Probably want to do a `fail()` here.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593969
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractBinaryRecordBatch.java ---
    @@ -65,11 +65,18 @@ protected boolean prefetchFirstBatchFromBothSides() {
           return false;
         }
     
    -    if (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE) {
    +    if (checkForEarlyFinish()) {
           state = BatchState.DONE;
           return false;
         }
     
         return true;
       }
    +
    +  /*
    +   * Checks for the operator specific early terminal condition.
    +   * @return true if the further processing can stop.
    +   *         false if the further processing is needed.
    +   */
    +  protected abstract boolean checkForEarlyFinish();
    --- End diff --
    
    This is abstract, But, all implementations are identical. Can't we just move the implementation here?


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r161510835
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +339,55 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private static void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  public static void testWithEmptyJoin(File testDir, String joinType,
    --- End diff --
    
    Please, add java doc for new helper methods.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155938557
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -535,4 +537,25 @@ public void close() {
         }
         super.close();
       }
    +
    +  @Override
    +  protected boolean checkForEarlyFinish() {
    +    if (joinType == JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE || rightUpstream == IterOutcome.NONE) ||
    +        joinType != JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE)) {
    +      return true;
    +    }
    +    return false;
    --- End diff --
    
    See below. The if statement can be omitted.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939358
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractBinaryRecordBatch.java ---
    @@ -65,11 +65,20 @@ protected boolean prefetchFirstBatchFromBothSides() {
           return false;
         }
     
    -    if (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE) {
    +    if (checkForEarlyFinish()) {
           state = BatchState.DONE;
    +      drainStream(leftUpstream, 0, left);
    +      drainStream(rightUpstream, 1, right);
    --- End diff --
    
    As above on whether draining is a good idea.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r160542923
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +339,55 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private static void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  public static void testWithEmptyJoin(File testDir, String joinType,
    +                                String joinPattern, long result) throws Exception {
    +    buildFile("dept.json", new String[0], testDir);
    +    String query = String.format(testEmptyJoin, joinType);
    +    testPlanMatchingPatterns(query, new String[]{joinPattern}, new String[]{});
    +    testBuilder()
    +            .sqlQuery(query)
    +            .unOrdered()
    +            .baselineColumns("cnt")
    +            .baselineValues(result)
    +            .build().run();
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    try {
    +      test(DISABLE_NLJ_SCALAR);
    --- End diff --
    
    Please use `client.setSessionOption(...)` to make it clear what we are doing here.


---

[GitHub] drill issue #1059: DRILL-5851: Empty table during a join operation with a no...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on the issue:

    https://github.com/apache/drill/pull/1059
  
    @amansinha100  These issues might not be because of this PR changes. I think, these are random issues which are being shown up in my branch. I heard from @Ben-Zvi that he also had some issues with empty table in his private branch. Anyway I have made a change to check for htable being not null and now all the tests are passing. 
    
    I will update this PR once I talk to @Ben-Zvi about the issue which he had seen in his private branch.
    



---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r160540854
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatch.java ---
    @@ -228,4 +228,20 @@ public WritableBatch getWritableBatch() {
       public VectorContainer getOutgoingContainer() {
         throw new UnsupportedOperationException(String.format(" You should not call getOutgoingContainer() for class %s", this.getClass().getCanonicalName()));
       }
    +
    +  public void drainStream(IterOutcome stream, int input, RecordBatch batch) {
    +    if (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +      for (final VectorWrapper<?> wrapper : batch) {
    +        wrapper.getValueVector().clear();
    +      }
    +      batch.kill(true);
    +      stream = next(input, batch);
    +      while (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +        for (final VectorWrapper<?> wrapper : batch) {
    +          wrapper.getValueVector().clear();
    +        }
    +        stream = next(input, batch);
    +      }
    --- End diff --
    
    In one-on-one discussions it became clear that fixing this issue is out of the scope of this particular task. That's fine. Please just file a JIRA to track the potential problem so we don't forget about it.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r160542798
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +339,55 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private static void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  public static void testWithEmptyJoin(File testDir, String joinType,
    +                                String joinPattern, long result) throws Exception {
    +    buildFile("dept.json", new String[0], testDir);
    +    String query = String.format(testEmptyJoin, joinType);
    +    testPlanMatchingPatterns(query, new String[]{joinPattern}, new String[]{});
    +    testBuilder()
    +            .sqlQuery(query)
    +            .unOrdered()
    +            .baselineColumns("cnt")
    +            .baselineValues(result)
    +            .build().run();
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    try {
    +      test(DISABLE_NLJ_SCALAR);
    +      testWithEmptyJoin(dirTestWatcher.getRootDir(), "left outer", nlpattern, 1155L);
    +    } finally {
    +      test(RESET_HJ);
    --- End diff --
    
    Please use `client.resetSessionOption(...)`.


---

[GitHub] drill issue #1059: DRILL-5851: Empty table during a join operation with a no...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on the issue:

    https://github.com/apache/drill/pull/1059
  
    @vdiravka  Thank you for the review comments.
    
    I have done the needed changes.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155938551
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -535,4 +537,25 @@ public void close() {
         }
         super.close();
       }
    +
    +  @Override
    +  protected boolean checkForEarlyFinish() {
    +    if (joinType == JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE || rightUpstream == IterOutcome.NONE) ||
    +        joinType != JoinRelType.INNER &&
    +        (leftUpstream == IterOutcome.NONE && rightUpstream == IterOutcome.NONE)) {
    +      return true;
    +    }
    +    return false;
    +  }
    +
    +  private boolean isFurtherProcessingRequired(IterOutcome upStream) {
    +    if (upStream == IterOutcome.OUT_OF_MEMORY ||
    +        upStream == IterOutcome.NONE ||
    +        upStream == IterOutcome.NOT_YET ||
    +        upStream == IterOutcome.STOP) {
    +      return false;
    +    }
    +    return true;
    --- End diff --
    
    The classic way to do this would be:
    
    ```
    return upStream == IterOutcome.OK || upStream == IterOutcome.OK_NEW_SCHEMA;
    ```
    
    That is, express the return value as a Boolean expression. Also, identify the states that need processing, rather than those that don't.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593930
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +341,74 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    setSessionOption("planner.enable_join_optimization", false);
    +    setSessionOption("planner.enable_nljoin_for_scalar_only", false);
    +    setSessionOption("planner.enable_hashjoin", false);
    +    setSessionOption("planner.enable_mergejoin", false);
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      fail (ex.getMessage());
    +    } finally {
    +      resetSessionOption("planner.enable_join_optimization");
    +      resetSessionOption("planner.enable_nljoin_for_scalar_only");
    +      resetSessionOption("planner.enable_hashjoin");
    +      resetSessionOption("planner.enable_mergejoin");
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedInnerJoinWithEmptyTable() throws Exception {
    +    setSessionOption("planner.enable_nljoin_for_scalar_only", false);
    --- End diff --
    
    Same comment as above: duplicate code. Here we see the problem with copies. This test, and the next one, set three options. The prior one sets four options. Which is right?


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939472
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
    @@ -333,4 +342,96 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
           test(RESET_JOIN_OPTIMIZATION);
         }
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testNestedLeftJoinWithEmptyTable() throws Exception {
    +    test(DISABLE_JOIN_OPTIMIZATION);
    +    test(DISABLE_NLJ_SCALAR);
    +    test(DISABLE_HJ);
    +    test(DISABLE_MJ);
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      throw ex;
    +    } finally {
    +      if (testDir != null) {
    +        testDir.delete();
    +      }
    +      test(RESET_JOIN_OPTIMIZATION);
    +      test(ENABLE_NLJ_SCALAR);
    +      test(ENABLE_HJ);
    +      test(ENABLE_MJ);
    --- End diff --
    
    The above probably assumes the value for each option. This has caused problems, especially during failures. Suggest to instead use `resetSessionOption("option.name")`


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593785
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,51 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testHashInnerJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp inner join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 0);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testHashRightJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    --- End diff --
    
    All three tests use the same server setup. The current code starts the server three times. That's fine if the tests needed a different config for each. But, here it would be faster to derive from `ClusterTest` and use the methods provided to start the server once, sharing the same server for all tests.
    
    Upon further inspection, there is a deeper problem. Two servers are running. See note below.


---

[GitHub] drill issue #1059: DRILL-5851: Empty table during a join operation with a no...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on the issue:

    https://github.com/apache/drill/pull/1059
  
    @paul-rogers  Thanks for the review. I have moved the testcase to one test file and called it from multiple places. I have also made changes to move the code in checkForEarlyExit to base class. Please let me know if any other changes are required. thanks.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593827
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java ---
    @@ -253,4 +259,52 @@ public void testDrill4196() throws Exception {
           .baselineValues(6000*800L)
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testMergeLeftJoinWithEmptyTable() throws Exception {
    +    try (
    +      ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testMergeInnerJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp inner join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 0);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testMergeRightJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp right outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert( query.run().recordCount() == 0);
    +    } catch (RuntimeException ex) {
    +      fail (ex.getMessage());
    --- End diff --
    
    Suggestion: this is the sixth copy of basically the same test, with only the query text differing. Again, duplicate code is costly and is to be avoided. Now, once the tests convert to using a single server, most of the code here will disappear. More will disappear if the test file is created once. Otherwise, it would be better to define a test function, then simply have each test call the test function with the required statement.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158356421
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestNullInputMiniPlan.java ---
    @@ -323,8 +323,6 @@ public void testHashJoinLeftEmpty() throws Exception {
             .build();
     
         BatchSchema expectedSchema = new SchemaBuilder()
    -        .addNullable("a", TypeProtos.MinorType.BIGINT)
    -        .addNullable("b", TypeProtos.MinorType.BIGINT)
    --- End diff --
    
    @paul-rogers Thank you for the review. I have changed the code to make this case work. I have made changes to incorporate the earlier comments. Can you please review the changes.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162292685
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -19,20 +19,22 @@
     package org.apache.drill.exec.physical.impl.join;
     
     
    -import org.apache.drill.test.BaseTestQuery;
     import org.apache.drill.categories.OperatorTest;
     import org.apache.drill.categories.UnlikelyTest;
     import org.junit.AfterClass;
     import org.junit.BeforeClass;
     import org.junit.Test;
     import org.junit.experimental.categories.Category;
    -
     import java.io.BufferedWriter;
     import java.io.File;
     import java.io.FileWriter;
     
    +
     @Category(OperatorTest.class)
    -public class TestHashJoinAdvanced extends BaseTestQuery {
    +public class TestHashJoinAdvanced extends JoinTestBase {
    +
    +  private static String hjpattern = "HashJoin";
    --- End diff --
    
    --> final, HJ_PATTERN


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939428
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,75 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      throw ex;
    +    } finally {
    +      if (testDir != null) {
    +        testDir.delete();
    --- End diff --
    
    I think the whole point of the dir test watcher is that it will do this cleanup.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r161512862
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +165,19 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    TestNestedLoopJoin.testWithEmptyJoin(dirTestWatcher.getRootDir(), "left outer", hjpattern, 1155L);
    --- End diff --
    
    It is not so good to use helper methods from `TestNestedLoopJoin` in `TestHashJoinAdvanced` class. 
    Could you factor out the common part of code (helper methods and constants) into the Base class, for example `JoinTestBase`?


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939506
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestNullInputMiniPlan.java ---
    @@ -323,8 +323,6 @@ public void testHashJoinLeftEmpty() throws Exception {
             .build();
     
         BatchSchema expectedSchema = new SchemaBuilder()
    -        .addNullable("a", TypeProtos.MinorType.BIGINT)
    -        .addNullable("b", TypeProtos.MinorType.BIGINT)
    --- End diff --
    
    I wonder about this. This test came out of the "empty batches" project which attempted to handle empty inputs. If this PR finds the need to change the tests, then we are changing semantics of how we handle schemas. Are we sure we want to make this change?
    
    If we are changing the schema, are we saying that if we get no left input (but do get a schema), that we won't product the joined schema? This would seem to be a bug rather than a feature.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by HanumathRao <gi...@git.apache.org>.
Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158992589
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -305,11 +307,15 @@ public void executeBuildPhase() throws SchemaChangeException, ClassTransformatio
         //Setup the underlying hash table
     
         // skip first batch if count is zero, as it may be an empty schema batch
    -    if (right.getRecordCount() == 0) {
    +    if (isFurtherProcessingRequired(rightUpstream) && right.getRecordCount() == 0) {
           for (final VectorWrapper<?> w : right) {
             w.clear();
           }
           rightUpstream = next(right);
    +      if (isFurtherProcessingRequired(rightUpstream) &&
    +          right.getRecordCount() > 0 && hashTable == null) {
    +        setupHashTable();
    --- End diff --
    
    Yes, there were some existing changes which were failing if we haven't done this check.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by Ben-Zvi <gi...@git.apache.org>.
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r163045737
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java ---
    @@ -136,7 +136,9 @@ public void executeProbePhase() throws SchemaChangeException {
               case OK_NEW_SCHEMA:
                 if (probeBatch.getSchema().equals(probeSchema)) {
                   doSetup(outgoingJoinBatch.getContext(), buildBatch, probeBatch, outgoingJoinBatch);
    -              hashTable.updateBatches();
    +              if (hashTable != null) {
    +                hashTable.updateBatches();
    +              }
    --- End diff --
    
    Fine.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162292084
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoinAdvanced.java ---
    @@ -38,13 +37,16 @@
     import java.util.Random;
     
     @Category(OperatorTest.class)
    -public class TestMergeJoinAdvanced extends BaseTestQuery {
    +public class TestMergeJoinAdvanced extends JoinTestBase {
       private static final String LEFT = "merge-join-left.json";
       private static final String RIGHT = "merge-join-right.json";
    +  private static String mjpattern = "MergeJoin";
    --- End diff --
    
    Make it as a constant --> private static final String MJ_PATTERN = "MergeJoin";


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593774
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,51 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    --- End diff --
    
    Should the tests be repeated for
    
    * Empty on left
    * Empty on right
    * Empty on both



---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r162294377
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/JoinTestBase.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.join;
    +
    +import org.apache.drill.categories.OperatorTest;
    +import org.apache.drill.PlanTestBase;
    +import org.apache.drill.common.exceptions.UserRemoteException;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +import java.io.File;
    +import java.io.FileWriter;
    +import java.io.IOException;
    +import java.io.PrintWriter;
    +import java.nio.file.Paths;
    +import org.apache.drill.exec.planner.physical.PlannerSettings;
    +import static org.hamcrest.CoreMatchers.containsString;
    +import static org.junit.Assert.assertThat;
    +
    +
    +@Category(OperatorTest.class)
    +public class JoinTestBase extends PlanTestBase {
    +
    +  private static final String testEmptyJoin = "select count(*) as cnt from cp.`employee.json` emp %s join dfs.`dept.json` " +
    +          "as dept on dept.manager = emp.`last_name`";
    +
    +  /**
    +   * This function runs a join query with one of the table generated as an
    --- End diff --
    
    function -> method


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593793
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,51 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testHashInnerJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp inner join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 0);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    +    }
    +  }
    +
    +  @Test
    +  public void testHashRightJoinWithEmptyTable() throws Exception {
    +    try (ClusterFixture cluster = ClusterFixture.builder(dirTestWatcher).build();
    +      ClientFixture client = cluster.clientFixture()) {
    +      File testDir = dirTestWatcher.getRootDir();
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp right outer join dfs.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.run().recordCount() == 0);
    +    } catch (RuntimeException ex) {
    +      fail(ex.getMessage());
    --- End diff --
    
    Not really necessary to catch the exception. Just let the test throw it. JUnit will mark the test as failed if it throws an exception.


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939440
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -160,4 +166,75 @@ public void testJoinWithMapAndDotField() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    +
    +  private void buildFile(String fileName, String[] data, File testDir) throws IOException {
    +    try(PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
    +      for (String line : data) {
    +        out.println(line);
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashLeftJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp left outer join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 1155);
    +    } catch (RuntimeException ex) {
    +      throw ex;
    +    } finally {
    +      if (testDir != null) {
    +        testDir.delete();
    +      }
    +    }
    +  }
    +
    +  @Test
    +  public void testHashInnerJoinWithEmptyTable() throws Exception {
    +    ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
    +    File testDir = null;
    +    try {
    +      ClusterFixture cluster = builder.build();
    +      ClientFixture client = cluster.clientFixture();
    +      testDir = dirTestWatcher.getTmpDir();
    +      cluster.defineWorkspace("dfs", "data", testDir.getAbsolutePath(), "json");
    +      buildFile("dept.json", new String[0], testDir);
    +      QueryBuilder query = client.queryBuilder().sql("select * from cp.`employee.json` emp inner join dfs.data.`dept.json` as dept on dept.manager = emp.`last_name`");
    +      assert(query.futureSummary().get().recordCount() == 0);
    --- End diff --
    
    I think you want `query.run().recordCount()` which runs the query synchronously and returns a query summary upon completion.


---

[GitHub] drill issue #1059: DRILL-5851: Empty table during a join operation with a no...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/1059
  
    @HanumathRao can you rebase your PR on the latest master and run the functional tests ?  I saw the following test failures which seem related to these changes: 
    Execution Failures:
    /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/maprdb/binary_hbase/emptyMaprDBLeftJoin.sql
    /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/hbase/emptyHbaseLeftJoin.sql
    /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/hbase/emptyHbaseRightJoin.sql
    /root/drillAutomation/framework-master/framework/resources/Functional/schema_change_empty_batch/maprdb/binary_hbase/emptyMaprDBRightJoin.sql


---

[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939394
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatch.java ---
    @@ -228,4 +228,20 @@ public WritableBatch getWritableBatch() {
       public VectorContainer getOutgoingContainer() {
         throw new UnsupportedOperationException(String.format(" You should not call getOutgoingContainer() for class %s", this.getClass().getCanonicalName()));
       }
    +
    +  public void drainStream(IterOutcome stream, int input, RecordBatch batch) {
    +    if (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +      for (final VectorWrapper<?> wrapper : batch) {
    +        wrapper.getValueVector().clear();
    +      }
    +      batch.kill(true);
    +      stream = next(input, batch);
    +      while (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +        for (final VectorWrapper<?> wrapper : batch) {
    +          wrapper.getValueVector().clear();
    +        }
    +        stream = next(input, batch);
    +      }
    --- End diff --
    
    Let's think a bit about this. Each fragment is synchronous and resides in a single thread. The `kill()` call will tell the upstream batch that we don't want any more batches. Under what conditions would the upstream operator ignore our request and still send us more batches?
    
    Given that the upstream batch is in the same thread, there is no race condition issues. That is, the upstream can't be busy producing batches and adding them to a queue. Why? It is in the same thread and the thread is executing here.
    
    If the upstream is a network receiver, then the network layer should handle the race conditions so we don't expose those issues to the entire operator stack.
    
    Given all of this, I wonder, was this tested? Can it be tested? How can we verify that the mechanism actually works other than trying it in production? Any way to unit test this?


---