You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by vdiravka <gi...@git.apache.org> on 2018/01/04 16:38:46 UTC

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

GitHub user vdiravka opened a pull request:

    https://github.com/apache/drill/pull/1083

    DRILL-4185: UNION ALL involving empty directory on any side of union …

    …all results in Failed query
    SchemaLessTable, SchemaLessScan, SchemaLessBatchCreator, SchemaLessBatch classes are inroduced. 
    The main idea that empty directory is a Drill (SchemaLess) table.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vdiravka/drill DRILL-4185

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1083
    
----
commit 0aeb9ecc94a25fd19002ffdacdbfd86ec4a8a90f
Author: Vitalii Diravka <vi...@...>
Date:   2017-12-01T20:48:05Z

    DRILL-4185: UNION ALL involving empty directory on any side of union all results in Failed query

----


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160685002
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOExcep
         }
     
         boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
    -      Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
           try {
    -        if (fs.exists(p)) {
    -          return true;
    -        } else {
    -
    -          if (metaDataFileExists(fs, dir)) {
    -            return true;
    -          }
    -          List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    -          return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    -        }
    +        // There should be at least one file, which is readable by Drill
    +        List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    +        return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    --- End diff --
    
    I did it on purpose. With the old logic of isDirReadable() method an empty directory, which contains parquet metadata files, will be processes with ParquetGroupScan as a Parquet Table. It leads to obtaining an exception:
    https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L878
    
    To process such table with SchemalessScan, isReadable method should return false for that case. In other words it shouldn't check availability of metadata cache files, but only really readable files by Drill.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160717181
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOExcep
         }
     
         boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
    -      Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
           try {
    -        if (fs.exists(p)) {
    -          return true;
    -        } else {
    -
    -          if (metaDataFileExists(fs, dir)) {
    -            return true;
    -          }
    -          List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    -          return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    -        }
    +        // There should be at least one file, which is readable by Drill
    +        List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    +        return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    --- End diff --
    
    1. How we know that we have file in nested directory if filter checks with recursive flag false?
    2. The `isReadable` method uses implicit assumption that if metadata file exists, there is some data in folder. Plus comment in this file also  points out that the same logic is used in `isDirReadable`. Since you are removing this logic from `isDirReadable` please make sure the logic is synced between two methods or at least comment is updated to reflect new approach.



---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160997779
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOExcep
         }
     
         boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
    -      Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
           try {
    -        if (fs.exists(p)) {
    -          return true;
    -        } else {
    -
    -          if (metaDataFileExists(fs, dir)) {
    -            return true;
    -          }
    -          List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    -          return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    -        }
    +        // There should be at least one file, which is readable by Drill
    +        List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    +        return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    --- End diff --
    
    I have reverted my changes here. SchemalessScan will be created if ParquetGroupScan has empty entries and can't be used. The info that parquet metadata files are invalid will be logged.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160663899
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java ---
    @@ -1197,4 +1197,64 @@ public void testFieldWithDots() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    -}
    \ No newline at end of file
    +
    +  @Test
    +  public void testUnionAllRightEmptyDir() throws Exception {
    --- End diff --
    
    1. Union works fine too. I have added similar tests to TestUnionDistinct.java class. Thanks
    2. Test case is added. The result is the same as for querying single empty dir.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1083


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160525303
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
    @@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
         // turns on the verbose errors in tests
         // sever side stacktraces are added to the message before sending back to the client
         test("ALTER SESSION SET `exec.errors.verbose` = true");
    +    emptyDirCreating();
    --- End diff --
    
    Why do we create empty directory for all tests that extend this class? I guess need to create it only for those tests that need it.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160666274
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
    @@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
         // turns on the verbose errors in tests
         // sever side stacktraces are added to the message before sending back to the client
         test("ALTER SESSION SET `exec.errors.verbose` = true");
    +    emptyDirCreating();
    +  }
    +
    +  /**
    +   * Creates an empty directory under <b>dfs.root</b> schema.
    +   */
    +  private static void emptyDirCreating() {
    --- End diff --
    
    the method is deleted due to the previous comment


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160936248
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
    @@ -0,0 +1,90 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.base;
    +
    +import com.fasterxml.jackson.annotation.JsonTypeName;
    +import com.google.common.base.Preconditions;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.common.expression.SchemaPath;
    +import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    +import org.apache.drill.exec.planner.logical.DynamicDrillTable;
    +import org.apache.drill.exec.proto.CoordinationProtos;
    +
    +import java.util.List;
    +
    +/**
    + *  The type of scan operator, which allows to scan schemaless tables ({@link DynamicDrillTable} with null selection)
    + */
    +@JsonTypeName("schemaless-scan")
    +public class SchemalessScan extends AbstractGroupScan implements SubScan {
    +
    +  private String selectionRoot;
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160525146
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
    @@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
         // turns on the verbose errors in tests
         // sever side stacktraces are added to the message before sending back to the client
         test("ALTER SESSION SET `exec.errors.verbose` = true");
    +    emptyDirCreating();
    +  }
    +
    +  /**
    +   * Creates an empty directory under <b>dfs.root</b> schema.
    +   */
    +  private static void emptyDirCreating() {
    --- End diff --
    
    Please rename -> `createEmptyDirectory`.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160663989
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestEmptyInputSql.java ---
    @@ -177,4 +177,33 @@ public void testQueryEmptyCsv() throws Exception {
             .run();
       }
     
    +  @Test
    +  public void testEmptyDirectory() throws Exception {
    +    final BatchSchema expectedSchema = new SchemaBuilder().build();
    +
    +    testBuilder()
    +        .sqlQuery("select * from dfs.`%s`", EMPTY_DIR_NAME)
    +        .schemaBaseLine(expectedSchema)
    +        .build()
    +        .run();
    +  }
    +
    +  @Test
    +  public void testEmptyDirectoryAndFieldInQuery() throws Exception {
    +    final List<Pair<SchemaPath, TypeProtos.MajorType>> expectedSchema = Lists.newArrayList();
    +    final TypeProtos.MajorType majorType = TypeProtos.MajorType.newBuilder()
    +        .setMinorType(TypeProtos.MinorType.INT) // field "key" is absent in schema-less table
    --- End diff --
    
    Done


---

[GitHub] drill issue #1083: DRILL-4185: UNION ALL involving empty directory on any si...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1083
  
    +1, LGTM. Thanks for making the changes.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160664191
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java ---
    @@ -56,65 +56,50 @@ private void prepareTables(final String tableName, boolean refreshMetadata) thro
       public void testFix4376() throws Exception {
         prepareTables("4376_1", true);
     
    -    testBuilder()
    -      .sqlQuery("SELECT COUNT(*) AS `count` FROM dfs.tmp.`4376_1/60*`")
    -      .ordered()
    -      .baselineColumns("count").baselineValues(1984L)
    -      .go();
    +    int actualRecordCount = testSql("SELECT * FROM dfs.tmp.`4376_1/60*`");
    +    int expectedRecordCount = 1984;
    +    assertEquals(String.format("Received unexpected number of rows in output: expected=%d, received=%s",
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160663414
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
    @@ -0,0 +1,80 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.base;
    +
    +import com.fasterxml.jackson.annotation.JsonTypeName;
    +import com.google.common.base.Preconditions;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.common.expression.SchemaPath;
    +import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    +import org.apache.drill.exec.planner.logical.DynamicDrillTable;
    +import org.apache.drill.exec.proto.CoordinationProtos;
    +
    +import java.util.List;
    +
    +/**
    + *  The type of scan operator, which allows to scan schemaless tables ({@link DynamicDrillTable} with null selection)
    + */
    +@JsonTypeName("schemaless-scan")
    +public class SchemalessScan extends AbstractGroupScan implements SubScan {
    +
    +  public SchemalessScan(String userName) {
    +    super(userName);
    +  }
    +
    +  public SchemalessScan(AbstractGroupScan that) {
    +    super(that);
    +  }
    +
    +  @Override
    +  public void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException {
    +  }
    +
    +  @Override
    +  public SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException {
    +    return this;
    +  }
    +
    +  @Override
    +  public int getMaxParallelizationWidth() {
    +    return 1;
    +  }
    +
    +  @Override
    +  public String getDigest() {
    +    return toString();
    --- End diff --
    
    Thanks. Forgot to override this.
    
    For now toString() is overridden. And to show selectionRoot I have replaced null file selection with current File Selection (with `emptyDirectory` flag, which indicates whether this FileSelection is an empty directory).
    
    Profiles - > Physical Plan shows the following info:
    `Scan(groupscan=[SchemalessScan [selectionRoot = file:/home/vitalii/Documents/parquet_for_union/folder2]]) : rowType = (DrillRecordRow[*, value]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 334`


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160525590
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java ---
    @@ -56,65 +56,50 @@ private void prepareTables(final String tableName, boolean refreshMetadata) thro
       public void testFix4376() throws Exception {
         prepareTables("4376_1", true);
     
    -    testBuilder()
    -      .sqlQuery("SELECT COUNT(*) AS `count` FROM dfs.tmp.`4376_1/60*`")
    -      .ordered()
    -      .baselineColumns("count").baselineValues(1984L)
    -      .go();
    +    int actualRecordCount = testSql("SELECT * FROM dfs.tmp.`4376_1/60*`");
    +    int expectedRecordCount = 1984;
    +    assertEquals(String.format("Received unexpected number of rows in output: expected=%d, received=%s",
    --- End diff --
    
    `expected=%d` -> `expected = %d`. Please correct here and in other cases.
      


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160524662
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
    @@ -0,0 +1,80 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.base;
    +
    +import com.fasterxml.jackson.annotation.JsonTypeName;
    +import com.google.common.base.Preconditions;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.common.expression.SchemaPath;
    +import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    +import org.apache.drill.exec.planner.logical.DynamicDrillTable;
    +import org.apache.drill.exec.proto.CoordinationProtos;
    +
    +import java.util.List;
    +
    +/**
    + *  The type of scan operator, which allows to scan schemaless tables ({@link DynamicDrillTable} with null selection)
    + */
    +@JsonTypeName("schemaless-scan")
    +public class SchemalessScan extends AbstractGroupScan implements SubScan {
    +
    +  public SchemalessScan(String userName) {
    +    super(userName);
    +  }
    +
    +  public SchemalessScan(AbstractGroupScan that) {
    +    super(that);
    +  }
    +
    +  @Override
    +  public void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException {
    +  }
    +
    +  @Override
    +  public SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException {
    +    return this;
    +  }
    +
    +  @Override
    +  public int getMaxParallelizationWidth() {
    +    return 1;
    +  }
    +
    +  @Override
    +  public String getDigest() {
    +    return toString();
    --- End diff --
    
    What value `toString` will return?


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160711830
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
    @@ -0,0 +1,90 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.base;
    +
    +import com.fasterxml.jackson.annotation.JsonTypeName;
    +import com.google.common.base.Preconditions;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.common.expression.SchemaPath;
    +import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    +import org.apache.drill.exec.planner.logical.DynamicDrillTable;
    +import org.apache.drill.exec.proto.CoordinationProtos;
    +
    +import java.util.List;
    +
    +/**
    + *  The type of scan operator, which allows to scan schemaless tables ({@link DynamicDrillTable} with null selection)
    + */
    +@JsonTypeName("schemaless-scan")
    +public class SchemalessScan extends AbstractGroupScan implements SubScan {
    +
    +  private String selectionRoot;
    --- End diff --
    
    final


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160526940
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOExcep
         }
     
         boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
    -      Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
           try {
    -        if (fs.exists(p)) {
    -          return true;
    -        } else {
    -
    -          if (metaDataFileExists(fs, dir)) {
    -            return true;
    -          }
    -          List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    -          return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    -        }
    +        // There should be at least one file, which is readable by Drill
    +        List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    +        return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
    --- End diff --
    
    Why did you remove `metaDataFileExists(fs, dir)` check? `DrillFileSystemUtil` won't list dot files.
     Plus it seems you break the assumptions highlighted in `isReadable` method above.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160666020
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
    @@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
         // turns on the verbose errors in tests
         // sever side stacktraces are added to the message before sending back to the client
         test("ALTER SESSION SET `exec.errors.verbose` = true");
    +    emptyDirCreating();
    --- End diff --
    
    Agree.
    I have changed it. Empty directory is created in the scope of one test. But to avoid duplicating code, when in the class there are several test cases with using empty dir, the last is created once for entire class. 


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160525739
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestEmptyInputSql.java ---
    @@ -177,4 +177,33 @@ public void testQueryEmptyCsv() throws Exception {
             .run();
       }
     
    +  @Test
    +  public void testEmptyDirectory() throws Exception {
    +    final BatchSchema expectedSchema = new SchemaBuilder().build();
    +
    +    testBuilder()
    +        .sqlQuery("select * from dfs.`%s`", EMPTY_DIR_NAME)
    +        .schemaBaseLine(expectedSchema)
    +        .build()
    +        .run();
    +  }
    +
    +  @Test
    +  public void testEmptyDirectoryAndFieldInQuery() throws Exception {
    +    final List<Pair<SchemaPath, TypeProtos.MajorType>> expectedSchema = Lists.newArrayList();
    +    final TypeProtos.MajorType majorType = TypeProtos.MajorType.newBuilder()
    +        .setMinorType(TypeProtos.MinorType.INT) // field "key" is absent in schema-less table
    --- End diff --
    
    schema-less -> schemaless


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r161205848
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
    @@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final String query) throws Except
             .go();
       }
     
    +  /** InnerJoin with empty dir table on nullable cols, MergeJoin */
    +  // TODO: the same tests should be added for HashJoin operator, DRILL-6070
    +  @Test
    --- End diff --
    
    Please add test for merge join with ignore annotation. Also please check that NLJ also works fine. If not, please fix.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160996060
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
    @@ -424,6 +429,23 @@ public MetadataContext getMetaContext() {
         return metaContext;
       }
     
    +  /**
    +   * @return true if this file selectionRoot points to an empty directory, false otherwise
    +   */
    +  public boolean isEmptyDirectory() {
    +    return emptyDirectory;
    +  }
    +
    +  /**
    +   * Setting this as true allows to identify this as empty directory file selection
    +   *
    +   * @param emptyDirectory empty directory flag
    +   */
    +  public void setEmptyDirectory(boolean emptyDirectory) {
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160525926
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java ---
    @@ -1197,4 +1197,64 @@ public void testFieldWithDots() throws Exception {
           .baselineValues("1", "2", "1", null, "a")
           .go();
       }
    -}
    \ No newline at end of file
    +
    +  @Test
    +  public void testUnionAllRightEmptyDir() throws Exception {
    --- End diff --
    
    1. Does union (aka union distinct) also work fine?
    2. Case when bot directories in union / union all are empty?
      


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160524828
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java ---
    @@ -78,19 +78,24 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, RelConv
     
           final Table table = schema.getTable(tableName);
     
    -      if(table == null){
    +      if(table == null) {
    --- End diff --
    
    Please add space before `if`, here and below.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160712431
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
    @@ -424,6 +429,23 @@ public MetadataContext getMetaContext() {
         return metaContext;
       }
     
    +  /**
    +   * @return true if this file selectionRoot points to an empty directory, false otherwise
    +   */
    +  public boolean isEmptyDirectory() {
    +    return emptyDirectory;
    +  }
    +
    +  /**
    +   * Setting this as true allows to identify this as empty directory file selection
    +   *
    +   * @param emptyDirectory empty directory flag
    +   */
    +  public void setEmptyDirectory(boolean emptyDirectory) {
    --- End diff --
    
    Please use set empty directory without flag.


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r160663507
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java ---
    @@ -78,19 +78,24 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, RelConv
     
           final Table table = schema.getTable(tableName);
     
    -      if(table == null){
    +      if(table == null) {
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r165023581
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
    @@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final String query) throws Except
             .go();
       }
     
    +  /** InnerJoin with empty dir table on nullable cols, MergeJoin */
    +  // TODO: the same tests should be added for HashJoin operator, DRILL-6070
    +  @Test
    --- End diff --
    
    The bug was founded for NLJ and empty tables. I have resolved that issue.
    The separate test class is added for empty dir tables and different join operators.
    
    Also I have made refactoring for the TestHashJoinAdvanced, TestMergeJoinAdvanced, TestNestedLoopJoin classes.


---