Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2015/04/01 22:00:18 UTC

Re: Review Request 30701: DRILL-2173 partition queries for dynamic partition pruning

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30701/#review78562
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java
<https://reviews.apache.org/r/30701/#comment127434>

    I'm not sure whether case-insensitive comparison should be the default, whether it should depend on the schema (e.g., an HBase schema could use a different case-sensitivity policy from FileSystemSchema), or whether it should be passed in as a parameter of the UDF maxdir(). In the query
    " select * 
      from dfs.my_workspace.data_directory 
      where dir0 in (select MAX(dir0) from dfs.my_workspace.data_directory)"
    
    the aggregate function max() could use case-sensitive string comparison. If the maxdir UDF uses case-insensitive comparison, then the query might return different results after partition pruning.
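
To make the concern concrete, here is a minimal sketch in plain Java (not Drill code) showing that the maximum of the same directory names depends on the comparison used. The class name, method names, and the sample directory names are assumptions made up for illustration; how Drill's MAX() actually compares strings is not confirmed here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: contrasts the two orderings on a hypothetical directory listing.
class MaxDirComparison {

    // Maximum under String's natural (case-sensitive, UTF-16 code unit) ordering.
    static String maxCaseSensitive(List<String> dirs) {
        return Collections.max(dirs);
    }

    // Maximum under the JDK's case-insensitive ordering.
    static String maxCaseInsensitive(List<String> dirs) {
        return Collections.max(dirs, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("Zebra", "apple");
        // 'Z' (90) sorts before 'a' (97), so the case-sensitive maximum is "apple".
        System.out.println(maxCaseSensitive(dirs));
        // Ignoring case, "zebra" sorts after "apple", so the maximum is "Zebra".
        System.out.println(maxCaseInsensitive(dirs));
    }
}
```

If the pruning UDF picked "Zebra" while the aggregate picked "apple", the pruned and unpruned forms of the query would scan different directories.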



exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorer.java
<https://reviews.apache.org/r/30701/#comment127432>

    What's the purpose of passing partitionColumns and partitionValues? In the FileSystemSchema and WorkspaceSchema implementations of getSubPartitions, those two parameters are not used, and the DirectoryExplorers UDF just passes two empty lists. It's not clear to me why the interface needs these two additional parameters on top of "schema" and "table".


- Jinfeng Ni


On March 25, 2015, 5:54 p.m., Jason Altekruse wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30701/
> -----------------------------------------------------------
> 
> (Updated March 25, 2015, 5:54 p.m.)
> 
> 
> Review request for drill, Jacques Nadeau, Mehant Baid, Parth Chandra, and Venki Korukanti.
> 
> 
> Bugs: DRILL-2173
>     https://issues.apache.org/jira/browse/DRILL-2173
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Adds a new interface for UDFs to access partition information. Together with 2060, which allows constant expression folding, this will allow UDFs to query partition information and then scan only a subset of the data. Example use case: find the most recent directory and scan only that partition's worth of data.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseSchemaFactory.java 7b76092 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveSchemaFactory.java 023517b 
>   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java 32c42ba 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java ab121b0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/interpreter/InterpreterEvaluator.java 35c35ec 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 5e31e5c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 3b51a69 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/UdfUtilities.java f7a1a04 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 90e3ef4 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractStoragePlugin.java b032fce 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorer.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorerImpl.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionNotFoundException.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaPartitionExplorer.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 2c0d8b8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java 4a3eba9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java 7c8d9b3 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/TestConstantFolding.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30701/diff/
> 
> 
> Testing
> -------
> 
> Tests have been run on a very recent version. I have made a few minor cleanup edits since and am waiting on another run, but I do not anticipate issues.
> 
> 
> Thanks,
> 
> Jason Altekruse
> 
>


Re: Review Request 30701: DRILL-2173 partition queries for dynamic partition pruning

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On April 1, 2015, 1 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java, line 82
> > <https://reviews.apache.org/r/30701/diff/6/?file=906263#file906263line82>
> >
> >     I'm not sure whether case-insensitive comparison should be the default, whether it should depend on the schema (e.g., an HBase schema could use a different case-sensitivity policy from FileSystemSchema), or whether it should be passed in as a parameter of the UDF maxdir(). In the query
> >     " select * 
> >       from dfs.my_workspace.data_directory 
> >       where dir0 in (select MAX(dir0) from dfs.my_workspace.data_directory)"
> >     
> >     the aggregate function max() could use case-sensitive string comparison. If the maxdir UDF uses case-insensitive comparison, then the query might return different results after partition pruning.
> 
> Jason Altekruse wrote:
>     The primary use case we had in mind for this feature was just finding recent data, so all of the partition names were numeric. For date formats arranged so that a string comparison gives the correct result, e.g. YYYY-MM-DD or similar, case sensitivity wouldn't matter. There are many ways users might want to query their partition information, so I think it is best to leave the interface open for writing custom UDFs in these cases. I could pass a flag to this UDF, or write another that does the same operation case-sensitively.

It makes sense to either add a flag or provide another UDF that uses case-sensitive comparison (similar to the "like" and "ilike" function implementations).
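
The two options can be sketched side by side in plain Java. This is only an illustration under assumptions: the class name, the method names maxDir and iMaxDir, and the boolean flag are hypothetical, and the naming simply mirrors the like/ilike convention (plain name is case-sensitive, "i" prefix is case-insensitive). It is not the signature of any Drill UDF.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: one flag-driven implementation plus two thin named variants.
class DirectoryMax {

    // Option 1: a single function that takes the case policy as a flag.
    static String max(List<String> dirs, boolean caseSensitive) {
        Comparator<String> order = caseSensitive
                ? Comparator.<String>naturalOrder()
                : String.CASE_INSENSITIVE_ORDER;
        return Collections.max(dirs, order);
    }

    // Option 2: two separately named functions, in the style of like/ilike.
    static String maxDir(List<String> dirs) {
        return max(dirs, true);
    }

    static String iMaxDir(List<String> dirs) {
        return max(dirs, false);
    }
}
```

Either way both variants can share one implementation; the choice only affects what the user types in the query.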


- Jinfeng


Re: Review Request 30701: DRILL-2173 partition queries for dynamic partition pruning

Posted by Jason Altekruse <al...@gmail.com>.

> On April 1, 2015, 8 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DirectoryExplorers.java, line 82
> > <https://reviews.apache.org/r/30701/diff/6/?file=906263#file906263line82>
> >
> >     I'm not sure whether case-insensitive comparison should be the default, whether it should depend on the schema (e.g., an HBase schema could use a different case-sensitivity policy from FileSystemSchema), or whether it should be passed in as a parameter of the UDF maxdir(). In the query
> >     " select * 
> >       from dfs.my_workspace.data_directory 
> >       where dir0 in (select MAX(dir0) from dfs.my_workspace.data_directory)"
> >     
> >     the aggregate function max() could use case-sensitive string comparison. If the maxdir UDF uses case-insensitive comparison, then the query might return different results after partition pruning.

The primary use case we had in mind for this feature was just finding recent data, so all of the partition names were numeric. For date formats arranged so that a string comparison gives the correct result, e.g. YYYY-MM-DD or similar, case sensitivity wouldn't matter. There are many ways users might want to query their partition information, so I think it is best to leave the interface open for writing custom UDFs in these cases. I could pass a flag to this UDF, or write another that does the same operation case-sensitively.
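
As a small check of the date-format point, the sketch below (plain Java, with made-up directory names and hypothetical class and method names) picks the latest directory three ways: by case-sensitive string comparison, by case-insensitive string comparison, and by parsing each name as a date. For zero-padded YYYY-MM-DD names all three agree, because the names contain only digits and hyphens.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.List;

// Illustrative only: shows that string order matches date order for YYYY-MM-DD names.
class DateDirOrdering {

    // Latest directory by case-sensitive string comparison.
    static String latestByString(List<String> dirs) {
        return Collections.max(dirs);
    }

    // Latest directory by case-insensitive string comparison.
    static String latestByStringIgnoreCase(List<String> dirs) {
        return Collections.max(dirs, String.CASE_INSENSITIVE_ORDER);
    }

    // Latest directory by parsing each name as an ISO-8601 calendar date.
    static String latestByDate(List<String> dirs) {
        LocalDate best = null;
        for (String dir : dirs) {
            LocalDate parsed = LocalDate.parse(dir);
            if (best == null || parsed.isAfter(best)) {
                best = parsed;
            }
        }
        return best.toString();
    }
}
```

The agreement relies on zero padding; a name such as 2015-4-1 would break both the parse and the string ordering.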


> On April 1, 2015, 8 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/PartitionExplorer.java, line 101
> > <https://reviews.apache.org/r/30701/diff/6/?file=906270#file906270line101>
> >
> >     What's the purpose of passing partitionColumns and partitionValues? In the FileSystemSchema and WorkspaceSchema implementations of getSubPartitions, those two parameters are not used, and the DirectoryExplorers UDF just passes two empty lists. It's not clear to me why the interface needs these two additional parameters on top of "schema" and "table".

These parameters were added for use with storage systems that track partition column names. They are indeed unused by the only two current implementations of the interface, in the file system and workspace schemas. They are primarily useful for Hive, where we can currently do partition pruning on partition columns. Adding them to the interface generalizes this functionality to enable future use with Hive. We could instead have two different interfaces, to avoid confusion in cases where the partition columns are not needed.
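
The trade-off can be sketched with a stand-in interface. Everything below is an assumption for illustration: the interface and class names are hypothetical, the method shape is inferred only from the parameter names mentioned in this thread (schema, table, partitionColumns, partitionValues), and the filtering logic is a guess at how a column-aware store might use the two lists. It is not the code under review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the interface discussed in the review.
interface PartitionExplorerSketch {
    Iterable<String> getSubPartitions(String schema, String table,
                                      List<String> partitionColumns,
                                      List<String> partitionValues);
}

// Directory-style store: partitions have no column names, so both lists are ignored.
class DirectoryExplorerSketch implements PartitionExplorerSketch {
    private final List<String> directories;

    DirectoryExplorerSketch(List<String> directories) {
        this.directories = directories;
    }

    @Override
    public List<String> getSubPartitions(String schema, String table,
                                         List<String> partitionColumns,
                                         List<String> partitionValues) {
        return directories;
    }
}

// Column-aware store: the two lists act as an equality filter on named columns.
class ColumnAwareExplorerSketch implements PartitionExplorerSketch {
    private final List<String> columnNames;
    private final List<List<String>> partitions; // each row is aligned with columnNames

    ColumnAwareExplorerSketch(List<String> columnNames, List<List<String>> partitions) {
        this.columnNames = columnNames;
        this.partitions = partitions;
    }

    @Override
    public List<String> getSubPartitions(String schema, String table,
                                         List<String> partitionColumns,
                                         List<String> partitionValues) {
        List<String> matches = new ArrayList<>();
        for (List<String> partition : partitions) {
            boolean keep = true;
            for (int i = 0; i < partitionColumns.size(); i++) {
                int col = columnNames.indexOf(partitionColumns.get(i));
                // Reject partitions with an unknown column or a non-matching value.
                if (col < 0 || !partition.get(col).equals(partitionValues.get(i))) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                matches.add(String.join("/", partition));
            }
        }
        return matches;
    }
}
```

With a single interface, directory-style callers pass two empty lists, which is the confusion raised above; splitting it in two would remove the unused parameters at the cost of a second interface for UDF authors to learn.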


- Jason

