You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2015/06/03 23:14:38 UTC
Review Request 35026: DRILL-3246: Query planning support for
partition by clause in CTAS statement
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/
-----------------------------------------------------------
Review request for drill and Venki Korukanti.
Repository: drill-git
Description
-------
Main code change :
1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
Diffs
-----
exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a
exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74
exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3
Diff: https://reviews.apache.org/r/35026/diff/
Testing
-------
Unit test.
Precommit regression test.
Thanks,
Jinfeng Ni
Re: Review Request 35026: DRILL-3246: Query planning support for
partition by clause in CTAS statement
Posted by Jinfeng Ni <jn...@maprtech.com>.
> On June 3, 2015, 4:03 p.m., Steven Phillips wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java, line 97
> > <https://reviews.apache.org/r/35026/diff/1/?file=977654#file977654line97>
> >
> > Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?
>
> Jinfeng Ni wrote:
> Hive does use "PARTITIONED BY". However, seems at least MySQL[1], Oracle[2], DB2[3], use "PARTITION BY" .
>
> Since Hive's syntax is more like HiveQL, in stead of SQL, seems to me it makes sense to use the same syntax as MySql/Oracle/DB2, which are more or less stick to SQL standard.
>
> 1. https://dev.mysql.com/doc/refman/5.7/en/partitioning-types.html
>
> 2. http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm#i1006455
>
> 3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.partition.doc/doc/t0021578.html
Another reason to use "PARTITION": according to SQL standard, "PARTITION" is SQL reserved word, but not "PARTITIONED".
"PARTITION BY" follows the similar way of "ORDER BY" (vs ORDERED BY), "GROUP BY" (vs "GROUPED BY").
- Jinfeng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------
On June 3, 2015, 2:14 p.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
>
> (Updated June 3, 2015, 2:14 p.m.)
>
>
> Review request for drill and Venki Korukanti.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> Main code change :
>
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
>
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9
> exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a
> exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3
>
> Diff: https://reviews.apache.org/r/35026/diff/
>
>
> Testing
> -------
>
> Unit test.
>
> Precommit regression test.
>
>
> Thanks,
>
> Jinfeng Ni
>
>
Re: Review Request 35026: DRILL-3246: Query planning support for
partition by clause in CTAS statement
Posted by Jinfeng Ni <jn...@maprtech.com>.
> On June 3, 2015, 4:03 p.m., Steven Phillips wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java, line 97
> > <https://reviews.apache.org/r/35026/diff/1/?file=977654#file977654line97>
> >
> > Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?
Hive does use "PARTITIONED BY". However, seems at least MySQL[1], Oracle[2], DB2[3], use "PARTITION BY" .
Since Hive's syntax is more like HiveQL, in stead of SQL, seems to me it makes sense to use the same syntax as MySql/Oracle/DB2, which are more or less stick to SQL standard.
1. https://dev.mysql.com/doc/refman/5.7/en/partitioning-types.html
2. http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm#i1006455
3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.partition.doc/doc/t0021578.html
- Jinfeng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------
On June 3, 2015, 2:14 p.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
>
> (Updated June 3, 2015, 2:14 p.m.)
>
>
> Review request for drill and Venki Korukanti.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> Main code change :
>
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
>
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9
> exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a
> exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3
>
> Diff: https://reviews.apache.org/r/35026/diff/
>
>
> Testing
> -------
>
> Unit test.
>
> Precommit regression test.
>
>
> Thanks,
>
> Jinfeng Ni
>
>
Re: Review Request 35026: DRILL-3246: Query planning support for
partition by clause in CTAS statement
Posted by Steven Phillips <sp...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java
<https://reviews.apache.org/r/35026/#comment138544>
Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?
- Steven Phillips
On June 3, 2015, 9:14 p.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
>
> (Updated June 3, 2015, 9:14 p.m.)
>
>
> Review request for drill and Venki Korukanti.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> Main code change :
>
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
>
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9
> exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a
> exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b
> exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d
> exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3
>
> Diff: https://reviews.apache.org/r/35026/diff/
>
>
> Testing
> -------
>
> Unit test.
>
> Precommit regression test.
>
>
> Thanks,
>
> Jinfeng Ni
>
>