You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2015/06/03 23:14:38 UTC

Review Request 35026: DRILL-3246: Query planning support for partition by clause in CTAS statement

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/
-----------------------------------------------------------

Review request for drill and Venki Korukanti.


Repository: drill-git


Description
-------

Main code change :

1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.

In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.


Diffs
-----

  exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74 
  exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3 

Diff: https://reviews.apache.org/r/35026/diff/


Testing
-------

Unit test. 

Precommit regression test.


Thanks,

Jinfeng Ni


Re: Review Request 35026: DRILL-3246: Query planning support for partition by clause in CTAS statement

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On June 3, 2015, 4:03 p.m., Steven Phillips wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java, line 97
> > <https://reviews.apache.org/r/35026/diff/1/?file=977654#file977654line97>
> >
> >     Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?
> 
> Jinfeng Ni wrote:
>     Hive does use "PARTITIONED BY". However, seems at least MySQL[1],  Oracle[2], DB2[3],  use "PARTITION BY" .
>     
>     Since Hive's syntax is more like HiveQL, in stead of SQL, seems to me it makes sense to use the same syntax as MySql/Oracle/DB2, which are more or less stick to SQL standard.
>     
>     1. https://dev.mysql.com/doc/refman/5.7/en/partitioning-types.html
>     
>     2. http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm#i1006455
>     
>     3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.partition.doc/doc/t0021578.html

Another reason to use "PARTITION":  according to SQL standard, "PARTITION" is SQL reserved word, but not "PARTITIONED". 

"PARTITION BY" follows the similar way of "ORDER BY" (vs ORDERED BY), "GROUP BY" (vs "GROUPED BY").


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------


On June 3, 2015, 2:14 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
> 
> (Updated June 3, 2015, 2:14 p.m.)
> 
> 
> Review request for drill and Venki Korukanti.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Main code change :
> 
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
> 
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3 
> 
> Diff: https://reviews.apache.org/r/35026/diff/
> 
> 
> Testing
> -------
> 
> Unit test. 
> 
> Precommit regression test.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 35026: DRILL-3246: Query planning support for partition by clause in CTAS statement

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On June 3, 2015, 4:03 p.m., Steven Phillips wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java, line 97
> > <https://reviews.apache.org/r/35026/diff/1/?file=977654#file977654line97>
> >
> >     Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?

Hive does use "PARTITIONED BY". However, seems at least MySQL[1],  Oracle[2], DB2[3],  use "PARTITION BY" .

Since Hive's syntax is more like HiveQL, in stead of SQL, seems to me it makes sense to use the same syntax as MySql/Oracle/DB2, which are more or less stick to SQL standard.

1. https://dev.mysql.com/doc/refman/5.7/en/partitioning-types.html

2. http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm#i1006455

3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.partition.doc/doc/t0021578.html


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------


On June 3, 2015, 2:14 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
> 
> (Updated June 3, 2015, 2:14 p.m.)
> 
> 
> Review request for drill and Venki Korukanti.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Main code change :
> 
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
> 
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3 
> 
> Diff: https://reviews.apache.org/r/35026/diff/
> 
> 
> Testing
> -------
> 
> Unit test. 
> 
> Precommit regression test.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 35026: DRILL-3246: Query planning support for partition by clause in CTAS statement

Posted by Steven Phillips <sp...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35026/#review86500
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java
<https://reviews.apache.org/r/35026/#comment138544>

    Should we maybe use "PARTITIONED BY" instead, to match Hive's syntax?


- Steven Phillips


On June 3, 2015, 9:14 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35026/
> -----------------------------------------------------------
> 
> (Updated June 3, 2015, 9:14 p.m.)
> 
> 
> Review request for drill and Venki Korukanti.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Main code change :
> 
> 1) Modify Drill's SQL parser to allow partition by clause in CTAS statement
> 2) Modify Drill's query planner to do semantics validation/checking, and generate query plan to support the partition by clause.
> 
> In the query plan for the CTAS statement, Drill will ensure data are sorted according to the partition columns. The sort could be partial sort. Therefore, multiple rows with the same partition column values could end up in different partition.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/codegen/includes/parserImpls.ftl 1605b06 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/CreateTableEntry.java 673e8c6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillWriterRel.java fc93c3e 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/FileSystemCreateTableEntry.java 6784888 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WriterPrule.java 5790665 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java 2866b8c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java 3edcdb2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java bfa89a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateTable.java 9fd9d92 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java 57cfde9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java 6afce1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/SubSchemaWrapper.java 4e50bc1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java fa9aa89 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 5668c54 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b1135d0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 233c32b 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyWriter.java e12c5b3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 322a88d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetWriter.java 75f0e74 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java f0422d3 
> 
> Diff: https://reviews.apache.org/r/35026/diff/
> 
> 
> Testing
> -------
> 
> Unit test. 
> 
> Precommit regression test.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>