Posted to issues@calcite.apache.org by "Fei Xu (JIRA)" <ji...@apache.org> on 2018/05/17 09:42:00 UTC
[jira] [Comment Edited] (CALCITE-2312) Support Partition By in sql select statement

    [ https://issues.apache.org/jira/browse/CALCITE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478405#comment-16478405 ] 

Fei Xu edited comment on CALCITE-2312 at 5/17/18 9:41 AM:
----------------------------------------------------------

Thanks for the comment. 

So maybe I can start from SqlCreateTable and add a PARTITION BY strategy, just like adding PRIMARY KEY. SqlCreateTable represents a relational table, though, so maybe a SqlCreateStream would be more suitable for streams.




was (Author: xu fei):
Thanks for the comment. 

So maybe I can start from SqlCreateTable and add a PARTITION BY strategy, just like adding PRIMARY KEY.

> Support Partition By in sql select statement
> --------------------------------------------
>
>                 Key: CALCITE-2312
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2312
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Fei Xu
>            Assignee: Julian Hyde
>            Priority: Major
>
> I noticed that Calcite already has an Exchange RelNode that represents a distribution imposed on a RelNode's input, but there is no SQL clause that supports it.
>  
> We are using Calcite to build the SQL layer for our streaming platform. In streaming computation, data shuffle is a very important function, not only for our engine but also for our users.
> For example, in a stream-table join, the engine loads the table data into a cache at runtime, so the join is actually a lookup from the stream into the table.
> If the stream data could be partitioned by hash or range before the table lookup, it would be cache friendly, because a particular partition of the stream would only look up a particular part of the table data.
>  
> So I am considering some extensions to the SQL clauses to support Exchange, e.g.
> {code:sql}
> -- shuffle data using hash_distribution
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY o.productId;
> -- shuffle data to a singleton node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY SINGLETON;
> -- shuffle data to all nodes
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY BROADCAST;
> -- shuffle data to a random node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANDOM;
> -- shuffle data using a round-robin policy
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY ROUND_ROBIN;
> -- shuffle data using a range policy
> -- Currently I'm not sure about the appropriate clause to represent a range
> -- shuffle, so this is just a demo.
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANGE o.productId, 0, 4096;
> {code}
>  
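To make the intended semantics of the shuffle policies in the description concrete, here is a minimal sketch of how each policy could pick a target node for a row. It is only an illustration under assumptions: the class PartitionSketch and all of its method names are hypothetical, it does not use any Calcite API, and the range policy assumes an evenly split key interval [lo, hi), which the proposal itself leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical illustration of the proposed shuffle policies; not Calcite code. */
final class PartitionSketch {
  private PartitionSketch() {}

  /** PARTITION BY key: rows with equal keys always go to the same node. */
  static int hash(Object key, int nodes) {
    // floorMod keeps the result in [0, nodes) even for negative hash codes.
    return Math.floorMod(Objects.hashCode(key), nodes);
  }

  /** PARTITION BY SINGLETON: every row goes to one node (node 0 is an assumption). */
  static int singleton() {
    return 0;
  }

  /** PARTITION BY BROADCAST: every row goes to all nodes. */
  static List<Integer> broadcast(int nodes) {
    List<Integer> targets = new ArrayList<>();
    for (int i = 0; i < nodes; i++) {
      targets.add(i);
    }
    return targets;
  }

  /** PARTITION BY RANDOM: each row goes to a uniformly chosen node. */
  static int random(int nodes) {
    return ThreadLocalRandom.current().nextInt(nodes);
  }

  /** PARTITION BY ROUND_ROBIN: the n-th row goes to node n mod nodes. */
  static int roundRobin(long rowIndex, int nodes) {
    return (int) Math.floorMod(rowIndex, (long) nodes);
  }

  /**
   * PARTITION BY RANGE key, lo, hi: assumes [lo, hi) is split into equal
   * sub-ranges, one per node; keys outside the interval are clamped.
   */
  static int range(long key, long lo, long hi, int nodes) {
    if (hi <= lo) {
      throw new IllegalArgumentException("hi must be greater than lo");
    }
    long clamped = Math.max(lo, Math.min(hi - 1, key));
    return (int) ((clamped - lo) * nodes / (hi - lo));
  }

  public static void main(String[] args) {
    int nodes = 4;
    // The same productId is always routed to the same node, which is what
    // makes the per-node table cache effective for the lookup join.
    System.out.println("hash(42)      -> node " + hash(42, nodes));
    System.out.println("broadcast     -> nodes " + broadcast(nodes));
    System.out.println("roundRobin(5) -> node " + roundRobin(5L, nodes));
    System.out.println("range(1024)   -> node " + range(1024L, 0L, 4096L, nodes));
  }
}
```

With hash or range routing, each node only ever sees a fixed subset of productId values, so it only needs to cache the matching part of the lookup table. The random and round-robin policies balance load but give no such locality.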


