You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Fei Xu (JIRA)" <ji...@apache.org> on 2018/05/16 09:44:00 UTC

[jira] [Updated] (CALCITE-2312) Support Partition By in sql select statement

     [ https://issues.apache.org/jira/browse/CALCITE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fei Xu updated CALCITE-2312:
----------------------------
    Description: 
I noticed that calcite already have an Exchange RelNode represents distribution on RelNode's input, but no sql clause support. 

 

We are use calcite building SQL layer for out streaming platform. and in streaming computation, data shuffle is a very important function, not only for our engine, but also for our users.

For example, In stream-join-table, the engine will load the table data to a cache at runtime, and stream join table is actually stream look up table.

If the stream data could partition by hash or range, before look up table. It will be cache friendly cause particular stream data look up particular table data. 

 

So I consider do some extensions on sql clause to support Exchange, e.g. 
{code:java}
// shuffle data using hash_distribution
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY o.productId

// shuffle data to a singleton node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY SINGLETON

// shuffle data to all nodes 
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY BROADCAST

// shuffle data to random node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANDOM

// shuffle data using round-robin policy
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY ROUND_ROBIN

// shuffle data using range policy
// Current I'm not sure about the appropriate clause to represents range 
// shuffle, so it is just an demo.
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANGE o.productId, 0, 4096 
{code}
 

  was:
I noticed that calcite already have an Exchange RelNode represents distribution on RelNode's input, but no sql clause support. 

We are use calcite building SQL layer for out streaming platform. and in streaming computation, data shuffle is a very important function, not only for our engine, but also for our users.

For example, In stream-join-table, the engine will load the table data to a cache at runtime, and stream join table is actually stream look up table.

If the stream data could partition by hash or range, before look up table. It will be cache friendly cause particular stream data look up particular table data. 

 

So I consider do some extensions on sql clause to support Exchange, e.g.

 
{code:java}
// shuffle data using hash_distribution
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY o.productId

// shuffle data to a singleton node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY SINGLETON

// shuffle data to all nodes 
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY BROADCAST

// shuffle data to random node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANDOM

// shuffle data using round-robin policy
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY ROUND_ROBIN

// shuffle data using range policy
// Current I'm not sure about the appropriate clause to represents range 
// shuffle, so it is just an demo.
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANGE o.productId, 0, 4096 
{code}
 


> Support Partition By in sql select statement
> --------------------------------------------
>
>                 Key: CALCITE-2312
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2312
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Fei Xu
>            Assignee: Julian Hyde
>            Priority: Major
>
> I noticed that calcite already have an Exchange RelNode represents distribution on RelNode's input, but no sql clause support. 
>  
> We are use calcite building SQL layer for out streaming platform. and in streaming computation, data shuffle is a very important function, not only for our engine, but also for our users.
> For example, In stream-join-table, the engine will load the table data to a cache at runtime, and stream join table is actually stream look up table.
> If the stream data could partition by hash or range, before look up table. It will be cache friendly cause particular stream data look up particular table data. 
>  
> So I consider do some extensions on sql clause to support Exchange, e.g. 
> {code:java}
> // shuffle data using hash_distribution
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY o.productId
> // shuffle data to a singleton node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY SINGLETON
> // shuffle data to all nodes 
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY BROADCAST
> // shuffle data to random node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANDOM
> // shuffle data using round-robin policy
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY ROUND_ROBIN
> // shuffle data using range policy
> // Current I'm not sure about the appropriate clause to represents range 
> // shuffle, so it is just an demo.
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANGE o.productId, 0, 4096 
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)