Posted to issues@calcite.apache.org by "Fei Xu (JIRA)" <ji...@apache.org> on 2018/05/16 09:44:00 UTC

[jira] [Created] (CALCITE-2312) Support Partition By in sql select statement

Fei Xu created CALCITE-2312:
-------------------------------

             Summary: Support Partition By in sql select statement
                 Key: CALCITE-2312
                 URL: https://issues.apache.org/jira/browse/CALCITE-2312
             Project: Calcite
          Issue Type: New Feature
          Components: core
            Reporter: Fei Xu
            Assignee: Julian Hyde


I noticed that Calcite already has an Exchange RelNode, which represents a distribution imposed on a RelNode's input, but there is no SQL clause that supports it.

We are using Calcite to build the SQL layer for our streaming platform. In streaming computation, data shuffling is a very important function, not only for our engine but also for our users.

For example, in a stream-table join, the engine loads the table data into a cache at runtime, so the join is actually the stream looking up rows in the table.

If the stream data could be partitioned by hash or range before the table lookup, it would be cache-friendly, because a particular subset of the stream data would only look up a particular subset of the table data.
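
To illustrate the cache argument, here is a minimal Java sketch that is not Calcite code. The class and method names (HashPartitionSketch, partitionFor, keysPerPartition) are hypothetical, and it assumes integer product ids and a simple hash-modulo policy. It only shows that, after hash partitioning, each partition sees a disjoint subset of keys, so each node would need to cache only its own slice of the table.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashPartitionSketch {
    /** Maps a key to a partition index in [0, numPartitions). Hypothetical helper. */
    public static int partitionFor(Object key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** For each partition, collects the distinct keys that partition would have to cache. */
    public static List<Set<Integer>> keysPerPartition(int[] productIds, int numPartitions) {
        List<Set<Integer>> result = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            result.add(new HashSet<>());
        }
        for (int id : productIds) {
            result.get(partitionFor(id, numPartitions)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample stream of product ids; repeated ids always land on the same partition.
        int[] productIds = {1, 2, 3, 4, 1, 2, 5, 6, 3, 4};
        List<Set<Integer>> perPartition = keysPerPartition(productIds, 2);
        for (int p = 0; p < perPartition.size(); p++) {
            System.out.println("partition " + p + " caches keys " + perPartition.get(p));
        }
    }
}
```

Without the shuffle, any node could receive any product id and would potentially have to cache the whole table.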

 

So I am considering extending the SQL syntax to support Exchange, e.g.

 
{code:java}
// shuffle data using hash_distribution
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY o.productId

// shuffle data to a singleton node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY SINGLETON

// shuffle data to all nodes 
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY BROADCAST

// shuffle data to random node
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANDOM

// shuffle data using round-robin policy
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY ROUND_ROBIN

// shuffle data using range policy
// Currently I'm not sure what the appropriate clause to represent a range
// shuffle would be, so this is just a demo.
INSERT INTO output
SELECT
  *
FROM orders o
PARTITION BY RANGE o.productId, 0, 4096 
{code}
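
For readers who want the intended semantics of each proposed clause spelled out, the following is a rough, standalone Java sketch of how a row's key might be routed under each policy. It does not use Calcite; the names (ShuffleRoutingSketch, Policy, targets) are hypothetical, and the policies only loosely mirror the distribution types Calcite already defines in RelDistribution.Type. The RANGE case in particular is an assumption: it reads the "0, 4096" in the demo as key bounds split evenly across nodes, which the proposal itself leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ShuffleRoutingSketch {
    /** Hypothetical policies, one per proposed PARTITION BY variant. */
    public enum Policy { HASH, SINGLETON, BROADCAST, RANDOM, ROUND_ROBIN, RANGE }

    private final int numNodes;
    private final int rangeLo;   // assumed inclusive lower key bound for RANGE
    private final int rangeHi;   // assumed exclusive upper key bound for RANGE
    private final Random random = new Random();
    private int nextRoundRobin = 0;

    public ShuffleRoutingSketch(int numNodes, int rangeLo, int rangeHi) {
        this.numNodes = numNodes;
        this.rangeLo = rangeLo;
        this.rangeHi = rangeHi;
    }

    /** Returns the indexes of the nodes that should receive a row with the given key. */
    public List<Integer> targets(Policy policy, int key) {
        List<Integer> out = new ArrayList<>();
        switch (policy) {
            case HASH:
                out.add(Math.floorMod(Integer.hashCode(key), numNodes));
                break;
            case SINGLETON:
                out.add(0); // every row goes to one fixed node
                break;
            case BROADCAST:
                for (int n = 0; n < numNodes; n++) {
                    out.add(n); // every node gets a copy
                }
                break;
            case RANDOM:
                out.add(random.nextInt(numNodes));
                break;
            case ROUND_ROBIN:
                out.add(nextRoundRobin);
                nextRoundRobin = (nextRoundRobin + 1) % numNodes;
                break;
            case RANGE: {
                // Clamp out-of-range keys, then split [rangeLo, rangeHi) evenly across nodes.
                int clamped = Math.max(rangeLo, Math.min(key, rangeHi - 1));
                long width = (long) rangeHi - rangeLo;
                out.add((int) (((long) clamped - rangeLo) * numNodes / width));
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ShuffleRoutingSketch router = new ShuffleRoutingSketch(4, 0, 4096);
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + router.targets(policy, 1500));
        }
    }
}
```

Only HASH and RANGE depend on the key, which is why only those two variants take a column in the proposed syntax.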
 


