You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Hans Zeller (JIRA)" <ji...@apache.org> on 2015/07/30 03:40:04 UTC
[jira] [Comment Edited] (TRAFODION-50) LP Blueprint: cmp-presplit-unsalted-tables - Add SQL syntax to pre-split unsalted tables into regions

    [ https://issues.apache.org/jira/browse/TRAFODION-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647043#comment-14647043 ] 

Hans Zeller edited comment on TRAFODION-50 at 7/30/15 1:39 AM:
---------------------------------------------------------------

Here is a proposal for a new syntax: SPLIT BY. I want to avoid PARTITION BY, because I hope we can eventually use that for a much more tightly managed model where we take control over region placement, allow co-location of tables in region servers, and push things like groupby and join down to the region server level. SPLIT BY will be the light version of that, an initial split, but then HBase can split further if it wants to.

create table lineitem(
l_orderkey int not null,
l_linenum int not null,
...
primary key (l_orderkey, l_linenum))
SPLIT BY (l_orderkey)
(add first key (10000),
add first key (20000)
);


was (Author: hzeller):
Here is a proposal for a new syntax: SPLIT BY. I want to avoid PARTITION BY, because I hope we can eventually use that for a much more tightly managed model where we take control over region placement, allow co-location of tables in region servers, and push things like groupby and join down to the region server level. SPLIT BY will be the light version of that, an initial split, but then HBase can split further it wants to.

create table lineitem(
l_orderkey int not null,
l_linenum int not null,
...
primary key (l_orderkey, l_linenum))
SPLIT BY (l_orderkey)
(add first key (10000),
add first key (20000)
);

> LP Blueprint: cmp-presplit-unsalted-tables - Add SQL syntax to pre-split unsalted tables into regions
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-50
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-50
>             Project: Apache Trafodion
>          Issue Type: New Feature
>          Components: sql-cmp
>            Reporter: Hans Zeller
>            Priority: Critical
>             Fix For: 2.0-incubating
>
>
> Currently, Trafodion creates tables as a single-region HBase table, unless the table is salted. Salted tables have one region per salt bucket. We would like to add SQL syntax to allow pre-splitting of unsalted tables as well. We would like to offer two new table options, SPLIT BY and PARTITION BY. Both allow specification of split keys. SPLIT BY will simply pre-split the table, with no special split policy. PARTITION BY will add a prefix split policy that ensures that all rows with common PARTITION BY column values will remain in the same table.
> Example:
> create table lineitem(
>    l_orderkey int not null,
>    l_linenum int not null,
>    ...
>    primary key (l_orderkey, l_linenum))
> PARTITION BY (l_orderkey)
> (add first key (10000),
>  add first key (20000)
> );
> This will create three regions containing for values [<min>, 10000), [10000, 20000) and [20000, <max>) it will add a prefix split policy for a 4 byte prefix, the length of a non-nullable INT column. HBase will still be able to split the regions further, but we will have a guarantee that all rows for a given l_orderkey are located in the same region.
> If SPLIT BY is used instead of PARTITION BY, the same regions will be created, but no custom split policy will be added. Note that the PARTITION BY / SPLIT BY column(s) have to be a prefix of the clustering key of the table. PARTITION BY and SPLIT BY are not allowed for salted tables. We may or may not support pre-splitting of divisioned tables in the first release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)