Posted to dev@doris.apache.org by ye qi <ji...@gmail.com> on 2021/02/20 15:24:46 UTC

[Proposal] Support of LIST partition

List partition

Doris currently only supports Range partitioning, where data is usually
partitioned by time columns.

However, in some scenarios users want to partition by enumerated column
values, such as city.

Design

To add support for List partitioning, the following functional points need
to be considered.

   1. Support for List partition syntax in creating table statements.
   2. Support for syntax to add and drop List partitions.
   3. Support for List partitioning in various load operations.
   4. Support for List partition pruning during query.

Dynamic partitioning does not need to be considered for List partitioned
tables.

Detailed design

Syntax

The main changes involved here include:

   1. Implement ListPartitionDesc, a subclass of the parsing class
   PartitionDesc.
   2. Implement ListPartitionInfo, a subclass of the metadata class
   PartitionInfo.
   3. Support parsing and checking ListPartitionDesc in CreateTableStmt.
   4. Support creating List partitioned tables in the Catalog class.
   5. Metadata persistence-related changes.

The syntax is modeled on MySQL and Oracle.

Single partition column

CREATE TABLE tb1 (
    k1 int, k2 varchar(128), k3 int, v1 int, v2 int
)
PARTITION BY LIST(k1)
(
    PARTITION p1 VALUES IN ("1", "3", "5"),
    PARTITION p2 VALUES IN ("2", "4", "6"),
    ...
)
...
;

Multiple partition columns

CREATE TABLE tb2 (
    k1 int, k2 varchar(128), k3 int, v1 int, v2 int
)
PARTITION BY LIST(k1, k2)
(
    PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
    PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
    PARTITION p3 VALUES IN (("3", "beijing")),
    ...
)
...
;

NOTE: Partition values must be unique: a value (or value tuple) may appear
in only one partition.

Add partition

ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
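
The uniqueness rule in the NOTE above also has to hold when a partition is
added. The following is only an illustrative Python sketch of such a check,
not Doris code: the function name find_conflicts and the dict-based
representation of partitions are assumptions made for this example, and
values are kept as the string literals used in the DDL.

```python
from typing import Dict, List, Tuple

# A partition key has one entry per partition column,
# e.g. ("1",) for LIST(k1) or ("2", "beijing") for LIST(k1, k2).
PartitionKey = Tuple[str, ...]


def find_conflicts(
    existing: Dict[str, List[PartitionKey]],
    new_name: str,
    new_keys: List[PartitionKey],
) -> List[Tuple[PartitionKey, str]]:
    """Return (key, owner) pairs for every new key that is already taken.

    A key conflicts if an existing partition owns it, or if it is
    repeated inside the new partition's own value list.
    """
    owner_of = {key: name for name, keys in existing.items() for key in keys}
    conflicts = []
    seen = set()
    for key in new_keys:
        if key in owner_of:
            conflicts.append((key, owner_of[key]))
        elif key in seen:
            conflicts.append((key, new_name))
        seen.add(key)
    return conflicts


# Partitions of tb1 as declared in the CREATE TABLE example.
tb1 = {
    "p1": [("1",), ("3",), ("5",)],
    "p2": [("2",), ("4",), ("6",)],
}

# ADD PARTITION p4 VALUES IN ("7", "8", "9") introduces no duplicates.
ok = find_conflicts(tb1, "p4", [("7",), ("8",), ("9",)])

# A hypothetical partition p5 that reuses "3" collides with p1.
bad = find_conflicts(tb1, "p5", [("3",), ("10",)])
```

An empty result means the ADD PARTITION statement could be accepted; a
non-empty result lists each offending value and the partition that already
owns it.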

Load

The current load methods of Doris include Stream Load, INSERT, Routine
Load, Broker Load, Hadoop Load, Spark Load.

Among them, Stream Load, INSERT, Routine Load, and Broker Load all use the
TabletSink class for data distribution. In the first phase, we will support
List partitioning for these load operations.

The main changes involved include:

   1. Changes related to the Descriptors.TOlapTablePartitionParam structure
   in the Thrift structure TOlapTableSink
   2. Changes related to the OlapTablePartition object in the OlapTableSink
   class on the BE side.
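
Conceptually, the sink has to map each incoming row to the partition whose
value list contains the row's partition-column values. The sketch below
illustrates that lookup in Python under simplifying assumptions: the names
build_lookup and route_row are hypothetical, values are compared as the
string literals from the DDL rather than as typed values, and it says
nothing about how the Thrift structures would actually carry this
information.

```python
from typing import Dict, List, Optional, Tuple

PartitionKey = Tuple[str, ...]


def build_lookup(partitions: Dict[str, List[PartitionKey]]) -> Dict[PartitionKey, str]:
    """Invert {partition: [keys]} into {key: partition} for O(1) routing."""
    return {key: name for name, keys in partitions.items() for key in keys}


def route_row(
    row: Dict[str, object],
    partition_columns: List[str],
    lookup: Dict[PartitionKey, str],
) -> Optional[str]:
    """Return the partition name for a row, or None if no partition matches."""
    key = tuple(str(row[col]) for col in partition_columns)
    return lookup.get(key)


# Partitions of tb2 as declared in the CREATE TABLE example.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
lookup = build_lookup(tb2)

hit = route_row({"k1": 2, "k2": "tianjin", "k3": 0}, ["k1", "k2"], lookup)
miss = route_row({"k1": 9, "k2": "beijing", "k3": 0}, ["k1", "k2"], lookup)
```

Here hit is "p2" and miss is None. How rows without a matching partition
are handled (rejected, filtered, or sent to a default partition) is left to
the design.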

Query

On the query side, the main work is to implement List partition pruning.

The main changes involved include:

   1. Implementing the subclass ListPartitionPruner of PartitionPruner
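
As a rough illustration of what List partition pruning means, the Python
sketch below keeps a partition only if at least one of its value tuples is
compatible with the equality/IN filters on the partition columns. The
function name prune and the filter representation (column name mapped to a
set of allowed values) are assumptions for this example; the actual
ListPartitionPruner may work quite differently.

```python
from typing import Dict, List, Set, Tuple

PartitionKey = Tuple[str, ...]


def prune(
    partitions: Dict[str, List[PartitionKey]],
    partition_columns: List[str],
    in_filters: Dict[str, Set[str]],
) -> List[str]:
    """Return the partitions that may contain rows matching the filters.

    in_filters maps a column name to the set of values allowed by an
    equality or IN predicate; columns without an entry are unconstrained.
    """
    selected = []
    for name, keys in partitions.items():
        for key in keys:
            if all(
                col not in in_filters or value in in_filters[col]
                for col, value in zip(partition_columns, key)
            ):
                selected.append(name)
                break  # one compatible tuple is enough to keep the partition
    return selected


# Partitions of tb2 as declared in the CREATE TABLE example.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
cols = ["k1", "k2"]

by_city = prune(tb2, cols, {"k2": {"tianjin"}})      # WHERE k2 = "tianjin"
by_k1 = prune(tb2, cols, {"k1": {"1", "3"}})         # WHERE k1 IN (1, 3)
no_filter = prune(tb2, cols, {})                     # no partition predicate
```

With these inputs only p2 survives the k2 filter, p1 and p3 survive the k1
filter, and all three partitions are scanned when no filter applies.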

Partition related

Support operations related to partitioned tables, such as recover,
truncate, temporary partition, restore, replace, etc.

Re:Re: [Proposal] Support of LIST partition

Posted by 陈明雨 <mo...@163.com>.
The default partition needs to be considered later.

--
Best Regards
陈明雨 Mingyu Chen

Email: chenmingyu@apache.org

At 2021-02-23 10:37:13, "ling miao" <li...@apache.org> wrote:
>Hi JianLiang,
>
>Looking forward to your detailed design and PR.
>
>Ling Miao

Re: [Proposal] Support of LIST partition

Posted by ling miao <li...@apache.org>.
Hi JianLiang,

Looking forward to your detailed design and PR.

Ling Miao

On Mon, Feb 22, 2021 at 7:25 PM ye qi <ji...@gmail.com> wrote:

> Hi, Ling Miao.
>
> Thanks for your advice.
> I'll think about it and get back to you.
>
> Jianliang Qi

Re: [Proposal] Support of LIST partition

Posted by ye qi <ji...@gmail.com>.
Hi, Ling Miao.

Thanks for your advice.
I'll think about it and get back to you.

Jianliang Qi

On Mon, Feb 22, 2021 at 5:31 PM ling miao <li...@apache.org> wrote:

> Hi JianLiang,
>
> Thank you for your proposal. I think this feature is really needed for
> some large dimension tables.
> It means that data that is not generated over time can also be
> partitioned.
>
> Of course, since this changes the metadata, all loads, queries, and
> other DDL operations may need to be changed and developed accordingly.
> Please take this into account in the design.
>
> Ling Miao

Re: [Proposal] Support of LIST partition

Posted by ling miao <li...@apache.org>.
Hi JianLiang,

Thank you for your proposal. I think this feature is really needed for
some large dimension tables.
It means that data that is not generated over time can also be
partitioned.

Of course, since this changes the metadata, all loads, queries, and
other DDL operations may need to be changed and developed accordingly.
Please take this into account in the design.

Ling Miao

On Sun, Feb 21, 2021 at 1:12 AM ye qi <ji...@gmail.com> wrote:

> List partition
> [...]