Posted to dev@doris.apache.org by "Miao,Ling" <mi...@baidu.com> on 2019/04/09 08:39:44 UTC

[DISCUSS] Unify the load framework in Doris

Hi all,

I plan to unify the load framework in Doris.

The background is as follows:
Doris currently has a different load framework for each load method. On the one hand, broker load and mini load, which use the old framework, perform worse than stream load. On the other hand, the broker load framework includes a non-streaming sink, so it cannot load large files.

The goals of 'Unify the Load Framework' are as follows:

  1.  Improve the performance of broker load and mini load by moving them to the stream load framework.
  2.  Support loading large files in the unified framework.
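
As a rough illustration of the second goal (generic Python, not Doris code): assuming the large-file limit comes from a non-streaming sink materializing the whole input before passing it on, the difference from a streaming sink can be sketched as follows. The function names and the 64 KB chunk size are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def load_non_streaming(source: BinaryIO) -> int:
    """Buffer the whole input first; peak memory grows with the file size."""
    data = source.read()  # the entire file is held in memory at once
    return len(data)


def iter_chunks(source: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield the input in pieces of at most chunk_size bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def load_streaming(source: BinaryIO, chunk_size: int = 64 * 1024) -> Tuple[int, int]:
    """Forward one chunk at a time; returns (total bytes, largest chunk held)."""
    total = 0
    peak = 0
    for chunk in iter_chunks(source, chunk_size):
        total += len(chunk)           # a real sink would send the chunk onward here
        peak = max(peak, len(chunk))  # never more than chunk_size
    return total, peak


if __name__ == "__main__":
    payload = b"x" * 200_000
    print(load_non_streaming(io.BytesIO(payload)))
    print(load_streaming(io.BytesIO(payload)))
```

In the streaming variant the memory held at any moment is bounded by the chunk size rather than by the file size, which is the property the unified framework would need for large files.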

I look forward to your suggestions on 'Unify the Load Framework'.

Best Regards,
Emmy Miao

Re: [DISCUSS] Unify the load framework in Doris

Posted by Zhao Chun <bu...@gmail.com>.
This will be good work for Doris.

There is some code that does the same thing. For example, CSVScanNode can be
replaced by BrokerScanNode.
BrokerScanNode should probably also be renamed, because it can scan not only
from a broker but also from a local file or a memory sink.
Can you give it another name?

Could you create an issue on GitHub to track this work and add more
detail to it?

Thanks,
ZHAO Chun



In reply to the message posted by Miao,Ling <mi...@baidu.com> on Tue, Apr 9, 2019 at 4:40 PM.

Re:Re: [DISCUSS] Unify the load framework in Doris

Posted by 陈明雨 <mo...@163.com>.
Good job!
Looking forward to your results!


--
Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org



In reply to the message posted by "Miao,Ling" <mi...@baidu.com> at 2019-06-21 17:43:00.

Re: [DISCUSS] Unify the load framework in Doris

Posted by "Miao,Ling" <mi...@baidu.com>.
Hi all,

I have finished the new (streaming) load framework for broker load and mini load. It is time to verify the performance of the new framework.

This test ignores the effects of rollup and queries, so the results depend only on the performance of the new load framework.


The environment:
Node 1:

  *   Memory: about 3 GB
  *   Disk: SATA, about 2 TB
  *   Network: about 10,000 Mb/s

Nodes 2, 3, 4:

  *   Memory: about 27 GB
  *   Disk: SSD, about 369 GB
  *   Network: about 10,000 Mb/s

Test preparation:

  1.  Install Doris:

  *   One frontend and three backends.
  *   The three backends are located on different nodes (2, 3, 4).
  *   The frontend is located on Node 1.

  2.  Create tables:

  *   Situation 1: The table has only one tablet, with 3 replicas.
SQL: create table xxx (about 34 columns: 16 sum aggregation columns) ENGINE=OLAP AGGREGATE KEY(2 columns) DISTRIBUTED BY HASH(2 columns) BUCKETS 1 PROPERTIES ("storage_type" = "COLUMN");
  *   Situation 2: The table has 100 tablets, and every tablet has 3 replicas.
SQL: create table xxx (about 34 columns: 16 sum aggregation columns) ENGINE=OLAP AGGREGATE KEY(2 columns) DISTRIBUTED BY HASH(2 columns) BUCKETS 100 PROPERTIES ("storage_type" = "COLUMN");
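
To put the two layouts side by side, the sketch below counts tablet replicas for each situation. It assumes the replicas of a tablet are placed on distinct backends and spread evenly, so with 3 replicas and 3 backends every backend holds one replica of every tablet. The function name is hypothetical.

```python
from typing import Tuple


def replica_layout(tablets: int, replicas_per_tablet: int, backends: int) -> Tuple[int, int]:
    """Return (total tablet replicas, replicas per backend) under even placement."""
    total = tablets * replicas_per_tablet
    # Assumption: replicas are spread evenly, so the total divides across backends.
    per_backend, remainder = divmod(total, backends)
    if remainder:
        raise ValueError("replicas do not divide evenly across backends")
    return total, per_backend


if __name__ == "__main__":
    for name, tablets in (("Situation 1", 1), ("Situation 2", 100)):
        total, per_backend = replica_layout(tablets, replicas_per_tablet=3, backends=3)
        print(f"{name}: {total} tablet replicas, {per_backend} per backend")
```

Under that assumption every backend receives all of the loaded data in both situations. What changes between them is how many tablets the data is split across.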

Test data:
Load catalog_sales.dat at different file sizes (2K, 40M, 20G, 280G).

Test steps:

  1.  Load data by non-streaming broker load
  2.  Load data by streaming broker load
  3.  Load data by non-streaming mini load
  4.  Load data by streaming mini load
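
A small sketch of how the plan above could be enumerated, assuming every load method is run against every file size and both table layouts (the message does not say so explicitly). The throughput helper is a hypothetical metric for comparing runs, and the size labels are assumed to be binary units.

```python
from itertools import product
from typing import List, Tuple

TABLE_LAYOUTS = ["1 tablet", "100 tablets"]  # Situation 1 and Situation 2
FILE_SIZES = ["2K", "40M", "20G", "280G"]
LOAD_METHODS = [
    "non-streaming broker load",
    "streaming broker load",
    "non-streaming mini load",
    "streaming mini load",
]

# Assumption: K, M and G are binary units (1K = 1024 bytes).
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def size_in_bytes(label: str) -> int:
    """Convert a label such as '40M' into a byte count."""
    return int(label[:-1]) * _UNITS[label[-1]]


def enumerate_runs() -> List[Tuple[str, str, str]]:
    """Every (table layout, file size, load method) combination."""
    return list(product(TABLE_LAYOUTS, FILE_SIZES, LOAD_METHODS))


def throughput_mb_per_s(size_label: str, seconds: float) -> float:
    """Load throughput in MB/s for one run, given its elapsed time."""
    return size_in_bytes(size_label) / 2**20 / seconds


if __name__ == "__main__":
    runs = enumerate_runs()
    print(len(runs), "runs")
    for layout, size, method in runs[:4]:
        print(layout, size, method)
```

With 2 layouts, 4 file sizes, and 4 load methods this gives 32 runs. Comparing the streaming and non-streaming rows for the same layout and size isolates the effect of the framework change.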

Is the test plan sound? I look forward to your suggestions on 'Testing the performance of the new load framework (streaming)'.

Best Regards,
Emmy Miao
