Posted to dev@carbondata.apache.org by xuchuanyin <xu...@hust.edu.cn> on 2018/11/07 15:54:28 UTC

Enhancement on compaction performance

Hi all:
I am raising a PR to enhance the performance of compaction. The PR number is #2906.

Based on my experiments using about 72GB of LineItem data (from 100GB TPCH data), I got the following results.

Code Branch	Prefetch	Batch Size (default 100)	Load1 (s)	Load2 (s)	Load3 (s)	Compact 3 Loads (s)	Time Reduced
master	NA	100	447.4	445.9	450.1	661.3	Baseline
master	NA	32000	441.5	454.4	456.8	641.2	+3.0%
PR2906	enable	100	445.3	450.2	445.3	411.8	+37.7%
PR2906	enable	32000	438.7	446.8	441.8	333.1	+49.6%
PR2906	disable	100	458.1	459.4	450.9	659.5	+0.3%
PR2906	disable	32000	472.0	446.8	457.1	654.5	+1.0%
Note: These tests were run on Spark 2.2.

The results show that compaction performance almost doubles when prefetch is enabled with a large batch size (661.3 s down to 333.1 s).
They also show that compaction performance does not degrade when this feature is disabled.
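
For anyone who wants to re-check the 'Time Reduced' column, here is a small sketch that recomputes it from the compaction times in the table. The function names are only illustrative, and the formula (time saved divided by baseline time) is inferred from the numbers rather than taken from the PR.

```python
# Compaction times in seconds, copied from the table above.
BASELINE_S = 661.3  # master, batch size 100

compaction_s = {
    ("master", "NA", 32000): 641.2,
    ("PR2906", "enable", 100): 411.8,
    ("PR2906", "enable", 32000): 333.1,
    ("PR2906", "disable", 100): 659.5,
    ("PR2906", "disable", 32000): 654.5,
}


def time_reduced_pct(baseline_s, measured_s):
    """Percentage of the baseline compaction time that was saved."""
    return (baseline_s - measured_s) / baseline_s * 100.0


def speedup(baseline_s, measured_s):
    """How many times faster the measured run is than the baseline."""
    return baseline_s / measured_s


for (branch, prefetch, batch), seconds in compaction_s.items():
    print(f"{branch:7s} {prefetch:8s} {batch:6d} "
          f"{time_reduced_pct(BASELINE_S, seconds):+5.1f}% "
          f"x{speedup(BASELINE_S, seconds):.2f}")
```

The speedup column is what backs the 'almost doubled' statement: the best configuration finishes in roughly half the baseline time.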

So here:

1. I want to make this feature 'enabled' by default.

2. I would also like others in the community to test this feature and check whether they benefit from it.

Any feedback is welcome.


Re: Enhancement on compaction performance

Posted by xuchuanyin <xu...@hust.edu.cn>.
Oh, I didn't pay attention to the memory consumption at that time.

We all know that resource utilization is low during compaction.
Using prefetch means that we run the query in the background, and it will
surely consume more resources.
The prefetch size is currently controlled by 'carbon.detail.batch.size',
which defaults to 100, meaning that up to 100 extra rows are kept in memory
before they are retrieved.
So the memory overhead consists of the memory consumed by the query plus the
memory of carbon.detail.batch.size records.
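
To illustrate why the overhead is bounded by the batch size, here is a minimal sketch of a read-ahead iterator. It is not the code in PR #2906; the `prefetch` function and its `batch_size` argument are hypothetical names that only mirror the role of 'carbon.detail.batch.size' described above.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the source


def prefetch(source, batch_size=100):
    """Yield items from `source`, reading ahead on a background thread.

    The buffer holds at most `batch_size` rows, so the extra memory is
    bounded by roughly that many rows.
    """
    buffer = queue.Queue(maxsize=batch_size)

    def producer():
        try:
            for row in source:
                buffer.put(row)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal the end to the consumer

    threading.Thread(target=producer, daemon=True).start()

    while True:
        row = buffer.get()  # blocks until the producer has a row ready
        if row is _DONE:
            return
        yield row


rows = list(prefetch(iter(range(1000)), batch_size=100))
```

A larger `batch_size` lets the background reader run further ahead of the consumer, at the cost of holding more rows in memory.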





--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Enhancement on compaction performance

Posted by Jacky Li <ja...@qq.com>.
Hi Xuchuanyin,

This feature is great for compaction. Did you observe higher memory usage, since it prefetches data into memory? Do you have any numbers?

Regards,
Jacky



Re: Enhancement on compaction performance

Posted by xuchuanyin <xu...@hust.edu.cn>.
Hi, all:

The previous experiment used 3 Huawei ECS instances as workers, each with 16
cores and 32GB of memory. The Spark executor used 12 cores and 24GB. The data
was 74GB of LineItem from 100GB TPCH.

Today I ran another experiment on 1 Huawei RH2288 machine with 32 cores
and 128GB of memory. The Spark executor used 30 cores and 90GB. The data was
7.3GB of LineItem from 10GB TPCH. The results are below:

Code Branch	Prefetch	Batch Size (default 100)	Load1 (s)	Load2 (s)	Load3 (s)	Compact 3 Loads (s)	Time Reduced	Perf Enhanced
master	NA	100	147.4	142.3	144.6	201.4	Baseline	Baseline
master	NA	32000	140.8	138.7	141.6	196.2	2.6%	2.7%
PR2906	enable	100	143.9	142.5	146.2	99.9	50.4%	101.6%
PR2906	enable	32000	142.1	139.3	136.9	98.3	51.2%	104.9%
PR2906	disable	100	146.7	137.4	139.6	200.6	0.4%	0.4%
PR2906	disable	32000	145.2	145.0	139.7	195.7	2.8%	2.9%

These results also show that this PR does not degrade compaction performance
when prefetch is disabled and improves it when prefetch is enabled.
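
For readers comparing the two percentage columns: the definitions below are inferred from the numbers in the table, not stated in the PR. With $T_0$ the baseline compaction time and $T$ the measured time, the columns appear to be related as follows.

```latex
\[
\text{TimeReduced} = \frac{T_0 - T}{T_0},
\qquad
\text{PerfEnhanced} = \frac{T_0}{T} - 1
                    = \frac{\text{TimeReduced}}{1 - \text{TimeReduced}}
\]
% Worked example (PR2906, prefetch enabled, batch size 100):
% T_0 = 201.4 s, T = 99.9 s
\[
\text{TimeReduced} = \frac{201.4 - 99.9}{201.4} \approx 50.4\%,
\qquad
\text{PerfEnhanced} = \frac{201.4}{99.9} - 1 \approx 101.6\%
\]
```

This is why halving the compaction time shows up as roughly a 100% performance gain.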



