Posted to dev@carbondata.apache.org by Chin Wei <lo...@gmail.com> on 2019/11/08 09:07:29 UTC

Data Load performance degrade when number of segment increase

Hi Community,

I noticed that as the number of segments increases, the time taken to load
data increases as well.
After checking, I found that loading a single CSV file calls readLoadMetadata 9
times. For a table with 10,000 segments, each readLoadMetadata call takes
50 ms.

Is there any plan to improve this, or is there any area I can look at to
improve it?

Regards,
Chin Wei



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Data Load performance degrade when number of segment increase

Posted by Jacky Li <ja...@apache.org>.
Hi,

My suggestion is:
1. Reduce the number of calls to readTableStatusFile as much as possible, in both loading and query.
2. A cache for LoadMetadataDetails could be added inside SegmentStatusManager. Cache invalidation must be handled carefully, for example when a table is dropped.
3. Run compaction periodically in your application to merge small segments and reduce the number of segments. After compaction, only a small number of "compacted" segment entries remain in the table status file; the other "compacted" segment entries are moved to the table status history file. See carbon.invisible.segments.preserve.count in http://carbondata.apache.org/configuration-parameters.html
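
As a rough illustration of suggestion 2, here is a minimal sketch of a cache that re-reads a status file only when the file has changed. It is not CarbonData code: StatusFileCache, read, invalidate and fileReads are hypothetical names, plain text lines stand in for the parsed LoadMetadataDetails entries, and using the file's last-modified time as the invalidation signal is an assumption that may not fit every file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not CarbonData code: caches the content of a status
 * file and re-reads it only when the file's modification time changes.
 */
final class StatusFileCache {

  /** Cached lines plus the file timestamp they were read at. */
  private static final class Entry {
    final FileTime modifiedAt;
    final List<String> lines;

    Entry(FileTime modifiedAt, List<String> lines) {
      this.modifiedAt = modifiedAt;
      this.lines = lines;
    }
  }

  private final ConcurrentHashMap<Path, Entry> cache = new ConcurrentHashMap<>();
  private final AtomicInteger fileReads = new AtomicInteger();

  /** Returns the cached lines, reading the file only if it is new or has changed. */
  List<String> read(Path statusFile) throws IOException {
    FileTime current = Files.getLastModifiedTime(statusFile);
    Entry cached = cache.get(statusFile);
    if (cached != null && cached.modifiedAt.equals(current)) {
      return cached.lines; // cache hit: the file content is not read again
    }
    List<String> lines = Collections.unmodifiableList(Files.readAllLines(statusFile));
    fileReads.incrementAndGet();
    cache.put(statusFile, new Entry(current, lines));
    return lines;
  }

  /** Explicit invalidation, for example when the table is dropped. */
  void invalidate(Path statusFile) {
    cache.remove(statusFile);
  }

  /** Number of times the file content was actually read (for illustration only). */
  int fileReads() {
    return fileReads.get();
  }
}
```

With a cache like this, 9 lookups during one load would read the file content once, as long as the file does not change in between. The timestamp check still costs one metadata call per lookup, and two writes within the same timestamp granularity would not be detected, so a real implementation would need a stronger change signal.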

If you want to work on it, you are welcome to submit a JIRA and PRs.

Regards,
Jacky
