Posted to issues@carbondata.apache.org by "xuchuanyin (JIRA)" <ji...@apache.org> on 2018/04/02 06:36:00 UTC

[jira] [Created] (CARBONDATA-2304) Enhance compaction performance by enabling prefetch

xuchuanyin created CARBONDATA-2304:
--------------------------------------

             Summary: Enhance compaction performance by enabling prefetch
                 Key: CARBONDATA-2304
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2304
             Project: CarbonData
          Issue Type: Improvement
          Components: data-load
            Reporter: xuchuanyin
            Assignee: xuchuanyin


During compaction, CarbonData queries the segments to retrieve the rows, then sorts the rows and produces the final carbondata file.

Currently, retrieving the rows performs poorly, so adding prefetch for the rows will improve compaction performance.
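
To illustrate the idea, here is a minimal sketch of row prefetching. It is not CarbonData's actual implementation: the class name PrefetchingIterator, the capacity parameter and the thread name are hypothetical, and rows are assumed to be non-null. A background thread keeps pulling rows from the slow source into a bounded queue, so the consumer (the sort step, in the compaction case) rarely has to wait for retrieval.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: prefetches rows from a slow iterator on a background thread. */
final class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel placed in the queue once the source is exhausted (or has failed).
    private static final Object END = new Object();

    private final BlockingQueue<Object> buffer;
    private volatile RuntimeException failure; // error raised by the source, if any
    private Object next;                       // fetched but not yet returned element

    PrefetchingIterator(Iterator<T> source, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                try {
                    // put() blocks when the buffer is full, which bounds memory use.
                    while (source.hasNext()) {
                        buffer.put(source.next());
                    }
                } catch (RuntimeException e) {
                    failure = e; // rethrown on the consumer side in hasNext()
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "row-prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = buffer.take(); // waits only if the producer has fallen behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a row", e);
            }
        }
        if (next == END && failure != null) {
            throw failure;
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T row = (T) next;
        next = null;
        return row;
    }
}
```

A caller would wrap the existing row iterator, for example new PrefetchingIterator<>(rows, 1024), and consume it as before. The bounded queue is a deliberate choice in this sketch: it lets retrieval overlap with downstream processing without buffering a whole segment in memory.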

In my local tests, compacting 4 segments of 100 thousand rows each takes 30s with prefetch and 50s without prefetch.

In my tests on a larger cluster, compacting 6 segments of 18GB raw data each takes 45min with prefetch and 57min without prefetch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)