
Posted to issues@carbondata.apache.org by "Manish Gupta (JIRA)" <ji...@apache.org> on 2018/04/23 07:02:00 UTC

[jira] [Created] (CARBONDATA-2381) Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level

Manish Gupta created CARBONDATA-2381:
----------------------------------------

             Summary: Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level
                 Key: CARBONDATA-2381
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2381
             Project: CarbonData
          Issue Type: Improvement
    Affects Versions: 1.3.1
            Reporter: Manish Gupta
            Assignee: Manish Gupta


Problem: Compaction is slow compared to data load. If the compaction threshold is set to 6,6, the minor compaction triggered after 6 loads takes almost 6-7 times as long as those 6 loads combined.

Analysis:
 # During compaction, result filling is done in row format. Because of this, the time to fill dimension and measure data increases as the number of columns increases. This happens because row-wise filling cannot take advantage of OS cacheable buffers: we continuously move on to read data for the next column.
 # Compaction uses a page-level reader flow in which both IO and decompression are done at page level. The IO and decompression time increases in this model.
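
The difference between the two filling styles in the first point can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class and method names (CompactionFillSketch, fillRowWise, fillColumnar) are hypothetical and are not CarbonData classes, and each column page is modelled as a plain long[] that is already decoded in memory.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names exist in CarbonData.
final class CompactionFillSketch {

    // Row format: every output row takes one value from each column in turn,
    // so consecutive reads jump from one column array to the next.
    static long[][] fillRowWise(long[][] columnPages, int rowCount) {
        long[][] rows = new long[rowCount][columnPages.length];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columnPages.length; c++) {
                rows[r][c] = columnPages[c][r];
            }
        }
        return rows;
    }

    // Columnar format: each column is copied as one contiguous run
    // before the next column is touched.
    static long[][] fillColumnar(long[][] columnPages, int rowCount) {
        long[][] batch = new long[columnPages.length][];
        for (int c = 0; c < columnPages.length; c++) {
            batch[c] = Arrays.copyOf(columnPages[c], rowCount);
        }
        return batch;
    }

    public static void main(String[] args) {
        long[][] pages = { {1, 2, 3}, {10, 20, 30} };
        long[][] rows = fillRowWise(pages, 3);
        long[][] batch = fillColumnar(pages, 3);
        System.out.println(Arrays.toString(rows[1]));   // row 1 across both columns: [2, 20]
        System.out.println(Arrays.toString(batch[1]));  // column 1 across all rows: [10, 20, 30]
    }
}
```

Both methods carry the same values; only the access order differs. In the row-wise version the inner loop switches column on every read, which is the pattern the analysis identifies as growing more expensive with the number of columns.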


