Posted to issues@carbondata.apache.org by "kumar vishal (JIRA)" <ji...@apache.org> on 2018/04/30 09:50:00 UTC

[jira] [Resolved] (CARBONDATA-2381) Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level

     [ https://issues.apache.org/jira/browse/CARBONDATA-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal resolved CARBONDATA-2381.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

> Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2381
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2381
>             Project: CarbonData
>          Issue Type: Improvement
>    Affects Versions: 1.3.1
>            Reporter: Manish Gupta
>            Assignee: Manish Gupta
>            Priority: Major
>             Fix For: 1.4.0
>
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Problem: Compaction is slow compared to data load. With the compaction threshold set to 6,6, a minor compaction triggered after 6 loads takes almost 6-7 times as long as those 6 loads took in total.
> Analysis:
>  # During compaction, result filling is done in row format. As a result, the dimension and measure data filling time grows with the number of columns. Row-wise filling cannot take advantage of OS cacheable buffers, because each step moves on to read data for the next column.
>  # Compaction uses a page-level reader flow in which both IO and uncompression are done at page level, which increases the IO and uncompression time.
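
To illustrate the first analysis point, here is a minimal Java sketch. It is not CarbonData code: the names `CompactionFillSketch`, `fillRowWise` and `fillColumnar`, and the use of `long[][]` as a stand-in for decoded column data, are assumptions made purely for illustration. It contrasts filling a result row by row, which touches every column for each row, with filling it column by column, which reads each column in one sequential pass.

```java
import java.util.Arrays;

// Hypothetical sketch only; none of these names exist in CarbonData.
final class CompactionFillSketch {

    // Row-format filling: for every row, visit every column.
    // Each inner step jumps to a different column array, so reads are scattered.
    static long[][] fillRowWise(long[][] columns, int rowCount) {
        int columnCount = columns.length;
        long[][] rows = new long[rowCount][columnCount];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columnCount; c++) {
                rows[r][c] = columns[c][r];
            }
        }
        return rows;
    }

    // Columnar filling: copy each column in one sequential pass,
    // so consecutive reads stay within the same array.
    static long[][] fillColumnar(long[][] columns, int rowCount) {
        long[][] batch = new long[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            batch[c] = Arrays.copyOf(columns[c], rowCount);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Two assumed columns with three rows each.
        long[][] columns = { {1L, 2L, 3L}, {10L, 20L, 30L} };

        // prints [[1, 10], [2, 20], [3, 30]]
        System.out.println(Arrays.deepToString(fillRowWise(columns, 3)));

        // prints [[1, 2, 3], [10, 20, 30]]
        System.out.println(Arrays.deepToString(fillColumnar(columns, 3)));
    }
}
```

Both methods carry the same values; only the access order and the layout of the result differ. The sketch does not model IO or uncompression, so it says nothing about the second analysis point.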



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)