Posted to issues@carbondata.apache.org by "xuchuanyin (JIRA)" <ji...@apache.org> on 2018/03/09 05:12:00 UTC

[jira] [Created] (CARBONDATA-2238) Optimization in unsafe sort during data loading

xuchuanyin created CARBONDATA-2238:
--------------------------------------

             Summary: Optimization in unsafe sort during data loading
                 Key: CARBONDATA-2238
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
             Project: CarbonData
          Issue Type: Improvement
          Components: data-load
            Reporter: xuchuanyin
            Assignee: xuchuanyin


Inspired by batch_sort: in local_sort with the unsafe property enabled, if enough memory is available we can hold the row pages in memory and spill them to disk as sort temp files only when memory runs out.

Before spilling the pages, we can run an in-memory merge sort over them.

Each time we request an unsafe row page and memory is unavailable, we can trigger a merge sort of the in-memory pages and spill the result to disk as a single sort temp file. The incoming pages can then be held in memory instead of being spilled to disk directly.
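
The flow described above could look roughly like the sketch below. It is only an illustration under simplifying assumptions, not the CarbonData implementation: the class and method names (SpillOnPressureSketch, addPage, spillMergedPages, mergeSortedPages) are hypothetical, a page is a plain int[] of sort keys, the memory budget is a simple page count instead of unsafe memory, and a "sort temp file" is modelled as an in-memory array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch; none of these names are CarbonData classes. */
class SpillOnPressureSketch {
    // Assumption: the memory budget is expressed as a number of pages.
    private final int maxPagesInMemory;
    private final List<int[]> inMemoryPages = new ArrayList<>();
    // Assumption: each spilled run stands in for one sort temp file.
    private final List<int[]> spilledRuns = new ArrayList<>();

    SpillOnPressureSketch(int maxPagesInMemory) {
        if (maxPagesInMemory < 1) {
            throw new IllegalArgumentException("budget must be at least one page");
        }
        this.maxPagesInMemory = maxPagesInMemory;
    }

    /** Sorts the incoming page and keeps it in memory, spilling first if the budget is used up. */
    void addPage(int[] rows) {
        if (inMemoryPages.size() >= maxPagesInMemory) {
            spillMergedPages();
        }
        int[] page = rows.clone();
        Arrays.sort(page);
        inMemoryPages.add(page);
    }

    /** Merges all in-memory pages into one sorted run and records it as a single spill. */
    private void spillMergedPages() {
        spilledRuns.add(mergeSortedPages(inMemoryPages));
        inMemoryPages.clear();
    }

    /** K-way merge of already sorted pages using a min-heap. */
    static int[] mergeSortedPages(List<int[]> pages) {
        int total = 0;
        for (int[] page : pages) {
            total += page.length;
        }
        // Heap entries are {value, pageIndex, offsetInPage}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).length > 0) {
                heap.add(new int[] {pages.get(i)[0], i, 0});
            }
        }
        int[] merged = new int[total];
        int out = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            merged[out++] = head[0];
            int[] page = pages.get(head[1]);
            int next = head[2] + 1;
            if (next < page.length) {
                heap.add(new int[] {page[next], head[1], next});
            }
        }
        return merged;
    }

    List<int[]> spilledRuns() {
        return spilledRuns;
    }

    int pagesInMemory() {
        return inMemoryPages.size();
    }
}
```

In this sketch, with a budget of two pages, the third call to addPage produces one spill that contains the rows of the first two pages in sorted order, and the third page stays in memory.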

With this change, each spill will write more data than before, which should benefit disk IO.


