Posted to issues@carbondata.apache.org by "xuchuanyin (JIRA)" <ji...@apache.org> on 2018/04/09 02:52:00 UTC

[jira] [Resolved] (CARBONDATA-2238) Optimization in unsafe sort during data loading

     [ https://issues.apache.org/jira/browse/CARBONDATA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2238.
------------------------------------
    Resolution: Fixed

> Optimization in unsafe sort during data loading
> -----------------------------------------------
>
>                 Key: CARBONDATA-2238
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Inspired by batch_sort: when local_sort runs with the unsafe property enabled and enough memory is available, we can hold all row pages in memory and spill pages to disk as sort temp files only when memory becomes unavailable.
> Before spilling, the in-memory pages can be merge-sorted together.
> Each time a new unsafe row page is requested and memory is unavailable, we trigger a merge sort of the in-memory pages and spill the result to disk as a single sort temp file. Incoming pages are then held in memory instead of being spilled to disk directly.
> With this implementation, each spill writes more data than before, which benefits disk IO.
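The spill policy described above can be sketched as follows. This is a minimal Python illustration, not CarbonData's implementation, which is Java and works on unsafe off-heap row pages. The class `InMemorySortBuffer`, its parameters, and the helper `_read_run` are hypothetical. The sketch makes several assumptions: rows are plain integers, the memory budget is counted in rows rather than bytes, each page is sorted on its own, and `heapq.merge` serves as the in-memory k-way merge.

```python
import heapq
import os
import tempfile
from typing import Iterator, List


def _read_run(path: str) -> Iterator[int]:
    """Stream one sorted temp file back, one row per line."""
    with open(path) as f:
        for line in f:
            yield int(line)


class InMemorySortBuffer:
    """Hypothetical sketch: hold sorted row pages in memory and spill only
    when a new page would exceed the (assumed, row-counted) memory budget."""

    def __init__(self, memory_budget_rows: int, page_size: int, temp_dir: str):
        self.memory_budget_rows = memory_budget_rows
        self.page_size = page_size
        self.temp_dir = temp_dir
        self.pages: List[List[int]] = []  # sorted pages held in memory
        self.held_rows = 0
        self.current: List[int] = []      # page currently being filled
        self.spill_files: List[str] = []  # sort temp files on disk

    def add_row(self, row: int) -> None:
        self.current.append(row)
        if len(self.current) == self.page_size:
            self._finish_page()

    def _finish_page(self) -> None:
        page = sorted(self.current)  # each page is sorted on its own
        self.current = []
        # Memory for this page is "unavailable": merge and spill held pages first.
        if self.held_rows + len(page) > self.memory_budget_rows:
            self._spill()
        self.pages.append(page)
        self.held_rows += len(page)

    def _spill(self) -> None:
        if not self.pages:
            return
        # In-memory k-way merge of all held pages into one larger sorted run.
        merged = heapq.merge(*self.pages)
        fd, path = tempfile.mkstemp(suffix=".sorttemp", dir=self.temp_dir)
        with os.fdopen(fd, "w") as f:
            for row in merged:
                f.write(f"{row}\n")
        self.spill_files.append(path)
        self.pages = []
        self.held_rows = 0

    def sorted_rows(self) -> Iterator[int]:
        """Final merge of the in-memory pages and every spilled temp file."""
        if self.current:
            self._finish_page()
        runs = [iter(p) for p in self.pages]
        runs += [_read_run(p) for p in self.spill_files]
        return heapq.merge(*runs)
```

Consider loading 1,000 rows with `page_size=100` and `memory_budget_rows=300`. The buffer writes three temp files of 300 rows each and keeps the last page in memory. If the budget is one page (`memory_budget_rows=100`), the same input produces nine temp files of 100 rows each. The first case shows the larger, less frequent spills that the issue describes.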



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)