Posted to issues@carbondata.apache.org by "Manish Gupta (JIRA)" <ji...@apache.org> on 2018/07/25 13:30:00 UTC

[jira] [Resolved] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache

     [ https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Gupta resolved CARBONDATA-2638.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

> Implement driver min max caching for specified columns and segregate block and blocklet cache
> ---------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2638
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Manish Gupta
>            Assignee: Manish Gupta
>            Priority: Major
>             Fix For: 1.4.1
>
>         Attachments: Driver_Block_Cache.docx
>
>
> *Background*
> The current implementation of Blocklet dataMap caching in the driver caches the min and max values of all columns in the schema by default.
> *Problem*
> As the number of loads increases, the memory required to hold the min and max values also increases considerably. In most scenarios there is a single driver, and the memory configured for the driver is less than that configured for an executor. As the memory requirement keeps growing, the driver can eventually run out of memory, which makes the situation worse.
> *Solution*
> 1. Cache only the required columns in the driver
> 2. Segregate the block-level and Blocklet-level caches
> For more details, please check the attached document.
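
The two solution points above can be illustrated with a minimal Python sketch. It is not CarbonData code: the Blocklet class, the function names, and the sample data are hypothetical, and the actual design is described in the attached document. The sketch assumes that each blocklet carries a (min, max) pair per column, that the driver keeps only the pairs for a configured set of columns, and that a block-level cache merges the pairs of all blocklets in a block into one entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

MinMax = Tuple[int, int]


@dataclass
class Blocklet:
    """Hypothetical blocklet metadata: one (min, max) pair per column."""
    min_max: Dict[str, MinMax]


def prune_columns(min_max: Dict[str, MinMax], cached: Set[str]) -> Dict[str, MinMax]:
    """Keep min/max only for the columns selected for caching."""
    return {col: mm for col, mm in min_max.items() if col in cached}


def blocklet_level_cache(blocks: Dict[str, List[Blocklet]], cached: Set[str]):
    """One cache entry per blocklet: (block id, blocklet index, min/max)."""
    return [
        (block_id, index, prune_columns(blocklet.min_max, cached))
        for block_id, blocklets in blocks.items()
        for index, blocklet in enumerate(blocklets)
    ]


def block_level_cache(blocks: Dict[str, List[Blocklet]], cached: Set[str]):
    """One cache entry per block: min/max merged across its blocklets."""
    entries = []
    for block_id, blocklets in blocks.items():
        merged: Dict[str, MinMax] = {}
        for blocklet in blocklets:
            for col, (lo, hi) in prune_columns(blocklet.min_max, cached).items():
                if col in merged:
                    cur_lo, cur_hi = merged[col]
                    merged[col] = (min(cur_lo, lo), max(cur_hi, hi))
                else:
                    merged[col] = (lo, hi)
        entries.append((block_id, merged))
    return entries


# Hypothetical sample: two blocks, three blocklets, three columns.
blocks = {
    "block-0": [
        Blocklet({"id": (1, 10), "age": (20, 30), "salary": (100, 200)}),
        Blocklet({"id": (11, 20), "age": (25, 40), "salary": (150, 300)}),
    ],
    "block-1": [
        Blocklet({"id": (21, 30), "age": (18, 22), "salary": (90, 120)}),
    ],
}

per_blocklet = blocklet_level_cache(blocks, {"id"})
per_block = block_level_cache(blocks, {"id"})
```

In this sketch, caching only the "id" column stores one pair per entry instead of three, and the block-level cache holds two entries where the Blocklet-level cache holds three. The trade-off is that block-level entries prune less precisely, because the merged range covers every blocklet in the block.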



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)