Posted to issues@carbondata.apache.org by "xuchuanyin (JIRA)" <ji...@apache.org> on 2018/04/25 13:43:00 UTC

[jira] [Assigned] (CARBONDATA-2309) Add strategy to generate bigger carbondata files in case of small amount of data

     [ https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin reassigned CARBONDATA-2309:
--------------------------------------

    Assignee: wangsen  (was: xuchuanyin)

> Add strategy to generate bigger carbondata files in case of small amount of data
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2309
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>            Reporter: xuchuanyin
>            Assignee: wangsen
>            Priority: Major
>
> In some scenarios, the amount of input data to load is small, but CarbonData still distributes it to every executor (node) for local sort, so each executor generates small carbondata files.
> In extreme cases, if the cluster is large enough or the amount of data is small enough, a carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve this problem.
> The new strategy should:
>  # be able to control the minimum amount of input data for each node
>  # ignore data locality; otherwise it may always choose a small subset of particular nodes
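
The two requirements above can be illustrated with a minimal sketch. This is not CarbonData code: the function name assign_blocks, its parameters, and the node names are hypothetical, and a simple sequential fill stands in for whatever allocation the real data-load path would use. Blocks are handed out without looking at where they are stored, and a node is only left once it has received at least the configured minimum.

```python
def assign_blocks(block_sizes, nodes, min_bytes_per_node):
    """Map node -> list of block indices, ignoring data locality.

    Hypothetical helper for illustration only; not part of CarbonData.
    Every node that ends up with work holds at least min_bytes_per_node,
    unless the total input is smaller than that (then one node gets it all).
    """
    if not nodes:
        raise ValueError("at least one node is required")

    assignment = {}  # node -> block indices, in insertion order
    loads = {}       # node -> total size assigned so far
    node_idx = 0

    for block_idx, size in enumerate(block_sizes):
        node = nodes[node_idx]
        assignment.setdefault(node, []).append(block_idx)
        loads[node] = loads.get(node, 0) + size
        # Move on only once this node has enough data and another node exists.
        if loads[node] >= min_bytes_per_node and node_idx < len(nodes) - 1:
            node_idx += 1

    # If the last node used is under-filled, fold its blocks into the previous one.
    used = list(assignment)
    if len(used) > 1 and loads[used[-1]] < min_bytes_per_node:
        last, prev = used[-1], used[-2]
        assignment[prev].extend(assignment.pop(last))
        loads[prev] += loads.pop(last)

    return assignment


# Six 100-unit blocks, four nodes, minimum of 256 units per node:
# only two nodes are used, each with three blocks.
plan = assign_blocks([100] * 6, ["n1", "n2", "n3", "n4"], 256)
print(plan)
```

Because the sketch always starts from the first entry of nodes, a caller that wants to avoid loading the same nodes on every run could rotate or shuffle the list before passing it in.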



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)