Posted to issues@carbondata.apache.org by "Ravindra Pesala (JIRA)" <ji...@apache.org> on 2018/11/27 06:38:00 UTC

[jira] [Resolved] (CARBONDATA-3118) Parallelize block pruning of default datamap in driver for filter query processing

     [ https://issues.apache.org/jira/browse/CARBONDATA-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-3118.
-----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.1

> Parallelize block pruning of default datamap in driver for filter query processing
> -----------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3118
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3118
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Ajantha Bhat
>            Assignee: Ajantha Bhat
>            Priority: Major
>             Fix For: 1.5.1
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> *"Parallelize block pruning of default datamap in driver for filter query processing"*
> *Background:*
> Block pruning for filter queries is done on the driver side.
> In real-world big data scenarios, a single carbon table can have millions of
> carbon files.
> For 1 million carbon files, block pruning is currently observed to take
> around 5 seconds, because each carbon file takes around 0.005 ms to prune
> (1,000,000 x 0.005 ms = 5 s, with only one filter column set in the
> 'column_meta_cache' tblproperty).
> With more files, block pruning takes correspondingly longer.
> Also, the Spark job is not launched until block pruning completes, so the
> user cannot tell what is happening during that time or why the Spark job
> has not started.
> Currently, block pruning is slow because segments are processed
> sequentially. We can reduce the time by parallelizing it.
> *Solution:*
> Keep the default number of threads for block pruning at 4.
> Users can reduce this number with the carbon property
> "carbon.max.driver.threads.for.pruning", which can be set between 1 and 4.
> In TableDataMap.prune(), group the segments per thread so that each thread
> gets an equal share of carbon files.
> Launch a thread for each group of segments to handle its block pruning.
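
The grouping and thread launch described in the solution can be sketched roughly as follows. This is only an illustration, not the code that was merged into CarbonData: the class PruningSketch, the Segment type and all method names are hypothetical, the real per-file pruning is replaced by a placeholder that counts files, and the greedy "largest segment to the lightest group" balancing is an assumed way of giving each thread a roughly equal number of carbon files. Only the 1 to 4 thread range comes from the issue text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PruningSketch {
    // Hypothetical stand-in for a segment: a name and its number of carbon files.
    static final class Segment {
        final String name;
        final int fileCount;
        Segment(String name, int fileCount) {
            this.name = name;
            this.fileCount = fileCount;
        }
    }

    // Keep the configured thread count inside the 1..4 range named in the issue.
    static int clampThreads(int configured) {
        return Math.max(1, Math.min(4, configured));
    }

    // Assumed balancing strategy: visit segments from largest to smallest and
    // put each one into the group that currently holds the fewest files.
    static List<List<Segment>> groupSegments(List<Segment> segments, int threads) {
        int n = clampThreads(threads);
        List<List<Segment>> groups = new ArrayList<>();
        long[] load = new long[n];
        for (int i = 0; i < n; i++) {
            groups.add(new ArrayList<>());
        }
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt((Segment s) -> s.fileCount).reversed());
        for (Segment s : sorted) {
            int lightest = 0;
            for (int i = 1; i < n; i++) {
                if (load[i] < load[lightest]) {
                    lightest = i;
                }
            }
            groups.get(lightest).add(s);
            load[lightest] += s.fileCount;
        }
        return groups;
    }

    // Run one task per group on a fixed pool and return the total files visited.
    static long pruneInParallel(List<Segment> segments, int threads) {
        List<List<Segment>> groups = groupSegments(segments, threads);
        ExecutorService pool = Executors.newFixedThreadPool(groups.size());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Segment> group : groups) {
                Callable<Long> task = () -> {
                    long visited = 0;
                    for (Segment s : group) {
                        visited += s.fileCount; // placeholder for real block pruning
                    }
                    return visited;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // wait for each group to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("pruning failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this sketch, a call such as PruningSketch.pruneInParallel(segments, 4) blocks until every group has been processed, so the driver would still wait for pruning to finish, but the groups run concurrently instead of one segment after another.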
