Posted to issues@carbondata.apache.org by "dhatchayani (JIRA)" <ji...@apache.org> on 2019/03/15 09:23:00 UTC

[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement for count(*)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhatchayani updated CARBONDATA-3293:
------------------------------------
    Summary: Prune datamaps improvement for count(*)  (was: Prune datamaps improvement)

> Prune datamaps improvement for count(*)
> ---------------------------------------
>
>                 Key: CARBONDATA-3293
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3293
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>            Priority: Major
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> +*Problem:*+
> (1) Currently, pruning for count(*) works the same way as for a select * query: Blocklet and ExtendedBlocklet objects are formed from each DataMapRow. This is unnecessary for count(*) and time-consuming.
> (2) Pruning in a select * query spends time in convertToSafeRow(), which converts the DataMapRow to a safe row. In an unsafe row, reaching the position of a piece of data requires traversing the whole row up to that position.
> (3) For filter queries, the DataMapRow is converted to a safe row whether the blocklet is valid or invalid. This conversion is time-consuming, and its cost grows with the number of blocklets.
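To illustrate point (2) of the Problem, here is a minimal Java sketch that assumes a hypothetical row layout in which each variable-length field is stored as a 4-byte length followed by its bytes. This is not CarbonData's actual unsafe DataMapRow format; TraversalSketch, pack, offsetOf, and readField are illustrative names only. It shows why, in such a layout, reaching one field means first reading the length of every field before it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout, NOT CarbonData's real unsafe DataMapRow format:
// fields are stored back to back as [int length][bytes].
public class TraversalSketch {

    // Packs the fields into one buffer, each prefixed with its length.
    static ByteBuffer pack(byte[][] fields) {
        int size = 0;
        for (byte[] f : fields) {
            size += Integer.BYTES + f.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] f : fields) {
            buf.putInt(f.length);
            buf.put(f);
        }
        buf.flip();
        return buf;
    }

    // Finds where field `index` starts. Every earlier field's length prefix
    // must be read first, so the cost grows linearly with `index`.
    static int offsetOf(ByteBuffer row, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            int len = row.getInt(pos); // absolute read; buffer position unchanged
            pos += Integer.BYTES + len;
        }
        return pos;
    }

    // Copies out the bytes of field `index` after locating it by traversal.
    static byte[] readField(ByteBuffer row, int index) {
        int pos = offsetOf(row, index);
        int len = row.getInt(pos);
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = row.get(pos + Integer.BYTES + i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] fields = {
            "min".getBytes(StandardCharsets.UTF_8),
            "max".getBytes(StandardCharsets.UTF_8),
            "path/to/file".getBytes(StandardCharsets.UTF_8)
        };
        ByteBuffer row = pack(fields);
        // Reaching field 2 requires reading the lengths of fields 0 and 1.
        System.out.println(new String(readField(row, 2), StandardCharsets.UTF_8));
    }
}
```

Running main prints path/to/file, but only after the length prefixes of the two preceding fields have been read.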
>  
> +*Solution:*+
> (1) The blocklet row count is already stored in the DataMapRow itself, so reading just that count is enough. This improves count(*) query performance.
> (2) Also store the data length in the DataMapRow, so that traversing the whole row can be avoided. With the length, the data position can be reached directly.
> (3) Read only the min/max values from the DataMapRow and decide whether a scan is required for that blocklet. Only if the scan is required is the row converted to a safe row.
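The three Solution points can be sketched together in Java as follows. This is a hypothetical illustration, not CarbonData's implementation: IndexRow stands in for a DataMapRow, the sketch assumes a single int filter column with one min/max pair, and all class and method names are assumptions. countStar answers count(*) from the stored row counts alone, readField uses a stored offset and length instead of traversal, and pruneEquals checks min/max before materializing any row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: IndexRow and all methods below are illustrative
// assumptions, not CarbonData's real DataMapRow or pruning API.
public class PruneSketch {

    // Stand-in for a DataMapRow: blocklet row count, min/max of one int
    // column, and a payload whose field offsets and lengths are stored.
    static final class IndexRow {
        final long rowCount;
        final int min;
        final int max;
        final byte[] payload;
        final int[] offsets;
        final int[] lengths;

        IndexRow(long rowCount, int min, int max, byte[] payload, int[] offsets, int[] lengths) {
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
            this.payload = payload;
            this.offsets = offsets;
            this.lengths = lengths;
        }
    }

    // Solution (1): count(*) sums the stored row counts; no Blocklet or
    // ExtendedBlocklet objects are created.
    static long countStar(List<IndexRow> rows) {
        long total = 0;
        for (IndexRow r : rows) {
            total += r.rowCount;
        }
        return total;
    }

    // Solution (2): the stored offset and length locate a field directly,
    // with no walk over the preceding fields.
    static byte[] readField(IndexRow r, int field) {
        int start = r.offsets[field];
        return Arrays.copyOfRange(r.payload, start, start + r.lengths[field]);
    }

    // Solution (3): test min/max first and materialize the payload (the
    // "safe row" step in this sketch) only for blocklets that may hold `value`.
    static List<byte[]> pruneEquals(List<IndexRow> rows, int value) {
        List<byte[]> selected = new ArrayList<>();
        for (IndexRow r : rows) {
            if (value < r.min || value > r.max) {
                continue; // blocklet cannot match: skip without any conversion
            }
            selected.add(readField(r, 0)); // convert only when a scan is needed
        }
        return selected;
    }

    // Builds a one-field row whose payload is the given blocklet name.
    static IndexRow row(long count, int min, int max, String name) {
        byte[] p = name.getBytes(StandardCharsets.UTF_8);
        return new IndexRow(count, min, max, p, new int[] {0}, new int[] {p.length});
    }

    public static void main(String[] args) {
        List<IndexRow> rows = Arrays.asList(
            row(100, 1, 10, "blocklet-a"),
            row(250, 11, 20, "blocklet-b"),
            row(50, 21, 30, "blocklet-c"));
        System.out.println(countStar(rows));
        System.out.println(new String(pruneEquals(rows, 15).get(0), StandardCharsets.UTF_8));
    }
}
```

Running main prints 400 for the count and then blocklet-b, the only blocklet whose [min, max] range contains 15; the other two rows are skipped without being converted.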



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)