
Posted to issues@drill.apache.org by "Zelaine Fong (JIRA)" <ji...@apache.org> on 2016/02/16 23:31:18 UTC

[jira] [Assigned] (DRILL-4308) Aggregate operations on dir<N> columns can be more efficient for certain use cases

     [ https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong reassigned DRILL-4308:
-----------------------------------

    Assignee: Jinfeng Ni

[~jni] - I believe the changes you're currently working on as part of DRILL-4387 will address this.  Right?

> Aggregate operations on dir<N> columns can be more efficient for certain use cases
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-4308
>                 URL: https://issues.apache.org/jira/browse/DRILL-4308
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> For queries that perform plain aggregates or DISTINCT operations on the directory partition columns (dir0, dir1, etc.) and reference no other columns, performance could be substantially improved by not scanning the entire dataset.
> Consider the following types of queries:
> {noformat}
> select  min(dir0) from largetable;
> select  distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir<N> columns is typically quite small, so there is no reason to scan the large table.  This has also come up as feedback from some Drill users.  Of course, if any other column is referenced in the query (WHERE, ORDER BY, etc.), this optimization cannot be applied.
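
To illustrate the idea in the description, here is a minimal Python sketch of how min(dir0) or DISTINCT dir0 could be answered from file paths alone, without reading any rows. It is not Drill's implementation: the function name dir_values, the table root /data/largetable, and the file names are all hypothetical, and it assumes every listed file contains at least one row (otherwise a directory value could appear here that a real scan would not return).

```python
from pathlib import PurePosixPath


def dir_values(file_paths, table_root, level=0):
    """Return the dir<level> value of each file, derived from its path only.

    Hypothetical helper: dir0 is taken to be the first subdirectory under
    table_root, dir1 the second, and so on. Files that are not nested
    deeply enough get None.
    """
    root = PurePosixPath(table_root)
    values = []
    for path in file_paths:
        relative = PurePosixPath(path).relative_to(root)
        directories = relative.parts[:-1]  # drop the file name itself
        values.append(directories[level] if level < len(directories) else None)
    return values


# Hypothetical directory layout for "largetable".
paths = [
    "/data/largetable/2015/Q1/a.parquet",
    "/data/largetable/2015/Q2/b.parquet",
    "/data/largetable/2016/Q1/c.parquet",
]

dir0 = [v for v in dir_values(paths, "/data/largetable") if v is not None]

# Equivalent of: select distinct dir0 from largetable;
distinct_dir0 = sorted(set(dir0))

# Equivalent of: select min(dir0) from largetable;
min_dir0 = min(dir0)
```

Only a file listing is consulted, so the cost depends on the number of files rather than the number of rows. Aggregates that depend on row counts, or queries that reference any other column, would still need the data itself.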



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)