Posted to issues@hawq.apache.org by "Oleksandr Diachenko (JIRA)" <ji...@apache.org> on 2017/04/04 19:07:41 UTC

[jira] [Updated] (HAWQ-1404) PXF to leverage file-level stats of ORC file and emit records for COUNT(*)

     [ https://issues.apache.org/jira/browse/HAWQ-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksandr Diachenko updated HAWQ-1404:
--------------------------------------
    Fix Version/s: 2.3.0.0-incubating

> PXF to leverage file-level stats of ORC file and emit records for COUNT(*)
> --------------------------------------------------------------------------
>
>                 Key: HAWQ-1404
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1404
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>             Fix For: 2.3.0.0-incubating
>
>
> When a user issues a COUNT(*) query without a WHERE clause, PXF should be able to leverage the file-level stats of an ORC file and emit the given number of records back to HAWQ, avoiding reading actual tuples from disk. This should be a first step in enabling PXF to use ORC stats (file, stripe and row group levels) so we can improve a wider range of aggregate queries.
> So whenever PXF receives "count" as the value of the AGG-TYPE parameter, it should optimize the request by emitting tuples using ORC file-level stats.
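
The behavior described above could be sketched roughly as follows. This is an illustration only, not PXF code: the class `CountStatsSketch`, the `FileStats` holder, and the methods `canUseFileStats`, `emitPlaceholders` and `countEmitted` are all hypothetical names, and the row count is passed in directly rather than read from a real ORC file (in the Apache ORC Java library it would presumably come from `Reader.getNumberOfRows()`). The sketch assumes that empty placeholder records are enough for HAWQ to compute COUNT(*).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these names come from the PXF code base. */
public class CountStatsSketch {

    /** Stand-in for ORC file-level statistics; only the row count matters here. */
    static final class FileStats {
        final long numberOfRows;

        FileStats(long numberOfRows) {
            if (numberOfRows < 0) {
                throw new IllegalArgumentException("row count must be non-negative");
            }
            this.numberOfRows = numberOfRows;
        }
    }

    /**
     * The shortcut applies only when the AGG-TYPE value is "count"
     * and the query carries no filter (no WHERE clause).
     */
    static boolean canUseFileStats(String aggType, boolean hasFilter) {
        return !hasFilter && "count".equals(aggType);
    }

    /** Lazily emits one empty placeholder record per row, without reading any tuples. */
    static Iterator<Object[]> emitPlaceholders(final FileStats stats) {
        return new Iterator<Object[]>() {
            private final Object[] empty = new Object[0];
            private long remaining = stats.numberOfRows;

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public Object[] next() {
                if (remaining <= 0) {
                    throw new NoSuchElementException();
                }
                remaining--;
                return empty;
            }
        };
    }

    /** Counts the emitted records, standing in for what the COUNT(*) consumer would do. */
    static long countEmitted(Iterator<Object[]> records) {
        long n = 0;
        while (records.hasNext()) {
            records.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        FileStats stats = new FileStats(5); // assumed row count, not read from a file
        if (canUseFileStats("count", false)) {
            System.out.println(countEmitted(emitPlaceholders(stats)));
        }
    }
}
```

Emitting records lazily keeps memory use constant regardless of the row count. A query with a WHERE clause would fall back to the normal read path, since file-level stats alone cannot tell how many rows match a filter.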



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)