Posted to issues@hawq.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/10 19:56:11 UTC

[jira] [Commented] (HAWQ-44) Advanced statistics for PXF tables

    [ https://issues.apache.org/jira/browse/HAWQ-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999115#comment-14999115 ] 

ASF GitHub Bot commented on HAWQ-44:
------------------------------------

GitHub user hornn opened a pull request:

    https://github.com/apache/incubator-hawq/pull/92

    Analyze HAWQ-44

    Advanced statistics for PXF tables.
    
    Sample rows from the PXF table are collected into a temporary table, and statistics are derived from them in the same way ANALYZE does for HAWQ tables.
    Statistics are gathered in three stages:
    1. Getting general statistics: number of fragments, size of the data source, and size of the first fragment.
    2. Counting the tuples in the first fragment.
    HAWQ uses these numbers to determine how many tuples are needed, and translates them into a sampling ratio and a number of fragments to sample.
    3. Sampling the PXF table based on the sampling ratio and the number of fragments to be sampled. The returned tuples are saved in the temporary table.
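
The description above does not spell out how the stage 1 and stage 2 numbers become a sampling ratio and a fragment count. The following is only a rough sketch of one way that translation could work, not the code in this pull request. It assumes the tuple density of the first fragment holds for the whole data source. The names `plan_sampling`, `target_tuples` and `max_fragments` are hypothetical and do not come from HAWQ or PXF.

```python
from dataclasses import dataclass


@dataclass
class SamplingPlan:
    estimated_tuples: int     # extrapolated tuple count of the whole source
    fragments_to_sample: int  # how many fragments to read
    sampling_ratio: float     # fraction of rows kept within those fragments


def plan_sampling(fragment_count, total_size, first_fragment_size,
                  first_fragment_tuples, target_tuples, max_fragments):
    """Hypothetical illustration; not the HAWQ implementation."""
    if min(fragment_count, first_fragment_size, target_tuples, max_fragments) <= 0:
        raise ValueError("all inputs except the sizes' ratio must be positive")

    # Stages 1 and 2: extrapolate from the first fragment to the whole source
    # (assumption: tuples are spread evenly over the bytes of the source).
    estimated_tuples = round(first_fragment_tuples * total_size / first_fragment_size)

    # Assumption: cap the number of fragments read to bound the cost of stage 3.
    fragments_to_sample = min(fragment_count, max_fragments)

    # Tuples expected in the sampled fragments, then the ratio that yields
    # roughly target_tuples rows out of them (never more than 1.0).
    tuples_in_sampled = estimated_tuples * fragments_to_sample / fragment_count
    if tuples_in_sampled <= target_tuples:
        ratio = 1.0
    else:
        ratio = target_tuples / tuples_in_sampled

    return SamplingPlan(estimated_tuples, fragments_to_sample, ratio)
```

For example, with 100 fragments, a 1,000,000 byte source, and a 10,000 byte first fragment holding 500 tuples, the sketch estimates 50,000 tuples in total. Asking for 1,000 tuples from at most 20 fragments then gives a ratio of 0.1 over those 20 fragments.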
    
    On the PXF side, a function has been added to the Fragmenter API to allow gathering the statistics of the first stage. In addition, a mechanism to sample rows on the fly was added to the Bridge.
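
To illustrate what sampling rows on the fly can look like, here is a minimal sketch that uses plain Bernoulli sampling: each row is kept independently with probability equal to the sampling ratio, so the stream never has to be materialized. This is a stand-in technique chosen for brevity and is not necessarily how the Bridge does it. The function `sample_rows` and its `seed` parameter are hypothetical.

```python
import random


def sample_rows(rows, ratio, seed=None):
    """Yield each row with probability `ratio` (Bernoulli sampling).

    Hypothetical illustration; not the PXF Bridge implementation.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    rng = random.Random(seed)  # a seed makes a run reproducible
    for row in rows:
        # rng.random() is in [0.0, 1.0), so ratio 1.0 keeps every row
        # and ratio 0.0 keeps none.
        if rng.random() < ratio:
            yield row
```

Because the function is a generator, rows that are not selected are dropped as they are read and only the kept rows are passed on, which is the property the temporary table in stage 3 relies on.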

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hornn/incubator-hawq analyze_HAWQ-44

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/92.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #92
    
----
commit 4237b106c6b3787b7770162fec9e23cd53e20d5e
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-10-12T19:20:01Z

    HAWQ-44. PXF Advanced Statistics: hawq side changes

commit 4eeb40f2a5aaded9e241b19580b7f42875853f4b
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-10-18T10:34:13Z

    HAWQ-44. PXF Advanced Statistics: java side

commit 7c9c64584c7dd4d8b9e4525ef1fa347805b94699
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-10-25T09:34:19Z

    HAWQ-44. documentation

commit ca7ebb118047147c95d4c998eb7650a65bc73045
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-11-02T15:14:19Z

    HAWQ-44. Add function to Fragmenter API to retrieve fragments stats, with default implementation.
    Add specific implementation to HdfsDataFragmenter.
    Add code in HAWQ to call new API, and clean up code that called analyzer.

commit 38ab2e601908f755859fb65329aaf4e3be26ca8c
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-11-04T21:02:21Z

    HAWQ-44. Update package name in new files

commit 8c7955c82a9d5c2e4a2db9f240d271c46ecb9bf9
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-11-06T19:03:09Z

    HAWQ-44. Disable getFragmentsStats for HBase and Hive fragmenters

commit 4a8183a707c21cf242d454a280570c45bcd2d880
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-11-10T02:29:07Z

    HAWQ-44. Change stats to include unit together with size of resource to avoid overflow.

commit 49b3e448436c32561af2fc16749ec484216770d6
Author: Noa Horn <nh...@pivotal.io>
Date:   2015-11-10T18:48:42Z

    HAWQ-44. Remove extra lines

----


> Advanced statistics for PXF tables
> ----------------------------------
>
>                 Key: HAWQ-44
>                 URL: https://issues.apache.org/jira/browse/HAWQ-44
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Noa Horn
>            Assignee: Noa Horn
>              Labels: Performance
>
> PXF will get full statistics on a table using sampling.


