Posted to issues@hawq.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/11/10 19:56:11 UTC
[jira] [Commented] (HAWQ-44) Advanced statistics for PXF tables
[ https://issues.apache.org/jira/browse/HAWQ-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999115#comment-14999115 ]
ASF GitHub Bot commented on HAWQ-44:
------------------------------------
GitHub user hornn opened a pull request:
https://github.com/apache/incubator-hawq/pull/92
Analyze HAWQ-44
Advanced statistics for PXF tables.
PXF sample rows are collected into a temporary table, and statistics are derived from them in the same way ANALYZE works for HAWQ tables.
Statistics are gathered in 3 stages:
1. Getting general statistics - number of fragments, size of data source, size of first fragment
2. Counting the tuples in the first fragment
HAWQ uses these numbers to determine how many tuples are needed, and translates them into a sampling ratio and a number of fragments to sample.
3. Sampling the PXF table based on the sampling ratio and number of fragments to be sampled. The returned tuples are saved in a temporary table.
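As a rough illustration of how the numbers from stages 1 and 2 could be turned into a sampling ratio and a fragment count, here is a minimal sketch. It is not the code in this pull request: the class SamplingPlan, the parameters targetRows and maxFragments, and the formula itself are assumptions made for the example. It extrapolates the total tuple count from the first fragment, caps the number of fragments to read, and picks the ratio that would yield roughly the target number of rows.

```java
// Hypothetical sketch: the class, parameters, and formula are assumptions,
// not code from the pull request.
final class SamplingPlan {
    final double ratio;           // fraction of rows to keep, in (0, 1]
    final long fragmentsToSample; // how many fragments to read

    private SamplingPlan(double ratio, long fragmentsToSample) {
        this.ratio = ratio;
        this.fragmentsToSample = fragmentsToSample;
    }

    // totalSize and firstFragmentSize must be expressed in the same unit.
    static SamplingPlan estimate(long fragments, double totalSize,
                                 double firstFragmentSize, long firstFragmentTuples,
                                 long targetRows, long maxFragments) {
        if (fragments <= 0 || totalSize <= 0 || firstFragmentSize <= 0
                || firstFragmentTuples <= 0 || targetRows <= 0 || maxFragments <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // Assume every fragment has the same tuple density as the first one.
        double estimatedTuples = firstFragmentTuples * (totalSize / firstFragmentSize);
        // Read at most maxFragments fragments.
        long fragmentsToSample = Math.min(fragments, maxFragments);
        // Tuples expected in the fragments that will actually be read.
        double tuplesInSampledFragments = estimatedTuples * fragmentsToSample / fragments;
        // Keep everything if the source is smaller than the target.
        double ratio = Math.min(1.0, targetRows / tuplesInSampledFragments);
        return new SamplingPlan(ratio, fragmentsToSample);
    }
}
```

For example, under these assumptions SamplingPlan.estimate(100, 1000.0, 10.0, 1000, 5000, 50) would read 50 fragments and keep about one row in ten.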
On the PXF side, a function has been added to the Fragmenter API to allow gathering the stats of the first stage. In addition, a mechanism to sample rows on the fly was added to the Bridge.
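To give an idea of what sampling rows on the fly can look like, here is a minimal sketch of an iterator wrapper that passes through only a fraction of the rows it reads. The class SamplingIterator is hypothetical and is not the Bridge code in this pull request; it uses simple systematic sampling (an accumulated "credit" per row), which is an assumption chosen because it is deterministic and easy to follow.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: not the PXF Bridge implementation.
final class SamplingIterator<T> implements Iterator<T> {
    private static final double EPSILON = 1e-9; // tolerance for floating-point drift

    private final Iterator<T> source;
    private final double ratio;
    private double credit = 0.0;
    private T nextRow;
    private boolean hasNextRow;

    SamplingIterator(Iterator<T> source, double ratio) {
        if (!(ratio > 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in (0, 1]");
        }
        this.source = source;
        this.ratio = ratio;
        advance();
    }

    // Skip source rows until enough credit has accumulated to emit one.
    private void advance() {
        hasNextRow = false;
        nextRow = null;
        while (source.hasNext()) {
            T row = source.next();
            credit += ratio;
            if (credit >= 1.0 - EPSILON) {
                credit -= 1.0;
                nextRow = row;
                hasNextRow = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNextRow;
    }

    @Override
    public T next() {
        if (!hasNextRow) {
            throw new NoSuchElementException();
        }
        T row = nextRow;
        advance();
        return row;
    }
}
```

Wrapping a source of 100 rows with a ratio of 0.25 yields every fourth row, 25 rows in total, without buffering the input.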
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hornn/incubator-hawq analyze_HAWQ-44
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hawq/pull/92.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #92
----
commit 4237b106c6b3787b7770162fec9e23cd53e20d5e
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-10-12T19:20:01Z
HAWQ-44. PXF Advanced Statistics: hawq side changes
commit 4eeb40f2a5aaded9e241b19580b7f42875853f4b
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-10-18T10:34:13Z
HAWQ-44. PXF Advanced Statistics: java side
commit 7c9c64584c7dd4d8b9e4525ef1fa347805b94699
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-10-25T09:34:19Z
HAWQ-44. documentation
commit ca7ebb118047147c95d4c998eb7650a65bc73045
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-11-02T15:14:19Z
HAWQ-44. Add function to Fragmenter API to retrieve fragments stats, with default implementation.
Add specific implementation to HdfsDataFragmenter.
Add code in HAWQ to call new API, and clean up code that called analyzer.
commit 38ab2e601908f755859fb65329aaf4e3be26ca8c
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-11-04T21:02:21Z
HAWQ-44. Update package name in new files
commit 8c7955c82a9d5c2e4a2db9f240d271c46ecb9bf9
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-11-06T19:03:09Z
HAWQ-44. Disable getFragmentsStats for HBase and Hive fragmenters
commit 4a8183a707c21cf242d454a280570c45bcd2d880
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-11-10T02:29:07Z
HAWQ-44. Change stats to include unit together with size of resource to avoid overflow.
commit 49b3e448436c32561af2fc16749ec484216770d6
Author: Noa Horn <nh...@pivotal.io>
Date: 2015-11-10T18:48:42Z
HAWQ-44. Remove extra lines
----
> Advanced statistics for PXF tables
> ----------------------------------
>
> Key: HAWQ-44
> URL: https://issues.apache.org/jira/browse/HAWQ-44
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: PXF
> Reporter: Noa Horn
> Assignee: Noa Horn
> Labels: Performance
>
> PXF will get full statistics on a table using sampling.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)