Posted to issues@hawq.apache.org by "Noa Horn (JIRA)" <ji...@apache.org> on 2015/11/24 21:06:11 UTC

[jira] [Created] (HAWQ-191) Remove Analyzer from PXF

Noa Horn created HAWQ-191:
-----------------------------

             Summary: Remove Analyzer from PXF
                 Key: HAWQ-191
                 URL: https://issues.apache.org/jira/browse/HAWQ-191
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: PXF
            Reporter: Noa Horn
            Assignee: Goden Yao


The Analyzer plugin was used to gather statistics when running ANALYZE.
Its API provides a single function, getEstimatedStats(), which returns the estimated number of tuples, the number of blocks, and the block size.
There is only one implementation of it: HdfsAnalyzer.

After the introduction of advanced stats (HAWQ-44), the Analyzer is no longer used by HAWQ. Instead, a new function in the Fragmenter API (getFragmentsStats) gathers initial statistics for the data source, and subsequent queries collect sample tuples from that data source.

The advantage of the new approach is that Fragmenter.getFragmentsStats uses only the Fragmenter to gather stats. The Analyzer, on the other hand, instantiated both the Fragmenter and the Accessor of the table in order to estimate the number of tuples. In the HdfsAnalyzer implementation, this made the pxf-hdfs jar depend on pxf-service (which takes care of instantiating the plugins), contrary to the isolation we want to achieve between core functionality (pxf-service) and the plugins (pxf-hdfs, pxf-hive, pxf-hbase, etc.).
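
To illustrate why a Fragmenter-only stats path avoids that dependency, here is a minimal sketch. It is not the actual PXF code: the FragmentsStats fields, the simplified Fragmenter interface, and the ListFragmenter class are hypothetical stand-ins, and only the method name getFragmentsStats is taken from the description above. The point is that the stats are derived from fragment metadata alone, so no Accessor (and no plugin-instantiation code from pxf-service) is needed.

```java
import java.util.Arrays;
import java.util.List;

public class FragmenterStatsSketch {

    /** Hypothetical holder for coarse, fragment-level statistics (field names are assumptions). */
    static final class FragmentsStats {
        final long fragmentsNumber;
        final long firstFragmentSize;
        final long totalSize;

        FragmentsStats(long fragmentsNumber, long firstFragmentSize, long totalSize) {
            this.fragmentsNumber = fragmentsNumber;
            this.firstFragmentSize = firstFragmentSize;
            this.totalSize = totalSize;
        }
    }

    /** Hypothetical, simplified fragmenter contract; the real PXF API has more to it. */
    interface Fragmenter {
        FragmentsStats getFragmentsStats();
    }

    /** Toy fragmenter that knows only the byte size of each fragment. */
    static final class ListFragmenter implements Fragmenter {
        private final List<Long> fragmentSizes;

        ListFragmenter(List<Long> fragmentSizes) {
            this.fragmentSizes = fragmentSizes;
        }

        @Override
        public FragmentsStats getFragmentsStats() {
            // Only fragment metadata is consulted: no rows are read,
            // so no Accessor has to be instantiated.
            long total = 0;
            for (long size : fragmentSizes) {
                total += size;
            }
            long first = fragmentSizes.isEmpty() ? 0 : fragmentSizes.get(0);
            return new FragmentsStats(fragmentSizes.size(), first, total);
        }
    }

    public static void main(String[] args) {
        Fragmenter fragmenter = new ListFragmenter(Arrays.asList(128L, 128L, 64L));
        FragmentsStats stats = fragmenter.getFragmentsStats();
        System.out.println("fragments=" + stats.fragmentsNumber
                + " first=" + stats.firstFragmentSize
                + " total=" + stats.totalSize);
    }
}
```

In this sketch the plugin side depends only on its own Fragmenter type, which is the kind of isolation between pxf-service and the plugins that the issue is after.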



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)