Posted to issues@phoenix.apache.org by karanmehta93 <gi...@git.apache.org> on 2018/12/20 07:50:15 UTC

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    _First of all, apologies for the loooong PR._ (Most of it is refactoring, but it's still hard to review.)
    
    **Here's the high level idea** 
    1. Seven classes inherited from `StatsCollectorIT`, each testing stats collection for a different combination of table properties, so there was a lot of redundancy in the test suite. Also, all the tests were running with namespaces enabled all the time, because that property is set once for the JVM and cannot be changed back without restarting the server. We were controlling the parameterized property on each new `PhoenixConnection`, which is disallowed according to the documentation.
    The code is now refactored to have only 3 classes:
    `NamespaceMappedStatsCollectorIT` --> namespaces enabled, collect stats via snapshots as well as SQL statement
    `NonTxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded
    `TxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded, TEPHRA/OMID
    
    2. `StatsCollectorIT` is renamed to `BaseStatsCollectorIT`, and the tests have been improved to cover certain scenarios. More tests are coming.
    
    3. Server side changes:
    `DefaultStatisticsCollector` is now an abstract class; `RegionServerStatisticsCollector` and `MapperStatisticsCollector` are its children. The former is triggered for SQL statements and the latter is used for this Jira (the MapReduce job). Most of the common code has moved to the base class.
    
    4. The snapshot scanner has been improved to collect statistics if the scan is configured accordingly. A `NoOpStatisticsCollector` is instantiated instead if it's a regular Phoenix MR job on snapshots.
    
    5. There are also configuration changes in the `PhoenixConfigurationUtil` class.
    
    Finally, `UpdateStatisticsTool` is the tool to launch the MR job.
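
    To make points 3 and 4 easier to picture, here is a rough, self-contained sketch of the shape of the collector hierarchy and of the no-op fallback for regular snapshot scans. It is not the code in this PR: every `Sketch*` name, the `collectStats` flag, and the row-count guide post rule are simplifications assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: all Sketch* types are hypothetical stand-ins, not Phoenix classes.
class StatsCollectorSketch {
    public static void main(String[] args) {
        // Scan configured for stats: a real (mapper-side) collector is used.
        SketchStatsCollector stats = SketchCollectors.forSnapshotScan(true, 2);
        for (String rowKey : new String[] {"a", "b", "c", "d", "e"}) {
            stats.collect(rowKey);
        }
        System.out.println("guide posts: " + stats.guidePosts());

        // Regular MR job on a snapshot: the no-op collector ignores every row.
        SketchStatsCollector noop = SketchCollectors.forSnapshotScan(false, 2);
        noop.collect("a");
        System.out.println("no-op guide posts: " + noop.guidePosts());
    }
}

// Minimal collector contract (assumed for this sketch).
interface SketchStatsCollector {
    void collect(String rowKey);
    List<String> guidePosts();
}

// Common logic lives in the abstract base, as described for DefaultStatisticsCollector.
abstract class SketchBaseCollector implements SketchStatsCollector {
    private final int rowsPerGuidePost; // simplification: counts rows, not bytes
    private final List<String> guidePosts = new ArrayList<>();
    private int rowsSinceLastGuidePost = 0;

    SketchBaseCollector(int rowsPerGuidePost) {
        this.rowsPerGuidePost = rowsPerGuidePost;
    }

    @Override
    public void collect(String rowKey) {
        rowsSinceLastGuidePost++;
        if (rowsSinceLastGuidePost >= rowsPerGuidePost) {
            guidePosts.add(rowKey);
            rowsSinceLastGuidePost = 0;
        }
    }

    @Override
    public List<String> guidePosts() {
        return Collections.unmodifiableList(guidePosts);
    }

    // Subclasses differ only in where they are triggered from.
    abstract String trigger();
}

// Plays the role of the collector triggered by the SQL statement.
class SketchRegionServerCollector extends SketchBaseCollector {
    SketchRegionServerCollector(int rowsPerGuidePost) { super(rowsPerGuidePost); }
    @Override String trigger() { return "SQL statement"; }
}

// Plays the role of the collector used by the MR job over snapshots.
class SketchMapperCollector extends SketchBaseCollector {
    SketchMapperCollector(int rowsPerGuidePost) { super(rowsPerGuidePost); }
    @Override String trigger() { return "MR job"; }
}

// Does nothing, so a regular snapshot scan pays no stats overhead.
class SketchNoOpCollector implements SketchStatsCollector {
    @Override public void collect(String rowKey) { /* intentionally empty */ }
    @Override public List<String> guidePosts() { return Collections.emptyList(); }
}

final class SketchCollectors {
    private SketchCollectors() {}

    // collectStats models "the scan is configured accordingly" (hypothetical flag).
    static SketchStatsCollector forSnapshotScan(boolean collectStats, int rowsPerGuidePost) {
        return collectStats
                ? new SketchMapperCollector(rowsPerGuidePost)
                : new SketchNoOpCollector();
    }
}
```

    The point of the no-op variant in this sketch is that the scanner code path stays the same whether or not stats are being collected; only the collector instance differs.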
    
    This is v1, put up for some initial feedback. Please comment wherever it's not clear.
    
    **Coming up:** 
    More tests covering other scenarios.
    Perf testing for sample tables and the results.
    Better/useful log lines
    General code cleanup for nits


---