You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by karanmehta93 <gi...@git.apache.org> on 2018/12/20 03:46:16 UTC

[GitHub] phoenix pull request #419: PHOENIX-4009 Run UPDATE STATISTICS command by usi...

GitHub user karanmehta93 opened a pull request:

    https://github.com/apache/phoenix/pull/419

    PHOENIX-4009 Run UPDATE STATISTICS command by using MR integration on…

    … snapshots

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karanmehta93/phoenix 4.x-HBase-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #419
    
----
commit 5c4057647bcb0a576939e215f9a58ba4fb66ff6e
Author: Karan Mehta <ka...@...>
Date:   2018-12-20T03:38:49Z

    PHOENIX-4009 Run UPDATE STATISTICS command by using MR integration on snapshots

----


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by dbwong <gi...@git.apache.org>.
Github user dbwong commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    @karanmehta93  I'll try and take a look tonight.


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    @twdsilva @dbwong @BinShi-SecularBird Please review.



---

[GitHub] phoenix pull request #419: PHOENIX-4009 Run UPDATE STATISTICS command by usi...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/419#discussion_r243718523
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/parse/UpdateStatisticsStatement.java ---
    @@ -23,6 +23,7 @@
     
     import java.util.Map;
     
    +import org.apache.phoenix.jdbc.PhoenixStatement.Operation;
    --- End diff --
    
    nit: revert these changes


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    @twdsilva @ChinmaySKulkarni @joshelser @chrajeshbabu @dbwong @BinShi-SecularBird
    
    Please review.


---

[GitHub] phoenix pull request #419: PHOENIX-4009 Run UPDATE STATISTICS command by usi...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/419#discussion_r243720533
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java ---
    @@ -154,7 +154,8 @@
     
         public enum SchemaType {
             TABLE,
    -        QUERY;
    +        QUERY,
    +        UPDATE_STATS
    --- End diff --
    
    I am also not quite sure about the exact use case. The SchemaType.TABLE enum value is only used by phoenix-hive module. The other option is SchemaType.QUERY which is default option to run regular queries.


---

[GitHub] phoenix pull request #419: PHOENIX-4009 Run UPDATE STATISTICS command by usi...

Posted by twdsilva <gi...@git.apache.org>.
Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/419#discussion_r243718429
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java ---
    @@ -154,7 +154,8 @@
     
         public enum SchemaType {
             TABLE,
    -        QUERY;
    +        QUERY,
    +        UPDATE_STATS
    --- End diff --
    
    I am not very familiar with the TABLE/QUERY enum, whats the difference?


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    _First of all, apologies for loooong PR._ (Most of it is refactoring but still its hard to review)
    
    **Here's the high level idea** 
    1. 7 classes were inherited from `StatsCollectorIT`, testing stats collection for different types of table properties. There was a lot of redundancy in the test suite. Also, all the tests were running with namespaces enabled all the time (This is because it is set once for the JVM and we cannot go back without restarting the server). We were controlling the parameterized property for new `PhoenixConnection`, which is disallowed according to documentation.
    The code is now refactored to have only 3 classes, 
    `NamespaceMappedStatsCollectorIT` --> namespaces enabled, collect stats via snapshots as well as SQL statement
    `NonTxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded
    `TxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded, TEPHRA/OMID
    
    2. The `StatsCollectorIT` is renamed to `BaseStatsCollectorIT` and tests have been improved to cover certain scenarios. More tests coming along the way.
    
    3. Server side changes:
    `DefaultStatisticsCollector` is now an abstract class, RegionServerStatisticsCollector and `MapperStatisticsCollector` are its children. The former is triggered for SQL statements and the latter is used for this Jira (Map Reduce Job). Most of the common code is moved to base class.
    
    4. The snapshot scanner has been improved to collect statistics if the scan is configured accordingly. A `NoOpStatisticsCollector` instance is instantiated if its a regular phoenix MR job on snapshots. 
    
    5. Also have the configuration changes in `PhoenixConfigurationUtil` class.
    
    Finally, `UpdateStatisticsTool` is the tool to launch the MR job.
    
    This is the v1 version for some initial feedback. Please comment wherever its not clear.
    
    **Coming up:** 
    More tests covering other scenarios.
    Perf testing for sample tables and the results.
    Better/useful log lines
    General code cleanup for nits


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    Design doc: https://salesforce.quip.com/fjMhA2bbBkK4 (is also linked to Jira)
    Jira has more details too https://issues.apache.org/jira/browse/PHOENIX-4009


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by ChinmaySKulkarni <gi...@git.apache.org>.
Github user ChinmaySKulkarni commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    @karanmehta93 can you provide URLs to some design docs/basic stats docs? Will help understand the overall idea much better for someone not so familiar with stats.


---

[GitHub] phoenix issue #419: PHOENIX-4009 Run UPDATE STATISTICS command by using MR i...

Posted by karanmehta93 <gi...@git.apache.org>.
Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    @gjacoby126 @vincentpoon If you guys have some time.


---