Posted to dev@phoenix.apache.org by ramkrish86 <gi...@git.apache.org> on 2014/08/25 14:53:55 UTC

[GitHub] phoenix pull request: Phoenix 180

GitHub user ramkrish86 opened a pull request:

    https://github.com/apache/phoenix/pull/8

    Phoenix 180

    First revision for review. There are some open points here. No test case has been added yet to consume the stats. I will do the performance test once the system is available to me.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ramkrish86/phoenix-1 Phoenix-180

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/8.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8
    
----
commit df59996c9c0b19a44262de36e28d34de910faaaa
Author: Ramkrishna <ra...@intel.com>
Date:   2014-08-22T08:48:19Z

    Phoenix-180

commit 09f8ebdcc468ac9fdce9a62ea7377d62c294d2a9
Author: Ramkrishna <ra...@intel.com>
Date:   2014-08-25T12:45:26Z

    Phoenix-180

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698634
  
    --- Diff: phoenix-core/src/main/antlr3/PhoenixSQL.g ---
    @@ -104,12 +102,13 @@ tokens
         MINVALUE='minvalue';
         MAXVALUE='maxvalue';
         CYCLE='cycle';
    +    ANALYZE='analyze';
    +    UPDATE_STATISTICS='update_statistics';
    --- End diff --
    
    Ok.
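
    For context, these tokens back the new client-facing statement. A minimal sketch of issuing it over JDBC, assuming the statement form ends up as ANALYZE <table> (the table name and connection URL are placeholders, and the final syntax is still under discussion in this thread):

        import java.sql.Connection;
        import java.sql.DriverManager;

        // Illustrative only: issue the statement backed by the new ANALYZE token.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        try {
            conn.createStatement().execute("ANALYZE MY_TABLE");
        } finally {
            conn.close();
        }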



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16797475
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    That sounds fine, but then put the region name after the cf name so we can get the stats more easily. I think a good pk would be: <table name><cf name><region name>. Then use the stats name as a regular key value column with the correct type (i.e. VARBINARY ARRAY for guideposts). It is probably cleaner to store the stats on the PColumnFamily instead of PTable too. You can still pass them through the PTableImpl constructor, but you might as well break them up by CF and pass the constituent parts through to the PColumnFamilyImpl constructor.
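
    A minimal sketch of composing the proposed row key shape; the zero-byte separator and the helper itself are assumptions for illustration, not the actual implementation:

        import org.apache.hadoop.hbase.util.Bytes;

        // Illustrative: build the stats row key as <table name><cf name><region name>,
        // with a zero byte separating the variable-length VARCHAR parts.
        static byte[] statsRowKey(String tableName, String cfName, String regionName) {
            byte[] sep = new byte[] { 0 };
            return Bytes.add(Bytes.add(Bytes.toBytes(tableName), sep, Bytes.toBytes(cfName)),
                             Bytes.add(sep, Bytes.toBytes(regionName)));
        }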



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16720739
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    It's fine to collect it per region. I'm talking about the *usage* of these guideposts. Instead of driving this per region in this loop, we can just get back *all* guideposts for a table here and return them. No need to break this down per region. Thus, I don't think PTable needs to keep a map. All it needs is all the guideposts for a table. I don't think the logic to get them to "round-robin" per region server is worth preserving. We can just randomly shuffle the guideposts and we'll get the same effect. This routine will become very simple.
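
    A minimal sketch of the simplified routine; getAllGuidePosts() is a hypothetical accessor returning all of the table's guideposts, assumed sorted:

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;
        import org.apache.phoenix.query.KeyRange;

        // Illustrative: turn the table's sorted guideposts into key ranges and
        // shuffle them, instead of round-robining splits per region server.
        List<byte[]> guidePosts = stats.getAllGuidePosts(); // hypothetical accessor
        List<KeyRange> splits = new ArrayList<KeyRange>(guidePosts.size() + 1);
        byte[] lower = KeyRange.UNBOUND;
        for (byte[] gp : guidePosts) {
            splits.add(KeyRange.getKeyRange(lower, true, gp, false));
            lower = gp;
        }
        splits.add(KeyRange.getKeyRange(lower, true, KeyRange.UNBOUND, false));
        Collections.shuffle(splits); // random shuffle spreads load like round-robin did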



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16830455
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Ok, this can be done. The only concern is that multiple select queries would then hit the stats table.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699399
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStats.java ---
    @@ -22,23 +22,116 @@
     import org.apache.hadoop.hbase.HRegionInfo;
     
     
    -/**
    - * Interface for Phoenix table statistics. Statistics is collected on the server
    - * side and can be used for various purpose like splitting region for scanning, etc.
    - * 
    - * The table is defined on the client side, but it is populated on the server side. The client
    - * should not populate any data to the statistics object.
    +/*
    + * The table is defined on the client side, but it is populated on the server side. The client should not populate any data to the
    + * statistics object.
      */
     public interface PTableStats {
     
         /**
    -     * Given the region info, returns an array of bytes that is the current estimate of key
    -     * distribution inside that region. The keys should split that region into equal chunks.
    +     * Given the region info, returns an array of bytes that is the current estimate of key distribution inside that region. The keys should
    +     * split that region into equal chunks.
          * 
          * @param region
          * @return array of keys
          */
    -    byte[][] getRegionGuidePosts(HRegionInfo region);
    +    byte[] getRegionGuidePostsOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns a map with the regions of the table as keys and the guide posts, collected as a byte[] created from a VARBINARY ARRAY
    +     * 
    +     * @return map of region and the guide posts as VARBINARY array
    +     */
    +    Map<String, byte[]> getGuidePosts();
    +
    +    /**
    +     * Returns the max key of a region
    +     * 
    +     * @param region
    +     * @return max key of a region
    +     */
    +    byte[] getMaxKeyOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns the min key of a region
    +     * 
    +     * @param region
    +     * @return min key of a region
    +     */
    +    byte[] getMinKeyOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns a map with key as the regions of the table and the max key in that region
    +     * 
    +     * @return
    +     */
    +    Map<String, byte[]> getMaxKey();
    +
    +    /**
    +     * Returns a map with key as the regions of the table and the min key in that region
    +     * 
    +     * @return
    +     */
    +    Map<String, byte[]> getMinKey();
    +
    +    /**
    +     * Set the min key of a region
    +     * 
    +     * @param region
    +     * @param minKey
    +     * @param offset
    +     * @param length
    +     */
    +    void setMinKey(HRegionInfo region, byte[] minKey, int offset, int length);
    +
    +    /**
    +     * Set the min key of the region. Here the region name is represented as the Bytes.toBytes(region.getRegionNameAsString).
    --- End diff --
    
    I think it may be possible. Let me see.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695690
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    How about something like this:
    
    * If there's a single region, we don't try to divide it
    * When a table splits, we initiate an update stats call on it - can we capture that in a coprocessor? If not, perhaps we can infer that it happened based on the current stats versus the number of regions.
    * We have the concept of a "minimum time to recalc stats" as a separate config and store the time when the stats were last calculated in the stats table. This will prevent too many analyze calls (i.e. it wouldn't be dangerous to invoke an analyze here on the client, even if many clients do it, as the ones after the first would be a no-op).
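
    A minimal sketch of that guard; the config key, the lastUpdateTime read from the stats table, and collectStatistics() are assumptions for illustration:

        // Illustrative: make redundant client-initiated analyze calls a no-op.
        long minRecalcMs = config.getLong("phoenix.stats.minUpdateFrequency", // assumed key
                7L * 60 * 1000);
        long lastUpdateTime = readLastStatsTimestamp(statsTable, tableName); // hypothetical read
        if (System.currentTimeMillis() - lastUpdateTime < minRecalcMs) {
            return; // another client analyzed recently; this call is a no-op
        }
        collectStatistics(tableName); // hypothetical: runs the actual analyze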



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699208
  
    --- Diff: phoenix-protocol/src/main/StatisticsCollect.proto ---
    @@ -0,0 +1,20 @@
    +option java_package = "org.apache.phoenix.coprocessor.generated";
    +option java_outer_classname = "StatCollectorProtos";
    +option java_generic_services = true;
    +option java_generate_equals_and_hash = true;
    +option optimize_for = SPEED;
    +
    +
    +message StatCollectRequest {
    +  required bytes tenantIdBytes = 1;
    +  required bytes schemaNameBytes = 2;
    --- End diff --
    
    Ok
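
    For reference, the generated classes from this .proto expose a builder; a minimal sketch of constructing the request on the client, with the byte arrays assumed to come from the connection as elsewhere in this thread:

        import com.google.protobuf.ByteString;
        import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectRequest;

        // Illustrative: both fields are required, so empty bytes stand in when
        // the connection has no tenant id or schema.
        StatCollectRequest request = StatCollectRequest.newBuilder()
            .setTenantIdBytes(ByteString.copyFrom(tenantIdBytes))
            .setSchemaNameBytes(ByteString.copyFrom(schemaNameBytes))
            .build();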



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16765083
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    +        }
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getSchema() != null) {
    +            schNameInBytes = Bytes.toBytes(connection.getSchema());
    +        }
    +        final byte[] tenantIdBytes = tenIdBytes;
    +        final byte[] schemaNameInBytes = schNameInBytes;
    +        try {
    +            ht = connection.getQueryServices().getTable(Bytes.toBytes(tableName));
    --- End diff --
    
    Done



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17067622
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Maybe the inclusive/exclusive boundary handling was the reason. Now it works.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699418
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Yes, the latter, but with the above caveat (to age-out a cache entry - it's an option on the guava-based Cache). The invalidation is done only when the schema changes or an index is removed/added, which also means there is no need for periodic scanning of the stats table.

    This will be much simpler and we only have to manage a single cache. Perf should be fine - it's just an extra Scan, and if it was done recently it will be in the block cache anyway.

    Make sure you update the size of PTableImpl by taking into account the size of a PTableStatsImpl so the cache is sized correctly.
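
    A minimal sketch of that age-out option on the Guava cache; the key type, size bound, and 15 minute TTL are assumptions for illustration:

        import java.util.concurrent.TimeUnit;
        import com.google.common.cache.Cache;
        import com.google.common.cache.CacheBuilder;
        import com.google.common.cache.Weigher;

        // Illustrative: one metadata cache whose entries age out, so stale stats
        // are re-read on the next access, with no periodic scan of the stats table.
        Cache<String, PTable> metaDataCache = CacheBuilder.newBuilder()
            .expireAfterWrite(15, TimeUnit.MINUTES) // the age-out option mentioned above
            .maximumWeight(maxCacheSizeBytes)       // assumed size bound
            .weigher(new Weigher<String, PTable>() {
                @Override
                public int weigh(String key, PTable table) {
                    return table.getEstimatedSize(); // must include the stats size
                }
            })
            .build();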



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698733
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    --- End diff --
    
    Ok.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16794301
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Here's what I think we should do here:
    - Store guideposts per column family. It's probably easiest if the PK is of the following form:
    <cf varchar not null><guidepost varbinary null>. I'm not sure there's any value in using a VARBINARY ARRAY. We should just make sure that we can delete the old guideposts and add the new ones easily.
    - Here, you'd still want to loop through the regions as above, but you want to get all guideposts for the column families involved in the query. Let's take the simple case where there's only one. In that case, you'd intersect all the region boundaries with the guideposts - this will be a bit easier if the guideposts are sorted already. The set of intersections will be what gets returned here.
    - For the multi-column family case, I think we want to do the same processing as above per column family and then we'll coalesce any overlapping ranges.
    - We have the intersect and coalesce methods you'll need in our KeyRange class, so the code should be relatively small.
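
    A minimal sketch of the single column family case, splitting one region's range at each guidepost that falls inside it (guidePosts is assumed sorted; variable names are illustrative):

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.hbase.util.Bytes;
        import org.apache.phoenix.query.KeyRange;

        // Illustrative: intersect one region's boundaries with the sorted guideposts.
        byte[] regionStart = region.getRegionInfo().getStartKey();
        byte[] regionEnd = region.getRegionInfo().getEndKey();
        List<KeyRange> splits = new ArrayList<KeyRange>();
        byte[] lower = regionStart;
        for (byte[] gp : guidePosts) {
            boolean pastStart = Bytes.compareTo(gp, regionStart) > 0;
            boolean beforeEnd = regionEnd.length == 0 || Bytes.compareTo(gp, regionEnd) < 0;
            if (pastStart && beforeEnd) {
                splits.add(KeyRange.getKeyRange(lower, true, gp, false));
                lower = gp;
            }
        }
        splits.add(KeyRange.getKeyRange(lower, true, regionEnd, false));

    For the multi column family case, the per-family results would then be merged with the coalesce method mentioned in the last bullet.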



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16962143
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Good point. The best way to ensure other clients will re-read the metadata from the server is to update a timestamp on a Cell in the row. Jeffrey made a change like this to "force" clients to do this by doing a Put of the empty key value. See MetaDataEndPointImpl.updateIndexState() line 1538:
    
                        if(dataTableKey != null) {
                            // make a copy of tableMetadata
                            tableMetadata = new ArrayList<Mutation>(tableMetadata);
                            // insert an empty KV to trigger time stamp update on data table row
                            Put p = new Put(dataTableKey);
                            p.add(TABLE_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES, timeStamp, ByteUtil.EMPTY_BYTE_ARRAY);
                            tableMetadata.add(p);
                        }
    
    How about having your invalidate method do that instead? In this case, the client makes a getTable() call in MetaDataClient.updateCache() and the new PTable is sent across (which would include the stats).



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16704394
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    In one of the other comments on the JIRA, it was decided to do this based on a timer task, hence I did it that way. Ok, so for the cache the max TTL will be the frequency at which the stats update should happen, which is 15 mins.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16975421
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    So should this Put that we make have a timestamp (the current ts) added to it, or should we leave it to the server to add one?
    
        // Query for the latest table first, since it's not cached
                table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
                if (table != null && table.getTimeStamp() < clientTimeStamp) {
                    return table;
                }
                // Otherwise, query for an older version of the table - it won't be cached
                return buildTable(key, cacheKey, region, clientTimeStamp);
    
    Suppose we add an empty cell so that we can update the cache. The above code still will not allow that, because before the first 'if' we query with HConstants.LATEST_TIMESTAMP, and inside buildTable() we update the metaDataCache for the table with the timestamp of the cell that we created (fine till here). But the 'if' condition fails, because table.getTimeStamp() equals the timestamp of the empty kv while the clientTimeStamp is, say, 130. So it goes into the second buildTable(), where the passed timestamp is 130. There we create a PTable instance with its timestamp as 130 and we do not update the metaDataCache. This table (with ts = 130) is then returned out, and
    
                if (table.getTimeStamp() != tableTimeStamp) {
                    builder.setTable(PTableImpl.toProto(table));
                }
    before building the PTable from the protos, we compare the existing timestamp against the table's timestamp; the two match, so we never build the table from the proto. This means that for the FromCompiler the table ref is not updated with the latest.




[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694581
  
    --- Diff: phoenix-core/src/it/java/org/apace/phoenix/hbase/schema/stats/StatsCollectorIT.java ---
    @@ -0,0 +1,260 @@
    +package org.apace.phoenix.hbase.schema.stats;
    +
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_TERMINATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.PHOENIX_TEST_DRIVER_URL_PARAM;
    +import static org.apache.phoenix.util.TestUtil.LOCALHOST;
    +import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
    +import static org.junit.Assert.assertTrue;
    +
    +import java.sql.Array;
    +import java.sql.Connection;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.CellScanner;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HBaseTestingUtility;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Result;
    +import org.apache.hadoop.hbase.client.ResultScanner;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.compile.SequenceManager;
    +import org.apache.phoenix.compile.StatementContext;
    +import org.apache.phoenix.coprocessor.MetaDataProtocol.MetaDataMutationResult;
    +import org.apache.phoenix.end2end.ClientManagedTimeTest;
    +import org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter;
    +import org.apache.phoenix.jdbc.PhoenixConnection;
    +import org.apache.phoenix.jdbc.PhoenixStatement;
    +import org.apache.phoenix.jdbc.PhoenixTestDriver;
    +import org.apache.phoenix.parse.HintNode;
    +import org.apache.phoenix.query.BaseTest;
    +import org.apache.phoenix.query.ConnectionQueryServices;
    +import org.apache.phoenix.query.KeyRange;
    +import org.apache.phoenix.query.QueryServices;
    +import org.apache.phoenix.schema.PTable;
    +import org.apache.phoenix.schema.PTableKey;
    +import org.apache.phoenix.schema.TableRef;
    +import org.apache.phoenix.schema.stat.StatisticsConstants;
    +import org.apache.phoenix.util.PhoenixRuntime;
    +import org.apache.phoenix.util.PropertiesUtil;
    +import org.apache.phoenix.util.ReadOnlyProps;
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +import com.google.common.collect.Maps;
    +
    +@Category(ClientManagedTimeTest.class)
    +public class StatsCollectorIT extends BaseTest {
    --- End diff --
    
    We're trying to get away from ClientManagedTimeTests. Can you derive from HBaseManagedTimeTest and use the corresponding annotation instead?
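
    A minimal sketch of the suggested change; BaseHBaseManagedTimeIT is assumed to be the corresponding base class, and HBaseManagedTimeTest is assumed to live alongside ClientManagedTimeTest:

        import org.apache.phoenix.end2end.HBaseManagedTimeTest;
        import org.junit.experimental.categories.Category;

        // Illustrative: let HBase assign cell timestamps rather than managing
        // them from the client, and tag the test with the matching category.
        @Category(HBaseManagedTimeTest.class)
        public class StatsCollectorIT extends BaseHBaseManagedTimeIT { // assumed base class
            // tests unchanged
        }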



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695090
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Rather than have a separate cache for this, I'd rather piggyback on the existing cache. How about if we just read the stats anytime we re-load the PTable? Then an updateStats would just need to invalidate the cache entry for the table. Next time the table is accessed, the stats table would be re-read.
    
    One side note: we'll want to make sure that the PStats info has a getEstimatedByteSize() and that PTable takes this into account for its getEstimatedByteSize() so that this is accounted for in the size of each cached PTable.
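
    A minimal sketch of that side note; only the two method names come from this thread, and the fields are assumptions:

        // Illustrative: the stats object reports its own footprint ...
        public interface PTableStats {
            int getEstimatedByteSize(); // e.g. sum of guidepost byte[] lengths plus map overhead
        }

        // ... and PTableImpl folds it into its estimate so the metadata cache
        // weighs each cached PTable correctly.
        @Override
        public int getEstimatedSize() {
            int statsSize = (stats == null) ? 0 : stats.getEstimatedByteSize(); // 'stats' is an assumed field
            return baseEstimatedSize + statsSize; // baseEstimatedSize: assumed existing computation
        }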



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699116
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java ---
    @@ -84,7 +84,7 @@
         public static final int DEFAULT_TARGET_QUERY_CONCURRENCY = 32;
         public static final int DEFAULT_MAX_QUERY_CONCURRENCY = 64;
         public static final String DEFAULT_DATE_FORMAT = DateUtil.DEFAULT_DATE_FORMAT;
    -    public static final int DEFAULT_STATS_UPDATE_FREQ_MS = 15 * 60000; // 15min
    +    public static final int DEFAULT_STATS_UPDATE_FREQ_MS = 500; // 15min
    --- End diff --
    
    Ok, I will use that QueryServicesTestOptions.
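
    For reference, a minimal sketch of overriding the frequency from a test instead of changing the production default; the attribute name and BaseTest helper are assumptions based on the surrounding test utilities:

        import java.util.Map;
        import org.apache.phoenix.query.QueryServices;
        import org.apache.phoenix.util.ReadOnlyProps;
        import com.google.common.collect.Maps;

        // Illustrative: push a short stats-update frequency into the test driver
        // only, leaving DEFAULT_STATS_UPDATE_FREQ_MS at 15 minutes.
        Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
        props.put(QueryServices.STATS_UPDATE_FREQ_MS_ATTRIB, Integer.toString(500)); // assumed attrib name
        setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator())); // assumed BaseTest helper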



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699137
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    --- End diff --
    
    Ok



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698962
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
        Then an updateStats would just need to invalidate the cache entry for the table. Next time the table is accessed, the stats table would be re-read.
        How about if we just read the stats anytime we re-load the PTable
    So do you mean that whenever getTable() is called we collect the updated stats in a separate thread, so that if the stats really changed we return the newly updated stats (after invalidation)? Or do you mean that whenever we reload the table and populate the cache, we scan the STATS table and get the updated stats info in the main thread only? The invalidation is done only when the schema changes or an index is removed/added, which also means there is no need for periodic scanning of the stats table. Am I right here?




[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696129
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java ---
    @@ -316,4 +316,6 @@ public static LinkType fromSerializedValue(byte serializedValue) {
         
         int getEstimatedSize();
         IndexType getIndexType();
    +    
    +    void setPTableStats(PTableStats stats);
    --- End diff --
    
    PTable should be immutable, so the stats should only be passed through the constructor.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16998265
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Also, your new invalidateTable() call should get the current table and then set the empty key value cell timestamp to currentTable.getTimeStamp() + 1. This should be done while the lock is in place.
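
    A minimal sketch of that invalidate path, combining the empty key value Put quoted earlier with the timestamp bump; the cache lookup and write path are assumptions:

        // Illustrative: bump the empty KV's timestamp past the cached table's,
        // while holding the lock, so clients' timestamp checks see a newer table.
        PTable currentTable = metaDataCache.getIfPresent(cacheKey); // assumed cache lookup
        long newTimeStamp = currentTable.getTimeStamp() + 1;
        Put p = new Put(tableKey);
        p.add(TABLE_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES, newTimeStamp,
                ByteUtil.EMPTY_BYTE_ARRAY);
        region.put(p); // assumed write path; the real code may batch this mutation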



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16975454
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Am I missing something here? This happens even for HBaseManaged test cases.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694608
  
    --- Diff: phoenix-core/src/it/java/org/apace/phoenix/hbase/schema/stats/StatsCollectorIT.java ---
    @@ -0,0 +1,260 @@
    +package org.apace.phoenix.hbase.schema.stats;
    +
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_TERMINATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.PHOENIX_TEST_DRIVER_URL_PARAM;
    +import static org.apache.phoenix.util.TestUtil.LOCALHOST;
    +import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
    +import static org.junit.Assert.assertTrue;
    +
    +import java.sql.Array;
    +import java.sql.Connection;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.CellScanner;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HBaseTestingUtility;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Result;
    +import org.apache.hadoop.hbase.client.ResultScanner;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.compile.SequenceManager;
    +import org.apache.phoenix.compile.StatementContext;
    +import org.apache.phoenix.coprocessor.MetaDataProtocol.MetaDataMutationResult;
    +import org.apache.phoenix.end2end.ClientManagedTimeTest;
    +import org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter;
    +import org.apache.phoenix.jdbc.PhoenixConnection;
    +import org.apache.phoenix.jdbc.PhoenixStatement;
    +import org.apache.phoenix.jdbc.PhoenixTestDriver;
    +import org.apache.phoenix.parse.HintNode;
    +import org.apache.phoenix.query.BaseTest;
    +import org.apache.phoenix.query.ConnectionQueryServices;
    +import org.apache.phoenix.query.KeyRange;
    +import org.apache.phoenix.query.QueryServices;
    +import org.apache.phoenix.schema.PTable;
    +import org.apache.phoenix.schema.PTableKey;
    +import org.apache.phoenix.schema.TableRef;
    +import org.apache.phoenix.schema.stat.StatisticsConstants;
    +import org.apache.phoenix.util.PhoenixRuntime;
    +import org.apache.phoenix.util.PropertiesUtil;
    +import org.apache.phoenix.util.ReadOnlyProps;
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +import com.google.common.collect.Maps;
    +
    +@Category(ClientManagedTimeTest.class)
    +public class StatsCollectorIT extends BaseTest {
    +  private static String url;
    +  private static PhoenixTestDriver driver;
    +  private static HBaseTestingUtility util;
    --- End diff --
    
    No need for your own driver or HBaseTestingUtility - you should be able to do everything you need at the Phoenix level.
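
    A minimal sketch of the Phoenix-level equivalent (assumes BaseTest's getUrl() and the ANALYZE syntax this patch adds; the table name T is made up):

        Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
        Connection conn = DriverManager.getConnection(getUrl(), props);
        try {
            conn.createStatement().execute(
                "CREATE TABLE T (K VARCHAR PRIMARY KEY, V VARCHAR)");
            conn.createStatement().execute("UPSERT INTO T VALUES ('a', 'b')");
            conn.commit();
            // Trigger stats collection through the new statement under review
            conn.createStatement().execute("ANALYZE T");
        } finally {
            conn.close();
        }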



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696487
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java ---
    @@ -0,0 +1,295 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stat;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.Coprocessor;
    +import org.apache.hadoop.hbase.CoprocessorEnvironment;
    +import org.apache.hadoop.hbase.HTableDescriptor;
    +import org.apache.hadoop.hbase.KeyValue;
    +import org.apache.hadoop.hbase.KeyValueUtil;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
    +import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    +import org.apache.hadoop.hbase.protobuf.ResponseConverter;
    +import org.apache.hadoop.hbase.regionserver.HRegion;
    +import org.apache.hadoop.hbase.regionserver.InternalScanner;
    +import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
    +import org.apache.hadoop.hbase.regionserver.RegionScanner;
    +import org.apache.hadoop.hbase.regionserver.ScanType;
    +import org.apache.hadoop.hbase.regionserver.Store;
    +import org.apache.hadoop.hbase.regionserver.StoreScanner;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectRequest;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse.Builder;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectService;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
    +import org.apache.phoenix.schema.PDataType;
    +import org.apache.phoenix.schema.PhoenixArray;
    +import org.apache.phoenix.util.SchemaUtil;
    +
    +import com.google.protobuf.RpcCallback;
    +import com.google.protobuf.RpcController;
    +import com.google.protobuf.Service;
    +
    +/**
    + * An endpoint implementation that allows to collect the stats for a given
    + * region and groups the stat per family. This is also an RegionObserver that
    + * collects the information on compaction also. The user would be allowed to
    + * invoke this endpoint and thus populate the Phoenix stats table with the max
    + * key, min key and guide posts for the given region. The stats can be consumed
    + * by the stats associated with every PTable and the same can be used to
    + * parallelize the queries
    + */
    +public class StatisticsCollector extends BaseRegionObserver implements CoprocessorService,
    +    Coprocessor, StatisticsTracker, StatCollectService.Interface {
    +
    +  public static void addToTable(HTableDescriptor desc) throws IOException {
    +    desc.addCoprocessor(StatisticsCollector.class.getName());
    +  }
    +
    +  private byte[] min;
    +  private byte[] max;
    +  private long guidepostDepth;
    +  private long byteCount = 0;
    +  private List<byte[]> guidePosts = new ArrayList<byte[]>();
    +  private Set<byte[]> familyMap = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    +  private RegionCoprocessorEnvironment env;
    +  protected StatisticsTable stats;
    +  private static final Log LOG = LogFactory.getLog(StatCollectService.class);
    +
    +  @Override
    +  public void collectStat(RpcController controller, StatCollectRequest request,
    +      RpcCallback<StatCollectResponse> done) {
    +    HRegion region = env.getRegion();
    +    if (familyMap != null) {
    +      familyMap.clear();
    +    }
    +    Scan scan = new Scan();
    +    // We should have a mechanism to set the caching here.
    +    // TODO : Should the scan object be coming from the endpoint?
    --- End diff --
    
    Pass through startRow/stopRow at a minimum.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695134
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -1594,4 +1775,46 @@ public boolean filterRowKey(byte[] buffer, int offset, int length) {
                 return cmp != 0;
             }
         }
    +    /**
    +     * A thread (Chore) that keeps reading the updated stats and invalidates the Meta cache incase there is 
    +     * a change in the stats
    +     */
    +    private static class StatisticsUpdator extends Chore {
    --- End diff --
    
    I don't think we'll need this at all if we just re-read the stats when we re-load the table.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694959
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    I forgot to mention this before, but it's important that we always use the guide posts here to determine the set of work that will be done. We'll get rid of the hint for not splitting up work. I'm not super keen on falling back to using Bytes.split(), but I'm not sure what a better alternative would be. I suppose when a table is initially created this will be the case for a while. We should at a minimum log a warning if we fall back to this. Any ideas here?
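
    To make the fallback concrete, a rough sketch of what I mean, assuming the guidepost list is empty (Bytes.split() is the real HBase utility; logger and the surrounding variables are illustrative, and region start/end keys are assumed non-empty):

        if (guidePosts.isEmpty()) {
            // Stats not collected yet (e.g. a freshly created table): fall back
            // to an even byte-wise split and make that visible in the logs.
            logger.warn("No guideposts available; falling back to Bytes.split()");
            byte[][] boundaries = Bytes.split(region.getRegionInfo().getStartKey(),
                    region.getRegionInfo().getEndKey(), splitsPerRegion - 1);
            for (int i = 0; i < boundaries.length - 1; i++) {
                keyRangesPerRegion.put(region,
                        KeyRange.getKeyRange(boundaries[i], true, boundaries[i + 1], false));
            }
        }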



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696446
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStats.java ---
    @@ -22,23 +22,116 @@
     import org.apache.hadoop.hbase.HRegionInfo;
     
     
    -/**
    - * Interface for Phoenix table statistics. Statistics is collected on the server
    - * side and can be used for various purpose like splitting region for scanning, etc.
    - * 
    - * The table is defined on the client side, but it is populated on the server side. The client
    - * should not populate any data to the statistics object.
    +/*
    + * The table is defined on the client side, but it is populated on the server side. The client should not populate any data to the
    + * statistics object.
      */
     public interface PTableStats {
     
         /**
    -     * Given the region info, returns an array of bytes that is the current estimate of key
    -     * distribution inside that region. The keys should split that region into equal chunks.
    +     * Given the region info, returns an array of bytes that is the current estimate of key distribution inside that region. The keys should
    +     * split that region into equal chunks.
          * 
          * @param region
          * @return array of keys
          */
    -    byte[][] getRegionGuidePosts(HRegionInfo region);
    +    byte[] getRegionGuidePostsOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns a map of with key as the regions of the table and the guide posts collected as byte[] created from VARBINARY ARRAY
    +     * 
    +     * @return map of region and the guide posts as VARBINARY array
    +     */
    +    Map<String, byte[]> getGuidePosts();
    +
    +    /**
    +     * Returns the max key of a region
    +     * 
    +     * @param region
    +     * @return maxkey of a region
    +     */
    +    byte[] getMaxKeyOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns the min key of a region
    +     * 
    +     * @param region
    +     * @return min key of a region
    +     */
    +    byte[] getMinKeyOfARegion(HRegionInfo region);
    +
    +    /**
    +     * Returns a map with key as the regions of the table and the max key in that region
    +     * 
    +     * @return
    +     */
    +    Map<String, byte[]> getMaxKey();
    +
    +    /**
    +     * Returns a map with key as the regions of the table and the min key in that region
    +     * 
    +     * @return
    +     */
    +    Map<String, byte[]> getMinKey();
    +
    +    /**
    +     * Set the min key of a region
    +     * 
    +     * @param region
    +     * @param minKey
    +     * @param offset
    +     * @param length
    +     */
    +    void setMinKey(HRegionInfo region, byte[] minKey, int offset, int length);
    +
    +    /**
    +     * Set the min key of the region. Here the region name is represented as the Bytes.toBytes(region.getRegionNameAsString).
    --- End diff --
    
    Keep PTableStats as immutable.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16720928
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
    @@ -806,6 +849,11 @@ public AlterIndexStatement alterIndex(NamedTableNode indexTableNode, String data
             public ExplainStatement explain(BindableStatement statement) {
                 return new ExecutableExplainStatement(statement);
             }
    +        
    --- End diff --
    
    Ah, right - never mind - thought it was somehow related to ExecutableAddColumnStatement.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16729698
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    I'd rather just keep only what we need, especially for PTable. Keep the server-side code that figures out what the min and max are, but we won't need to calculate it for now.
    
    I think once the guidepost work gets wrapped up, we can step back and do some design on what additional stats we'll need (to guide other optimization decisions) and how to surface these.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16766530
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java ---
    @@ -0,0 +1,109 @@
    +/*
    --- End diff --
    
    Removed StatisticsWriter and StatisticsValue (we may need the latter again, as a simple POJO, when some more stats need to be added).



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-53409242
  
        One thing I didn't see is how you're handling multiple column families. Is that still left to be done?
    No, not yet in this patch.  I initially created a patch with some sort of grouping per family, but later dropped it.
    
         Think a bit on when you think is the best time to "merge" the guideposts. Maybe when you read the stats table in MetaDataEndPointImpl you can execute this logic and then the PTableStats that get passed into PTable are already the "merged" ones?
    
    Merge in the sense that it happens after the guideposts are collected per family? Is that the "merge" you mean?




[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16796319
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Might be better to do this check from the Phoenix client *before* you invoke the coprocessor. Also, I think you'd only want to update the last_analyze_start_date when the entire table is analyzed (as opposed to just part of it, for example when a split occurs - we should kick off an analyze for the two regions that split, or when a view is analyzed which would just re-analyze a part of the table)
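
    As a sketch of the client-side check (readLastAnalyzeStartDate, minAnalyzeIntervalMs, and invokeStatsCollectionEndpoint are made-up names for whatever we end up with; last_analyze_start_date would live with the rest of the stats metadata):

        long lastAnalyzed = readLastAnalyzeStartDate(tableName);   // hypothetical lookup
        if (System.currentTimeMillis() - lastAnalyzed < minAnalyzeIntervalMs) {
            return; // analyzed recently enough; skip the coprocessor round trip
        }
        invokeStatsCollectionEndpoint(tableName);                  // hypothetical call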



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17061906
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    +1 to storing guideposts in sorted order (hopefully that's a byproduct of
    the way you process/add them).
    
    No need to adjust the guideposts - treat them like region boundaries.
    The first range should start from an empty byte array and go to the first
    guidepost, the next one from the first to the second guidepost, and the last
    one from the last guidepost to an empty byte array.
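
    In code form, a small sketch of that chunking, assuming guidePosts is already sorted (KeyRange and HConstants.EMPTY_BYTE_ARRAY are the existing classes; the rest is illustrative):

        List<KeyRange> chunks = new ArrayList<KeyRange>(guidePosts.size() + 1);
        byte[] lower = HConstants.EMPTY_BYTE_ARRAY;
        for (byte[] guidePost : guidePosts) {
            // [lower, guidePost) - treated just like a region boundary
            chunks.add(KeyRange.getKeyRange(lower, true, guidePost, false));
            lower = guidePost;
        }
        // Final chunk runs from the last guidepost to the open end of the table
        chunks.add(KeyRange.getKeyRange(lower, true, HConstants.EMPTY_BYTE_ARRAY, false));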



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695740
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
    @@ -806,6 +849,11 @@ public AlterIndexStatement alterIndex(NamedTableNode indexTableNode, String data
             public ExplainStatement explain(BindableStatement statement) {
                 return new ExecutableExplainStatement(statement);
             }
    +        
    --- End diff --
    
    Left over from testing?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16794520
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java ---
    @@ -0,0 +1,109 @@
    +/*
    --- End diff --
    
    Ok, whatever you think works best here.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699150
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Also, we won't use targetConcurrency and maxConcurrency. Instead we'll add all guideposts for the regions here (they'll get pruned later when we apply/intersect them with the scan start/stop row). This will help in a bunch of ways by dividing the query up into smaller equal-sized scans - we won't get lease expiration exceptions from the server b/c the client will be specifying smaller chunks. We'll also get better fairness between queries, b/c no one query will dominate a thread for too long a time. It also simplifies things quite a bit.
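
    For clarity, a sketch of that pruning step (KeyRange.intersect() and KeyRange.EMPTY_RANGE are the existing Phoenix APIs; guidePostChunks is the full chunk list described above):

        KeyRange scanRange = KeyRange.getKeyRange(scan.getStartRow(), true, scan.getStopRow(), false);
        List<KeyRange> pruned = new ArrayList<KeyRange>();
        for (KeyRange chunk : guidePostChunks) {
            KeyRange intersection = chunk.intersect(scanRange);
            if (intersection != KeyRange.EMPTY_RANGE) {
                pruned.add(intersection); // only chunks overlapping the query survive
            }
        }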



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699028
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -1594,4 +1775,46 @@ public boolean filterRowKey(byte[] buffer, int offset, int length) {
                 return cmp != 0;
             }
         }
    +    /**
    +     * A thread (Chore) that keeps reading the updated stats and invalidates the Meta cache incase there is 
    +     * a change in the stats
    +     */
    +    private static class StatisticsUpdator extends Chore {
    --- End diff --
    
    I think what you mean is that all this periodic checking of the stats table itself won't be needed? My doubt is this: suppose select queries come in periodically on a table, and updates are also happening to that table. If we are able to collect the stats periodically, the newer select queries would be able to use the updated information, right? If we only collect the stats during re-load, then as long as there is no schema change or index addition/deletion, the updated stats info may never be used. To me, periodic updating of the stats seems the better choice.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54037058
  
    DefaultParallelIteratorsRegionSplitterIT - what about these test cases?  Do we need the new behaviour, or do we need to update the test cases to get the required result?



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698610
  
    --- Diff: phoenix-core/src/it/java/org/apace/phoenix/hbase/schema/stats/StatsCollectorIT.java ---
    @@ -0,0 +1,260 @@
    +package org.apace.phoenix.hbase.schema.stats;
    +
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_TERMINATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.PHOENIX_TEST_DRIVER_URL_PARAM;
    +import static org.apache.phoenix.util.TestUtil.LOCALHOST;
    +import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
    +import static org.junit.Assert.assertTrue;
    +
    +import java.sql.Array;
    +import java.sql.Connection;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.CellScanner;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HBaseTestingUtility;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Result;
    +import org.apache.hadoop.hbase.client.ResultScanner;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.compile.SequenceManager;
    +import org.apache.phoenix.compile.StatementContext;
    +import org.apache.phoenix.coprocessor.MetaDataProtocol.MetaDataMutationResult;
    +import org.apache.phoenix.end2end.ClientManagedTimeTest;
    +import org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter;
    +import org.apache.phoenix.jdbc.PhoenixConnection;
    +import org.apache.phoenix.jdbc.PhoenixStatement;
    +import org.apache.phoenix.jdbc.PhoenixTestDriver;
    +import org.apache.phoenix.parse.HintNode;
    +import org.apache.phoenix.query.BaseTest;
    +import org.apache.phoenix.query.ConnectionQueryServices;
    +import org.apache.phoenix.query.KeyRange;
    +import org.apache.phoenix.query.QueryServices;
    +import org.apache.phoenix.schema.PTable;
    +import org.apache.phoenix.schema.PTableKey;
    +import org.apache.phoenix.schema.TableRef;
    +import org.apache.phoenix.schema.stat.StatisticsConstants;
    +import org.apache.phoenix.util.PhoenixRuntime;
    +import org.apache.phoenix.util.PropertiesUtil;
    +import org.apache.phoenix.util.ReadOnlyProps;
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +import com.google.common.collect.Maps;
    +
    +@Category(ClientManagedTimeTest.class)
    +public class StatsCollectorIT extends BaseTest {
    +  private static String url;
    +  private static PhoenixTestDriver driver;
    +  private static HBaseTestingUtility util;
    --- End diff --
    
    OK - I added these to do some testing on my side, to check whether the entries get added to the stats table. Will remove them.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696586
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java ---
    @@ -0,0 +1,109 @@
    +/*
    --- End diff --
    
    These classes seem like overkill given what we're collecting. Can this be simplified?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696389
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/PTableImpl.java ---
    @@ -975,9 +1023,91 @@ public static PTable createFromProto(PTableProtos.PTable table) {
     
           return builder.build();
         }
    +    
    --- End diff --
    
    Is all this new code, or did it get moved?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695765
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/parse/UpdateStatisticsStatement.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.parse;
    +
    +public class UpdateStatisticsStatement extends SingleTableStatement {
    +
    +  public UpdateStatisticsStatement(NamedTableNode table, int bindCount) {
    --- End diff --
    
    Just don't pass bindCount here and have a default method that returns 0. I can't think of a case where there'd be bind variables for this statement.
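
    Something like this, as a sketch (the SingleTableStatement super signature is assumed from the other subclasses):

        public class UpdateStatisticsStatement extends SingleTableStatement {
            public UpdateStatisticsStatement(NamedTableNode table) {
                super(table, 0); // no bind variables possible for this statement
            }

            @Override
            public int getBindCount() {
                return 0;
            }
        }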



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54406335
  
    
        java.lang.IllegalArgumentException: KeyValue size too large
            at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1312)
            at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:941)
            at org.apache.hadoop.hbase.client.HTable.put(HTable.java:908)
            at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment$HTableWrapper.put(CoprocessorHost.java:444)
            at org.apache.phoenix.schema.stat.StatisticsTable.updateStats(StatisticsTable.java:126)
            at org.apache.phoenix.schema.stat.StatisticsScanner.close(StatisticsScanner.java:90)
            at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:87)
            at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)
            at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1086)
            at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1480)
            at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    
    What would be an ideal value for guidepost collection - i.e., after how many bytes should we emit a guidepost?  And for performance.py, what would be the best value given its data distribution?
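
    For reference, a sketch of the two knobs involved (the stats key is quoted from this patch's StatisticsConstants; hbase.client.keyvalue.maxsize is the standard HBase client limit behind the exception above, default 10 MB):

        Configuration conf = HBaseConfiguration.create();
        // A deeper guidepost depth means fewer guideposts per region, and so a
        // smaller VARBINARY ARRAY cell written to the stats table. 100 MB is
        // just an example value, not a recommendation.
        conf.setLong(StatisticsConstants.HISTOGRAM_BYTE_DEPTH_CONF_KEY, 100L * 1024 * 1024);
        // Alternatively, raise the client-side KeyValue size cap.
        conf.setInt("hbase.client.keyvalue.maxsize", 20 * 1024 * 1024);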



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16765696
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java ---
    @@ -0,0 +1,295 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stat;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.Coprocessor;
    +import org.apache.hadoop.hbase.CoprocessorEnvironment;
    +import org.apache.hadoop.hbase.HTableDescriptor;
    +import org.apache.hadoop.hbase.KeyValue;
    +import org.apache.hadoop.hbase.KeyValueUtil;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
    +import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    +import org.apache.hadoop.hbase.protobuf.ResponseConverter;
    +import org.apache.hadoop.hbase.regionserver.HRegion;
    +import org.apache.hadoop.hbase.regionserver.InternalScanner;
    +import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
    +import org.apache.hadoop.hbase.regionserver.RegionScanner;
    +import org.apache.hadoop.hbase.regionserver.ScanType;
    +import org.apache.hadoop.hbase.regionserver.Store;
    +import org.apache.hadoop.hbase.regionserver.StoreScanner;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectRequest;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse.Builder;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectService;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
    +import org.apache.phoenix.schema.PDataType;
    +import org.apache.phoenix.schema.PhoenixArray;
    +import org.apache.phoenix.util.SchemaUtil;
    +
    +import com.google.protobuf.RpcCallback;
    +import com.google.protobuf.RpcController;
    +import com.google.protobuf.Service;
    +
    +/**
    + * An endpoint implementation that allows to collect the stats for a given
    + * region and groups the stat per family. This is also an RegionObserver that
    + * collects the information on compaction also. The user would be allowed to
    + * invoke this endpoint and thus populate the Phoenix stats table with the max
    + * key, min key and guide posts for the given region. The stats can be consumed
    + * by the stats associated with every PTable and the same can be used to
    + * parallelize the queries
    + */
    +public class StatisticsCollector extends BaseRegionObserver implements CoprocessorService,
    +    Coprocessor, StatisticsTracker, StatCollectService.Interface {
    +
    +  public static void addToTable(HTableDescriptor desc) throws IOException {
    +    desc.addCoprocessor(StatisticsCollector.class.getName());
    +  }
    +
    +  private byte[] min;
    +  private byte[] max;
    +  private long guidepostDepth;
    +  private long byteCount = 0;
    +  private List<byte[]> guidePosts = new ArrayList<byte[]>();
    +  private Set<byte[]> familyMap = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    +  private RegionCoprocessorEnvironment env;
    +  protected StatisticsTable stats;
    +  private static final Log LOG = LogFactory.getLog(StatCollectService.class);
    +
    +  @Override
    +  public void collectStat(RpcController controller, StatCollectRequest request,
    +      RpcCallback<StatCollectResponse> done) {
    +    HRegion region = env.getRegion();
    +    if (familyMap != null) {
    +      familyMap.clear();
    +    }
    +    Scan scan = new Scan();
    +    // We should have a mechanism to set the caching here.
    +    // TODO : Should the scan object be coming from the endpoint?
    +    scan.setCaching(1000);
    +    // TODO : Should we scan fully or should we do as done StatsManagerImpl
    +    // reading the
    +    // first row and the
    +    // last row after setting scan.setReserved(true). But doing so we cannot get
    +    // the guideposts
    +    RegionScanner scanner = null;
    +    int count = 0;
    +    try {
    +      scanner = region.getScanner(scan);
    +      List<Cell> results = new ArrayList<Cell>();
    +      boolean hasMore = true;
    +      while (hasMore) {
    +        // Am getting duplicates here?  Need to avoid that
    +        hasMore = scanner.next(results);
    +        updateStat(results);
    +        count += results.size();
    +        results.clear();
    +        while (!hasMore) {
    +          break;
    +        }
    +      }
    +    } catch (IOException e) {
    +      LOG.error(e);
    +      ResponseConverter.setControllerException(controller, e);
    +    } finally {
    +      if (scanner != null) {
    +        try {
    +          scanner.close();
    +          IOException toThrow = null;
    +          try {
    +            // update the statistics table
    +            for (byte[] fam : familyMap) {
    +              byte[] tableKey = SchemaUtil.getTableKey(request.getTenantIdBytes().toByteArray(),
    +                  request.getSchemaNameBytes().toByteArray(), region.getRegionInfo().getTable()
    +                      .getName());
    +              stats.updateStats(new StatisticsWriter(tableKey,
    +                  Bytes.toBytes(region.getRegionInfo().getRegionNameAsString())), this, fam);
    +            }
    +          } catch (IOException e) {
    +            LOG.error("Failed to update statistics table!", e);
    +            toThrow = e;
    +          }
    +        } catch (IOException e) {
    +          LOG.error(e);
    +          ResponseConverter.setControllerException(controller, e);
    +        }
    +      }
    +    }
    +    Builder newBuilder = StatCollectResponse.newBuilder();
    +    newBuilder.setRowsScanned(count);
    +    StatCollectResponse result = newBuilder.build();
    +    done.run(result);
    +  }
    +
    +  /**
    +   * Update the current statistics based on the lastest batch of key-values from
    +   * the underlying scanner
    +   * 
    +   * @param results
    +   *          next batch of {@link KeyValue}s
    +   */
    +  protected void updateStat(final List<Cell> results) {
    +    for (Cell c : results) {
    +      updateStatistic(KeyValueUtil.ensureKeyValue(c));
    +    }
    +  }
    +
    +  @Override
    +  public Service getService() {
    +    return StatCollectorProtos.StatCollectService.newReflectiveService(this);
    +  }
    +
    +  @Override
    +  public void start(CoprocessorEnvironment env) throws IOException {
    +    if (env instanceof RegionCoprocessorEnvironment) {
    +      this.env = (RegionCoprocessorEnvironment) env;
    +    } else {
    +      throw new CoprocessorException("Must be loaded on a table region!");
    +    }
    +    HTableDescriptor desc = ((RegionCoprocessorEnvironment) env).getRegion().getTableDesc();
    +    // Get the stats table associated with the current table on which the CP is
    +    // triggered
    +    stats = StatisticsTable.getStatisticsTableForCoprocessor(env, desc.getName());
    +    guidepostDepth = env.getConfiguration().getLong(
    +        StatisticsConstants.HISTOGRAM_BYTE_DEPTH_CONF_KEY,
    +        StatisticsConstants.HISTOGRAM_DEFAULT_BYTE_DEPTH);
    +  }
    +
    +  @Override
    +  public void stop(CoprocessorEnvironment arg0) throws IOException {
    +    stats.close();
    +  }
    +
    +  @Override
    +  public InternalScanner preCompactScannerOpen(ObserverContext<RegionCoprocessorEnvironment> c,
    +      Store store, List<? extends KeyValueScanner> scanners, ScanType scanType, long earliestPutTs,
    +      InternalScanner s) throws IOException {
    +    // TODO : Major concern here is that we cannot get the complete name of the table with schema and tenantId here.  
    --- End diff --
    
    Changed this.  Let me check whether it is working fine; if so, there's no problem and I can remove the TODO.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16720268
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    +1



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16796576
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Actually, in the current schema I have added the CF to the primary key, followed by the stats cols (min key, max key, and the guide posts), so I can collect the information per CF.  When we collect the guideposts it would in any case be per region, and within that we could group per CF and write the result to the stats table.  So the stats table would have table name (not null), region name, and cf name, followed by min, max, and guide posts?
    By using a VARBINARY ARRAY, the entire set of guide posts collected for that region can be combined into one entry.
    So if we are adding guide posts per CF, then PTableStats would also accept a map keyed by CF with the guide posts as values? Sorry for asking so many questions - just making sure I understand.
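
    Roughly, a hypothetical immutable holder (not the current PTableStats interface, just the shape I mean):

        public class FamilyGuidePosts {
            // column family -> sorted guideposts for that family within a region
            private final SortedMap<byte[], byte[][]> guidePostsPerFamily;

            public FamilyGuidePosts(SortedMap<byte[], byte[][]> guidePostsPerFamily) {
                // defensive copy keeps the holder immutable; callers should build
                // the map with Bytes.BYTES_COMPARATOR as the comparator
                this.guidePostsPerFamily = Collections.unmodifiableSortedMap(
                        new TreeMap<byte[], byte[][]>(guidePostsPerFamily));
            }

            public byte[][] getGuidePosts(byte[] family) {
                return guidePostsPerFamily.get(family);
            }
        }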



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16704699
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    As there is no need to check whether there is any change in the stats, we need not do any comparison of the stats either. This would also make it easy to keep PTableStats immutable.
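
    A minimal sketch of what an immutable PTableStats could look like, assuming guideposts are kept per column family; the class and accessor names are illustrative:

        import java.util.List;
        import java.util.Map;
        import com.google.common.collect.ImmutableMap;

        public class PTableStatsSketch {
            // column family name -> sorted guidepost row keys for that family
            private final Map<String, List<byte[]>> guidePosts;

            public PTableStatsSketch(Map<String, List<byte[]>> guidePosts) {
                // copy the map at construction so the stats can be shared safely
                this.guidePosts = ImmutableMap.copyOf(guidePosts);
            }

            public List<byte[]> getGuidePosts(String family) {
                return guidePosts.get(family);
            }
        }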



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16795912
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    So inside the CP we actually do a scan of the table's region for which update stats is issued. So before that we create an HTable instance to talk to the SYSTEM.STATS table, read the column where the last analyzed time is stored for that table, and then proceed with the scan of the region? I am a bit skeptical here about talking to the stats table inside the endpoint call for the table's region.
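
    A sketch of the "last analyzed time" guard being described, assuming a hypothetical LAST_STATS_UPDATE_TIME cell in SYSTEM.STATS; the family/qualifier names are placeholders:

        import java.io.IOException;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.HTableInterface;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.util.Bytes;

        public class StatsGuardSketch {
            static boolean shouldRecollect(HTableInterface statsTable, byte[] statsRowKey,
                    long minTimeBetweenRuns) throws IOException {
                Get get = new Get(statsRowKey);
                get.addColumn(Bytes.toBytes("0"), Bytes.toBytes("LAST_STATS_UPDATE_TIME"));
                Result result = statsTable.get(get);
                if (result.isEmpty()) {
                    return true; // never analyzed before
                }
                long lastRun = Bytes.toLong(result.value());
                return System.currentTimeMillis() - lastRun >= minTimeBetweenRuns;
            }
        }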



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54078575
  
    Yes, the DefaultParallelIteratorsRegionSplitterIT should be modified to test the new behavior.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696289
  
    --- Diff: phoenix-protocol/src/main/StatisticsCollect.proto ---
    @@ -0,0 +1,20 @@
    +option java_package = "org.apache.phoenix.coprocessor.generated";
    +option java_outer_classname = "StatCollectorProtos";
    +option java_generic_services = true;
    +option java_generate_equals_and_hash = true;
    +option optimize_for = SPEED;
    +
    +
    +message StatCollectRequest {
    +  required bytes tenantIdBytes = 1;
    +  required bytes schemaNameBytes = 2;
    --- End diff --
    
    Include both schemaNameBytes and tableNameBytes here.
    Also, add a bytes startRow and a bytes stopRow so we can do a "partial" stats update. This will be especially useful for enabling a tenant-specific table to update only its own stats.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698996
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    One other caveat: since it's quite possible that the table won't change (i.e. won't have any columns or indexes added/removed) and so will remain in the cache too long, we should modify the cache to enforce a max TTL per cache entry. This can be based on the update frequency for stats. That way a table, regardless of whether it's been changed, will age out of the cache and be re-loaded along with any new stats.
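
    A minimal sketch of the max-TTL idea using Guava's CacheBuilder (already a Phoenix dependency); the value type and the TTL source are illustrative:

        import java.util.concurrent.TimeUnit;
        import com.google.common.cache.Cache;
        import com.google.common.cache.CacheBuilder;

        public class MetaDataCacheSketch {
            static Cache<String, Object> newCache(long statsUpdateFrequencyMs) {
                return CacheBuilder.newBuilder()
                        // age entries out after the stats update frequency so a
                        // table is re-loaded, stats included, on the next miss
                        .expireAfterWrite(statsUpdateFrequencyMs, TimeUnit.MILLISECONDS)
                        .build();
            }
        }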



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-53378107
  
    Nice start, @ramkrish86. I went through your pull and gave you some feedback. Please let me know if you have comments/questions. Thanks - this is going to be a big improvement!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699532
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java ---
    @@ -0,0 +1,109 @@
    +/*
    --- End diff --
    
    Ok, I'll leave it up to you. That was just my first impression, but I'm fine with it if you think they add value without adding too much complexity.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696980
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    --- End diff --
    
    See other examples here of how to get the PTable given the tenantId, schemaName, and tableName. You'd want to look it up and then use PTable.getPhysicalName() to get the actual HBase table. See below for how to handle the case where there's a tenantId: it would form a start/stop key that should be passed through the update stats method (it's fine to TODO this for now).
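
    A sketch of that lookup pattern, following the getMetaDataCache()/PTableKey usage visible elsewhere in this patch; the fullTableName parameter is illustrative:

        import java.sql.Connection;
        import java.sql.SQLException;
        import org.apache.phoenix.jdbc.PhoenixConnection;
        import org.apache.phoenix.schema.PTable;
        import org.apache.phoenix.schema.PTableKey;

        public class PhysicalNameSketch {
            static byte[] physicalName(Connection connection, String fullTableName)
                    throws SQLException {
                PhoenixConnection pconn = connection.unwrap(PhoenixConnection.class);
                PTable table = pconn.getMetaDataCache()
                        .getTable(new PTableKey(pconn.getTenantId(), fullTableName));
                // the physical HBase table, which differs from the logical name
                // for views and tenant-specific tables
                return table.getPhysicalName().getBytes();
            }
        }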



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699156
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    --- End diff --
    
    I will add a TODO and file a JIRA for now.  Once the chore part is done I will take up that JIRA.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16996015
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    The table built by buildTable() will have a timeStamp of MAX(timeStamp) over all its cells.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54002950
  
    @ramkrish86 - just wanted to make sure you're not waiting on me for anything. I think this will be the "big" feature for the 3.2/4.2 release. I think we should shoot for a monthly release schedule (like HBase does). If you can make the above changes and do a 3-5M row local perf test, I think we'll be just about ready to pull this in. Thoughts?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54014952
  
    Sounds good, Ram. Thanks for all your effort on this one. Looking forward to see the perf numbers.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16937281
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    One minor issue is the case where we have no column family. It's possible not to declare any key value column in a CREATE TABLE statement, in which case the PTable will have no column families. For that case, I'd suppose we could just store the guideposts on the PTable instead of the PColumnFamily.
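
    A sketch of that fallback; getTableStats() and getGuidePosts() are hypothetical accessors used only for illustration, not the committed API:

        import java.util.List;
        import org.apache.phoenix.schema.PColumnFamily;
        import org.apache.phoenix.schema.PTable;

        public class GuidePostLookupSketch {
            static List<byte[]> guidePosts(PTable table, PColumnFamily family) {
                if (table.getColumnFamilies().isEmpty()) {
                    // no key value columns declared, so no PColumnFamily exists;
                    // fall back to guideposts stored on the PTable itself
                    return table.getTableStats().getGuidePosts();
                }
                return family.getGuidePosts();
            }
        }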



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16730042
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Ok. My thinking was that, as we are not collecting any stats in the stats table other than the min/max keys and the guideposts, it made sense to design PTableStats to support just these basics.




[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16726689
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Ok. So the guideposts can be for the table, but the min key and max key are per region, right? Will work on this based on the feedback.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699103
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/parse/UpdateStatisticsStatement.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.parse;
    +
    +public class UpdateStatisticsStatement extends SingleTableStatement {
    +
    +  public UpdateStatisticsStatement(NamedTableNode table, int bindCount) {
    --- End diff --
    
    ok



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698592
  
    --- Diff: phoenix-core/src/it/java/org/apace/phoenix/hbase/schema/stats/StatsCollectorIT.java ---
    @@ -0,0 +1,260 @@
    +package org.apace.phoenix.hbase.schema.stats;
    +
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_TERMINATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.PHOENIX_TEST_DRIVER_URL_PARAM;
    +import static org.apache.phoenix.util.TestUtil.LOCALHOST;
    +import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
    +import static org.junit.Assert.assertTrue;
    +
    +import java.sql.Array;
    +import java.sql.Connection;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.CellScanner;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HBaseTestingUtility;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Result;
    +import org.apache.hadoop.hbase.client.ResultScanner;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.compile.SequenceManager;
    +import org.apache.phoenix.compile.StatementContext;
    +import org.apache.phoenix.coprocessor.MetaDataProtocol.MetaDataMutationResult;
    +import org.apache.phoenix.end2end.ClientManagedTimeTest;
    +import org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter;
    +import org.apache.phoenix.jdbc.PhoenixConnection;
    +import org.apache.phoenix.jdbc.PhoenixStatement;
    +import org.apache.phoenix.jdbc.PhoenixTestDriver;
    +import org.apache.phoenix.parse.HintNode;
    +import org.apache.phoenix.query.BaseTest;
    +import org.apache.phoenix.query.ConnectionQueryServices;
    +import org.apache.phoenix.query.KeyRange;
    +import org.apache.phoenix.query.QueryServices;
    +import org.apache.phoenix.schema.PTable;
    +import org.apache.phoenix.schema.PTableKey;
    +import org.apache.phoenix.schema.TableRef;
    +import org.apache.phoenix.schema.stat.StatisticsConstants;
    +import org.apache.phoenix.util.PhoenixRuntime;
    +import org.apache.phoenix.util.PropertiesUtil;
    +import org.apache.phoenix.util.ReadOnlyProps;
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +import com.google.common.collect.Maps;
    +
    +@Category(ClientManagedTimeTest.class)
    +public class StatsCollectorIT extends BaseTest {
    --- End diff --
    
    Ok



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16705274
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
    @@ -806,6 +849,11 @@ public AlterIndexStatement alterIndex(NamedTableNode indexTableNode, String data
             public ExplainStatement explain(BindableStatement statement) {
                 return new ExecutableExplainStatement(statement);
             }
    +        
    --- End diff --
    
    You mean the empty line, or the following?
        @Override
        public UpdateStatisticsStatement updateStatistics(NamedTableNode table) {
            return new ExecutableUpdateStatisticsStatement(table, 0);
        }
    This is just an implementation of the ParseNodeFactory method.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699284
  
    --- Diff: phoenix-core/src/it/java/org/apace/phoenix/hbase/schema/stats/StatsCollectorIT.java ---
    @@ -0,0 +1,260 @@
    +package org.apace.phoenix.hbase.schema.stats;
    +
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.JDBC_PROTOCOL_TERMINATOR;
    +import static org.apache.phoenix.util.PhoenixRuntime.PHOENIX_TEST_DRIVER_URL_PARAM;
    +import static org.apache.phoenix.util.TestUtil.LOCALHOST;
    +import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
    +import static org.junit.Assert.assertTrue;
    +
    +import java.sql.Array;
    +import java.sql.Connection;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.CellScanner;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HBaseTestingUtility;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.HRegionLocation;
    +import org.apache.hadoop.hbase.client.HTable;
    +import org.apache.hadoop.hbase.client.Result;
    +import org.apache.hadoop.hbase.client.ResultScanner;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.compile.SequenceManager;
    +import org.apache.phoenix.compile.StatementContext;
    +import org.apache.phoenix.coprocessor.MetaDataProtocol.MetaDataMutationResult;
    +import org.apache.phoenix.end2end.ClientManagedTimeTest;
    +import org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter;
    +import org.apache.phoenix.jdbc.PhoenixConnection;
    +import org.apache.phoenix.jdbc.PhoenixStatement;
    +import org.apache.phoenix.jdbc.PhoenixTestDriver;
    +import org.apache.phoenix.parse.HintNode;
    +import org.apache.phoenix.query.BaseTest;
    +import org.apache.phoenix.query.ConnectionQueryServices;
    +import org.apache.phoenix.query.KeyRange;
    +import org.apache.phoenix.query.QueryServices;
    +import org.apache.phoenix.schema.PTable;
    +import org.apache.phoenix.schema.PTableKey;
    +import org.apache.phoenix.schema.TableRef;
    +import org.apache.phoenix.schema.stat.StatisticsConstants;
    +import org.apache.phoenix.util.PhoenixRuntime;
    +import org.apache.phoenix.util.PropertiesUtil;
    +import org.apache.phoenix.util.ReadOnlyProps;
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import org.junit.experimental.categories.Category;
    +
    +import com.google.common.collect.Maps;
    +
    +@Category(ClientManagedTimeTest.class)
    +public class StatsCollectorIT extends BaseTest {
    +  private static String url;
    +  private static PhoenixTestDriver driver;
    +  private static HBaseTestingUtility util;
    --- End diff --
    
    You can get at the stats in a number of ways: query the SYSTEM.STATS table directly, or use the PTableStats off of the PTable (which is easy to get to by doing an unwrap to access the cache directly from ConnectionQueryServices).
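
    A sketch of the first approach for a test: scan the stats table directly with the HBase client (as the IT above already imports HTable/Scan/ResultScanner); the "SYSTEM.STATS" name is a placeholder for whatever the patch settles on:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.ResultScanner;
        import org.apache.hadoop.hbase.client.Scan;

        public class StatsScanSketch {
            static int countStatsRows(Configuration conf) throws IOException {
                HTable stats = new HTable(conf, "SYSTEM.STATS"); // placeholder name
                ResultScanner scanner = stats.getScanner(new Scan());
                int rows = 0;
                for (Result r = scanner.next(); r != null; r = scanner.next()) {
                    rows++; // one row per (table, cf, region) combination
                }
                scanner.close();
                stats.close();
                return rows;
            }
        }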



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17064233
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Make sure that your KeyRange is inclusive on the lower and exclusive on the upper part of the range.
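
    A sketch of that convention with Phoenix's KeyRange factory, so each split between adjacent guideposts covers [lower, upper):

        import org.apache.phoenix.query.KeyRange;

        public class KeyRangeSketch {
            static KeyRange between(byte[] lowerGuidePost, byte[] upperGuidePost) {
                // inclusive on the lower bound, exclusive on the upper bound
                return KeyRange.getKeyRange(lowerGuidePost, true, upperGuidePost, false);
            }
        }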



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696921
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    --- End diff --
    
    This is a tricky but kind of interesting case. If there's a tenantId, then that means the user has asked to analyze stats for a tenant-specific table. In that case, you'd want to get the physical table name from the PTable to pass through the endpoint call, but we'd want an option of passing a startRow/stopRow to analyze only *part* of a table. It's ok if you want to TODO this, but please file a JIRA for it (and still always get the physical table name).



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17045111
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    
         this will be a bit easier if the guideposts are sorted already
    Ideally the guideposts should be sorted, because we are inserting them with the following row key - table name, fam and region name - so, considering there is only one fam, those sets of guideposts would be sorted, I feel.
    Another point: if the scan has to retrieve the row that is part of the guidepost itself, then the splits that we form from the guideposts will actually need to be incremented by +1 (last byte), so that the upper range of the key range can be used as the stop row and will cover the exact row that is part of the guideposts.
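
    A sketch of that last-byte increment, assuming Phoenix's ByteUtil.nextKey is used so a scan whose stop row is the guidepost still returns the guidepost row itself (HBase stop rows are exclusive):

        import org.apache.phoenix.util.ByteUtil;

        public class StopRowSketch {
            static byte[] inclusiveStopRow(byte[] guidePost) {
                // the next key in row-key order, i.e. the guidepost "plus one"
                return ByteUtil.nextKey(guidePost);
            }
        }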



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16972273
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Even after updating the cell I ended up with the same problem :(. I think it may be because the test case is client-managed (DefaultParallelIteratorsRegionSplitterIT).



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16708853
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
        Instead we'll add all guideposts for the regions here
    All the stats that we collect should be region based, right? Or do you mean we collect the guideposts for the table and then use that information? But when we do the scan in the endpoint we would anyway be collecting it per region. The worry is that if we collect per region, we would need to set the per-region stats on the PTableStats object, which would keep it from being immutable?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-53386020
  
    One thing I didn't see is how you're handling multiple column families. Is that still left to be done? Think a bit about when the best time is to "merge" the guideposts. Maybe when you read the stats table in MetaDataEndPointImpl you can execute this logic, so that the PTableStats that get passed into PTable are already the "merged" ones? You probably don't want to do this in ParallelIterators, as you'd do it again and again for every query execution.
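
    A minimal sketch of such a merge, done once when the stats are read rather than per query; the per-family lists are assumed to be sorted already:

        import java.util.ArrayList;
        import java.util.Collection;
        import java.util.List;
        import java.util.TreeSet;
        import org.apache.hadoop.hbase.util.Bytes;

        public class GuidePostMergeSketch {
            static List<byte[]> merge(Collection<List<byte[]>> perFamilyGuidePosts) {
                // TreeSet de-duplicates and keeps the keys in row-key order
                TreeSet<byte[]> merged = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
                for (List<byte[]> guidePosts : perFamilyGuidePosts) {
                    merged.addAll(guidePosts);
                }
                return new ArrayList<byte[]>(merged);
            }
        }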



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-53618543
  
    Looking close, @ramkrish86. Nice work. I think you just need to code up the consumption of the guideposts in ParallelIterators (see my comment there), add the logic for a "min" elapsed time for stats collection, and then do some basic local perf testing to see the impact. Then we can follow up with the issue that handles views correctly (just passing the start/stop row through your coprocessor call) and add some more tests.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16961974
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Though we invalidate the metadata cache and do an update cache again so that the latest stats get updated on the PTableImpl or PColumnFamily, how do we clear it from the TableRef? It still holds the older reference.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16767859
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
        We have the concept of a "minimum time to recalc stats" as a separate config and store the time when the stats were last calculated in the stats table.
    So storing this in the stats table means it is per table again? So we will read the stats table and, on seeing the timestamp, we will not read it any further and the call becomes a NOOP? Am I right?



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16971394
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    Use a select query - we try to use Phoenix as much as possible.
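
    A sketch of doing that check with a Phoenix SELECT; the SYSTEM.STATS column and key layout here are hypothetical:

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.SQLException;

        public class LastAnalyzedSketch {
            static Long lastStatsUpdateTime(Connection conn, String tableName)
                    throws SQLException {
                PreparedStatement stmt = conn.prepareStatement(
                        "SELECT LAST_STATS_UPDATE_TIME FROM SYSTEM.STATS WHERE TABLE_NAME = ?");
                stmt.setString(1, tableName);
                ResultSet rs = stmt.executeQuery();
                if (!rs.next()) {
                    return null; // no stats row yet for this table
                }
                return rs.getLong(1);
            }
        }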



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699329
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java ---
    @@ -0,0 +1,109 @@
    +/*
    --- End diff --
    
    StatisticsScanner is something we may need if we want to collect the stats as part of major compaction.
    StatisticsWriter can perhaps be simplified and merged; similarly, StatisticsValue can be made an inner (POJO-like) class.
    StatisticsTracker, I think, would be needed so that we could extend it and implement it in the observers.
    StatisticsUtils is just utility methods to extract the table name and region name from the STATS table's row key.
    StatisticsTable is a wrapper on the STATS HTable.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694641
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseParallelIteratorsRegionSplitterIT.java ---
    @@ -86,10 +87,14 @@ protected void initTableValues(long ts) throws Exception {
             conn.close();
         }
     
    -    protected static TableRef getTableRef(Connection conn, long ts) throws SQLException {
    +    protected static TableRef getTableRef(Connection conn, long ts, PTable pTable) throws SQLException {
             PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
    -        TableRef table = new TableRef(null,pconn.getMetaDataCache().getTable(new PTableKey(pconn.getTenantId(), STABLE_NAME)),ts, false);
    -        return table;
    +        if (pTable == null) {
    --- End diff --
    
    Is this function really worth sharing?


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696378
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    +        }
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getSchema() != null) {
    +            schNameInBytes = Bytes.toBytes(connection.getSchema());
    +        }
    +        final byte[] tenantIdBytes = tenIdBytes;
    +        final byte[] schemaNameInBytes = schNameInBytes;
    +        try {
    +            ht = connection.getQueryServices().getTable(Bytes.toBytes(tableName));
    +            Batch.Call<StatCollectService, StatCollectResponse> callable = new Batch.Call<StatCollectService, StatCollectResponse>() {
    +                ServerRpcController controller = new ServerRpcController();
    +                BlockingRpcCallback<StatCollectResponse> rpcCallback = new BlockingRpcCallback<StatCollectResponse>();
    +
    +                @Override
    +                public StatCollectResponse call(StatCollectService service) throws IOException {
    +                    StatCollectRequest.Builder builder = StatCollectRequest.newBuilder();
    +                    builder.setTenantIdBytes(HBaseZeroCopyByteString.wrap(tenantIdBytes));
    +                    builder.setSchemaNameBytes(HBaseZeroCopyByteString.wrap(schemaNameInBytes));
    --- End diff --
    
    There should be a builder.setTableNameBytes too, as schemaName and tableName are separate - see the other calls on MetaDataEndpointImpl.
    
    We should also add a startRow/stopRow so we can do a partial update of stats. See note in protobuf def of stats request.
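
    To illustrate, the client call could then look roughly like this
    (setTableNameBytes, setStartRow, and setStopRow are the suggested
    additions - they don't exist in the proto yet):

        static StatCollectRequest buildRequest(byte[] tenantId, byte[] schemaName,
                byte[] tableName, byte[] startRow, byte[] stopRow) {
            StatCollectRequest.Builder builder = StatCollectRequest.newBuilder();
            builder.setTenantIdBytes(HBaseZeroCopyByteString.wrap(tenantId));
            builder.setSchemaNameBytes(HBaseZeroCopyByteString.wrap(schemaName));
            // Suggested: keep the table name separate from the schema name
            builder.setTableNameBytes(HBaseZeroCopyByteString.wrap(tableName));
            // Suggested: a key range so stats can be updated for part of a table
            builder.setStartRow(HBaseZeroCopyByteString.wrap(startRow));
            builder.setStopRow(HBaseZeroCopyByteString.wrap(stopRow));
            return builder.build();
        }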


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699691
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStatsImpl.java ---
    @@ -16,40 +16,249 @@
      * limitations under the License.
      */
     package org.apache.phoenix.schema.stat;
    -
    + import java.util.Collection;
    +import java.util.HashMap;
     import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     import org.apache.hadoop.hbase.HRegionInfo;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
     
     import com.google.common.collect.ImmutableMap;
    + 
    + /**
    +  * Implementation for PTableStats.
    +  */
    + public class PTableStatsImpl implements PTableStats {
    + 
    +   public static final PTableStats NO_STATS = new PTableStatsImpl();
    +   // The map for guide posts should be immutable. We only take the current
    +   // snapshot from the outside method call and store it.
    +   private Map<String, byte[]> regionGuidePosts = new HashMap<String, byte[]>();
    +   // TODO : see if a thread safe map would be needed here
    +   private Map<String, byte[]> minKey = new HashMap<String, byte[]>();
    +   private Map<String, byte[]> maxKey = new HashMap<String, byte[]>();
    + 
    +   public PTableStatsImpl() {
    +     this(new HashMap<String, byte[]>(), new HashMap<String, byte[]>(),
    +         new HashMap<String, byte[]>());
    +   }
    + 
    +   public PTableStatsImpl(Map<String, byte[]> guidePosts,
    +       Map<String, byte[]> minKey, Map<String, byte[]> maxKey) {
    +     this.regionGuidePosts = guidePosts;
    +     this.minKey = minKey;
    +     this.maxKey = maxKey;
    +   }
    + 
    +   @Override
    +   public byte[] getRegionGuidePostsOfARegion(HRegionInfo region) {
    +     return regionGuidePosts.get(Bytes.toString(region.getRegionName()));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getGuidePosts() {
    +     if (regionGuidePosts != null) {
    +       return ImmutableMap.copyOf(regionGuidePosts);
    +     }
    + 
    +     return null;
    +   }
    + 
    +   @Override
    +   public byte[] getMaxKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.maxKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public byte[] getMinKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.minKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMaxKey() {
    +     if (maxKey != null) {
    +       return ImmutableMap.copyOf(maxKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMinKey() {
    +     if (minKey != null) {
    +       return ImmutableMap.copyOf(minKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public void setMinKey(HRegionInfo region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(HRegionInfo region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(HRegionInfo region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(region.getRegionNameAsString(),
    +         ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMinKey(byte[] region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(byte[] region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(byte[] region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public boolean equals(Object obj) {
    +     if (!(obj instanceof PTableStats)) {
    +       return false;
    +     }
    +     PTableStats pTableStats = (PTableStats) obj;
    +     // compare the max keys
    +     Map<String, byte[]> thatMaxKey = pTableStats.getMaxKey();
    +     Map<String, byte[]> thisMaxKey = this.getMaxKey();
    +     if (thatMaxKey.size() != thisMaxKey.size()) {
    +       return false;
    +     } else {
    +       Set<Entry<String, byte[]>> thatEntrySet = thatMaxKey.entrySet();
    +       Set<Entry<String, byte[]>> thisEntrySet = thisMaxKey.entrySet();
    +       for (Entry<String, byte[]> thatEntry : thatEntrySet) {
    +         boolean matchFound = false;
    +         Entry<String, byte[]> temp = null;
    +         for (Entry<String, byte[]> thisEntry : thisEntrySet) {
    +           if (thatEntry.getKey().equals(thisEntry.getKey())) {
    +             matchFound = true;
    +             temp = thisEntry;
    +             break;
    +           }
    +         }
    +         if (!matchFound) {
    +           return false;
    +         } else {
    +           if (temp != null) {
    +             if (!Bytes.equals(thatEntry.getValue(), temp.getValue())) {
    +               return false;
    +             }
    +           } else {
    +             return false;
    +           }
    +         }
    + 
    +       }
    +     }
    + 
    +     // Check for min key
    +     Map<String, byte[]> thatMinKey = pTableStats.getMinKey();
    +     Map<String, byte[]> thisMinKey = this.getMinKey();
    +     if (thatMinKey.size() != thisMinKey.size()) {
    +       return false;
    +     } else {
    +       Set<Entry<String, byte[]>> thatEntrySet = thatMinKey.entrySet();
    +       Set<Entry<String, byte[]>> thisEntrySet = thisMinKey.entrySet();
    +       for (Entry<String, byte[]> thatEntry : thatEntrySet) {
    +         boolean matchFound = false;
    +         Entry<String, byte[]> temp = null;
    +         for (Entry<String, byte[]> thisEntry : thisEntrySet) {
    +           if (thatEntry.getKey().equals(thisEntry.getKey())) {
    +             matchFound = true;
    +             temp = thisEntry;
    +             break;
    +           }
    +         }
    +         if (!matchFound) {
    +           return false;
    +         } else {
    +           if (temp != null) {
    +             if (!Bytes.equals(thatEntry.getValue(), temp.getValue())) {
    +               return false;
    +             }
    +           } else {
    +             return false;
    +           }
    +         }
    + 
    +       }
    +     }
    + 
    +     // Check for guide posts
    +     Map<String, byte[]> thatGuidePosts = pTableStats.getGuidePosts();
    +     Map<String, byte[]> thisGuidePosts = this.getGuidePosts();
    +     if (thatGuidePosts.size() != thisGuidePosts.size()) {
    +       return false;
    +     } else {
    +       Set<Entry<String, byte[]>> thatEntrySet = thatGuidePosts.entrySet();
    +       Set<Entry<String, byte[]>> thisEntrySet = thisGuidePosts.entrySet();
    +       for (Entry<String, byte[]> thatEntry : thatEntrySet) {
    +         boolean matchFound = false;
    +         Entry<String, byte[]> temp = null;
    +         for (Entry<String, byte[]> thisEntry : thisEntrySet) {
    +           if (thatEntry.getKey().equals(thisEntry.getKey())) {
    +             matchFound = true;
    +             temp = thisEntry;
    +             break;
    +           }
    +         }
    +         if (!matchFound) {
    +           return false;
    +         } else {
    +           if (temp != null) {
    +             if (!Bytes.equals(thatEntry.getValue(), temp.getValue())) {
    +               return false;
    +             }
    +           } else {
    +             return false;
    +           }
    +         }
    + 
    +       }
    +     }
    +     return true;
    +   }
     
    -
    -/**
    - * Implementation for PTableStats.
    - */
    -public class PTableStatsImpl implements PTableStats {
    -
    -    // The map for guide posts should be immutable. We only take the current snapshot from outside
    -    // method call and store it.
    -    private Map<String, byte[][]> regionGuidePosts;
    -
    -    public PTableStatsImpl() { }
    -
    -    public PTableStatsImpl(Map<String, byte[][]> stats) {
    -        regionGuidePosts = ImmutableMap.copyOf(stats);
    -    }
    -
    -    @Override
    -    public byte[][] getRegionGuidePosts(HRegionInfo region) {
    -        return regionGuidePosts.get(region.getRegionNameAsString());
    -    }
    -
    -    @Override
    -    public Map<String, byte[][]> getGuidePosts(){
    -      if(regionGuidePosts != null) {
    -        return ImmutableMap.copyOf(regionGuidePosts);
    -      }
    -      
    -      return null;
    -    }
    +   @Override
    +   public int hashCode() {
    +     int hash = 1;
    --- End diff --
    
    I wouldn't assume that only the size changing means it's different. Just compare the hash code in the equals check and add a hashCode member variable that gets set at construction time.
    
    Also, I'm not sure we need a Map - take a look and see if you can get away with just a List<byte[]> where they're ordered. See my comments in ParallelIterators - I believe we'll basically just use all of them, all the time.
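
    A sketch of that shape (a simplified stand-in, not the actual PTableStatsImpl,
    assuming the ordered List<byte[]> of guideposts suggested above):

        import java.util.List;

        import org.apache.hadoop.hbase.util.Bytes;

        import com.google.common.collect.ImmutableList;

        public class GuidePostStats {
            private final List<byte[]> guidePosts; // ordered guideposts
            private final int hashCode;            // computed once at construction

            public GuidePostStats(List<byte[]> guidePosts) {
                this.guidePosts = ImmutableList.copyOf(guidePosts);
                int hash = 1;
                for (byte[] gp : this.guidePosts) {
                    hash = 31 * hash + Bytes.hashCode(gp);
                }
                this.hashCode = hash;
            }

            @Override
            public int hashCode() {
                return hashCode;
            }

            @Override
            public boolean equals(Object obj) {
                if (this == obj) return true;
                if (!(obj instanceof GuidePostStats)) return false;
                GuidePostStats that = (GuidePostStats) obj;
                // Cheap fast path: different hashes mean definitely not equal
                if (this.hashCode != that.hashCode) return false;
                if (this.guidePosts.size() != that.guidePosts.size()) return false;
                for (int i = 0; i < guidePosts.size(); i++) {
                    if (!Bytes.equals(this.guidePosts.get(i), that.guidePosts.get(i))) {
                        return false;
                    }
                }
                return true;
            }
        }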


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16764530
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -1594,4 +1775,46 @@ public boolean filterRowKey(byte[] buffer, int offset, int length) {
                 return cmp != 0;
             }
         }
    +    /**
    +     * A thread (Chore) that keeps reading the updated stats and invalidates the Meta cache in case there is
    +     * a change in the stats
    +     */
    +    private static class StatisticsUpdator extends Chore {
    --- End diff --
    
    Done. Introduced a new method in MetaDataService.proto that invalidates the cache for a specific table key.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694672
  
    --- Diff: phoenix-core/src/main/antlr3/PhoenixSQL.g ---
    @@ -104,12 +102,13 @@ tokens
         MINVALUE='minvalue';
         MAXVALUE='maxvalue';
         CYCLE='cycle';
    +    ANALYZE='analyze';
    +    UPDATE_STATISTICS='update_statistics';
    --- End diff --
    
    I don't think we need to support both - let's just stick with ANALYZE.
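
    For reference, usage from JDBC would then simply be (syntax as proposed in
    this patch; the table name is a placeholder):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class AnalyzeExample {
            public static void main(String[] args) throws Exception {
                Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                try {
                    Statement stmt = conn.createStatement();
                    // Kicks off stats collection for the table on the server side
                    stmt.executeUpdate("ANALYZE MY_TABLE");
                } finally {
                    conn.close();
                }
            }
        }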


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695198
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java ---
    @@ -84,7 +84,7 @@
         public static final int DEFAULT_TARGET_QUERY_CONCURRENCY = 32;
         public static final int DEFAULT_MAX_QUERY_CONCURRENCY = 64;
         public static final String DEFAULT_DATE_FORMAT = DateUtil.DEFAULT_DATE_FORMAT;
    -    public static final int DEFAULT_STATS_UPDATE_FREQ_MS = 15 * 60000; // 15min
    +    public static final int DEFAULT_STATS_UPDATE_FREQ_MS = 500; // 15min
    --- End diff --
    
    Change back to 15mins? If you want a different one for test purposes, see QueryServicesTestOptions.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16730944
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Thanks, Ram. Yep, that's my fault - I thought we'd need these min/max keys, but turns out we don't. Just want to keep things as simple as we can.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16765207
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/PTable.java ---
    @@ -316,4 +316,6 @@ public static LinkType fromSerializedValue(byte serializedValue) {
         
         int getEstimatedSize();
         IndexType getIndexType();
    +    
    +    void setPTableStats(PTableStats stats);
    --- End diff --
    
    Done


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16765176
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    +        }
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getSchema() != null) {
    +            schNameInBytes = Bytes.toBytes(connection.getSchema());
    +        }
    +        final byte[] tenantIdBytes = tenIdBytes;
    +        final byte[] schemaNameInBytes = schNameInBytes;
    +        try {
    +            ht = connection.getQueryServices().getTable(Bytes.toBytes(tableName));
    +            Batch.Call<StatCollectService, StatCollectResponse> callable = new Batch.Call<StatCollectService, StatCollectResponse>() {
    +                ServerRpcController controller = new ServerRpcController();
    +                BlockingRpcCallback<StatCollectResponse> rpcCallback = new BlockingRpcCallback<StatCollectResponse>();
    +
    +                @Override
    +                public StatCollectResponse call(StatCollectService service) throws IOException {
    +                    StatCollectRequest.Builder builder = StatCollectRequest.newBuilder();
    +                    builder.setTenantIdBytes(HBaseZeroCopyByteString.wrap(tenantIdBytes));
    +                    builder.setSchemaNameBytes(HBaseZeroCopyByteString.wrap(schemaNameInBytes));
    +                    service.collectStat(controller, builder.build(), rpcCallback);
    +                    if (controller.getFailedOn() != null) { throw controller.getFailedOn(); }
    +                    return rpcCallback.get();
    +                }
    +            };
    +            Map<byte[], StatCollectResponse> result = ht.coprocessorService(StatCollectService.class,
    +                    HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, callable);
    +            StatCollectResponse next = result.values().iterator().next();
    +            long rowsScanned = next.getRowsScanned();
    +            return new MutationState((int)rowsScanned, connection);
    +        } catch (ServiceException e) {
    +            throw new SQLException("Unable to update the statistics for the table " + tableName, e);
    +        } catch (TableNotFoundException e) {
    +            throw new SQLException("Table not found " + tableName, e);
    --- End diff --
    
    Ok


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16726785
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Actually, I don't think we'll even need minKey/maxKey at this point. Just the guideposts.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16728896
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
        Actually, I don't think we'll even need minKey/maxKey at this point. Just the guideposts
    Okie. I will keep a map for the min key and max key at least at the impl level (in case we need to use it later) and a list for the guide posts. So the API will return a map for the min key and the max key, and a list for the guide posts.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16977664
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
        if (table != null && table.getTimeStamp() < clientTimeStamp) {
            return table;
        }

    Should the above condition use != in doGetTable in MetaDataEndpointImpl, after we build the table that spans from 0 to LATEST_TIMESTAMP?


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16720778
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -228,10 +271,10 @@ public boolean apply(HRegionLocation location) {
     
         @Override
         public int getSplitsPerRegion(int numRegions) {
    --- End diff --
    
    We won't need this anymore.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16793270
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
    --- End diff --
    
    Yes, correct - add a last_analyze_start_date or something like that in your system.stats table. Then in your coprocessor, before kicking off the stats collection, check the current date against this date and make it a NOOP if less than the min time (from a config parameter) has elapsed.
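
    Something along these lines (the config key and how the last start time is
    stored are placeholders):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

        public class StatsRunGuard {
            // Returns true only if enough time has passed since the last
            // ANALYZE started; otherwise the coprocessor call is a NOOP.
            public static boolean shouldCollectStats(Configuration conf,
                    long lastAnalyzeStartTime) {
                long minGapMs = conf.getLong("phoenix.stats.minUpdateFrequency",
                        15 * 60000L);
                long now = EnvironmentEdgeManager.currentTimeMillis();
                return now - lastAnalyzeStartTime >= minGapMs;
            }
        }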


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695226
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    +        }
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getSchema() != null) {
    +            schNameInBytes = Bytes.toBytes(connection.getSchema());
    +        }
    +        final byte[] tenantIdBytes = tenIdBytes;
    +        final byte[] schemaNameInBytes = schNameInBytes;
    +        try {
    +            ht = connection.getQueryServices().getTable(Bytes.toBytes(tableName));
    +            Batch.Call<StatCollectService, StatCollectResponse> callable = new Batch.Call<StatCollectService, StatCollectResponse>() {
    +                ServerRpcController controller = new ServerRpcController();
    +                BlockingRpcCallback<StatCollectResponse> rpcCallback = new BlockingRpcCallback<StatCollectResponse>();
    +
    +                @Override
    +                public StatCollectResponse call(StatCollectService service) throws IOException {
    +                    StatCollectRequest.Builder builder = StatCollectRequest.newBuilder();
    +                    builder.setTenantIdBytes(HBaseZeroCopyByteString.wrap(tenantIdBytes));
    +                    builder.setSchemaNameBytes(HBaseZeroCopyByteString.wrap(schemaNameInBytes));
    +                    service.collectStat(controller, builder.build(), rpcCallback);
    +                    if (controller.getFailedOn() != null) { throw controller.getFailedOn(); }
    +                    return rpcCallback.get();
    +                }
    +            };
    +            Map<byte[], StatCollectResponse> result = ht.coprocessorService(StatCollectService.class,
    +                    HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, callable);
    +            StatCollectResponse next = result.values().iterator().next();
    +            long rowsScanned = next.getRowsScanned();
    +            return new MutationState((int)rowsScanned, connection);
    +        } catch (ServiceException e) {
    +            throw new SQLException("Unable to update the statistics for the table " + tableName, e);
    +        } catch (TableNotFoundException e) {
    +            throw new SQLException("Table not found " + tableName, e);
    --- End diff --
    
    Throw Phoenix TableNotFoundException here instead


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16971170
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    +        try {
    +            PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
    +            return stats;
    +        } finally {
    +            lock.readLock().unlock();
    +        }
    +    }
    +    
    +    private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
    --- End diff --
    
    That's right - it is the best way to do it. One more thing: in order to retrieve the latest time the stats were updated, so that we don't issue concurrent update-stats queries, should we do that in MetaDataClient with a select query, or should it be a normal get from the HTable? I'm asking as a matter of good practice.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16694781
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
               } else {
                   addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
               }
    +        }        
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (tenantId != null) {
    +            tenIdBytes = tenantId.getBytes();
             }
    -
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (schemaName != null) {
    +            schNameInBytes = Bytes.toBytes(schemaName.getString());
    +        }
    +        PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
             return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, 
                 tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null, 
                 indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL, 
    -            multiTenant, viewType, viewIndexId, indexType);
    +            multiTenant, viewType, viewIndexId, indexType, stats);
         }
     
    +    private PTableStats updateStats(final byte[] tableNameBytes) {
    +        lock.readLock().lock();
    --- End diff --
    
    We wouldn't want to use a global read lock here - at most we want to have a lock with a name based on the table name.
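
    For example, something like this (a sketch only - it assumes Guava's
    Striped locks are available in the version we depend on):

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.locks.ReadWriteLock;

        import org.apache.hadoop.hbase.util.Bytes;
        import org.apache.phoenix.schema.stat.PTableStats;

        import com.google.common.util.concurrent.Striped;

        public class TableStatsCache {
            // Locks are striped by table key, so readers of one table never
            // block behind a stats update on an unrelated table.
            private final Striped<ReadWriteLock> tableLocks = Striped.readWriteLock(64);
            private final Map<String, PTableStats> tableStatsMap =
                    new ConcurrentHashMap<String, PTableStats>();

            public PTableStats getStats(byte[] tableKey) {
                String key = Bytes.toString(tableKey);
                ReadWriteLock lock = tableLocks.get(key);
                lock.readLock().lock();
                try {
                    return tableStatsMap.get(key);
                } finally {
                    lock.readLock().unlock();
                }
            }
        }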


---

[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16695860
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    +        }
    +        byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getSchema() != null) {
    +            schNameInBytes = Bytes.toBytes(connection.getSchema());
    +        }
    +        final byte[] tenantIdBytes = tenIdBytes;
    +        final byte[] schemaNameInBytes = schNameInBytes;
    +        try {
    +            ht = connection.getQueryServices().getTable(Bytes.toBytes(tableName));
    --- End diff --
    
    Create a ConnectionQueryServices.updateStats(String tenantId, String schemaName, String tableName) method and move this code to ConnectionQueryServicesImpl. The idea is that all HBase code is encapsulated in that class. You can make the one in ConnectionlessQueryServicesImpl be a noop.
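
    I.e., something like this (the exact signature is a guess):

        import java.sql.SQLException;

        public interface ConnectionQueryServices {
            // ... existing methods elided ...

            // Implemented in ConnectionQueryServicesImpl (wraps the HBase
            // coprocessor call); a no-op in ConnectionlessQueryServicesImpl.
            long updateStats(String tenantId, String schemaName, String tableName)
                    throws SQLException;
        }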


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16943000
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Ok. So in that case the stats will be associated with the table directly. For now I will first finish the case where the PTable has a PColumnFamily and the stats (guidePosts) are part of this PColumnFamily.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the pull request:

    https://github.com/apache/phoenix/pull/8#issuecomment-54014739
  
    @JamesRTaylor - Thanks for the review.  I will do my level best to address all the comments.  Some of them have now become design changes because we will now be collecting stats per CF rather than per region name, so the guide posts will be a map with the CF as key and a byte[] (representing the guideposts) as value.  It was a long weekend here, so I could not take this up earlier.  Will post an updated patch ASAP, maybe in a day or two.  After that I will do performance testing.


---

[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699454
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStatsImpl.java ---
    @@ -16,40 +16,249 @@
      * limitations under the License.
      */
     package org.apache.phoenix.schema.stat;
    -
    + import java.util.Collection;
    +import java.util.HashMap;
     import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     import org.apache.hadoop.hbase.HRegionInfo;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
     
     import com.google.common.collect.ImmutableMap;
    + 
    + /**
    +  * Implementation for PTableStats.
    +  */
    + public class PTableStatsImpl implements PTableStats {
    + 
    +   public static final PTableStats NO_STATS = new PTableStatsImpl();
    +   // The map for guide posts should be immutable. We only take the current
    +   // snapshot from the outside method call and store it.
    +   private Map<String, byte[]> regionGuidePosts = new HashMap<String, byte[]>();
    +   // TODO : see if a thread safe map would be needed here
    +   private Map<String, byte[]> minKey = new HashMap<String, byte[]>();
    +   private Map<String, byte[]> maxKey = new HashMap<String, byte[]>();
    + 
    +   public PTableStatsImpl() {
    +     this(new HashMap<String, byte[]>(), new HashMap<String, byte[]>(),
    +         new HashMap<String, byte[]>());
    +   }
    + 
    +   public PTableStatsImpl(Map<String, byte[]> guidePosts,
    +       Map<String, byte[]> minKey, Map<String, byte[]> maxKey) {
    +     this.regionGuidePosts = guidePosts;
    +     this.minKey = minKey;
    +     this.maxKey = maxKey;
    +   }
    + 
    +   @Override
    +   public byte[] getRegionGuidePostsOfARegion(HRegionInfo region) {
    +     return regionGuidePosts.get(Bytes.toString(region.getRegionName()));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getGuidePosts() {
    +     if (regionGuidePosts != null) {
    +       return ImmutableMap.copyOf(regionGuidePosts);
    +     }
    + 
    +     return null;
    +   }
    + 
    +   @Override
    +   public byte[] getMaxKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.maxKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public byte[] getMinKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.minKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMaxKey() {
    +     if (maxKey != null) {
    +       return ImmutableMap.copyOf(maxKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMinKey() {
    +     if (minKey != null) {
    +       return ImmutableMap.copyOf(minKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public void setMinKey(HRegionInfo region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(HRegionInfo region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(HRegionInfo region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(region.getRegionNameAsString(),
    +         ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMinKey(byte[] region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(byte[] region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(byte[] region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public boolean equals(Object obj) {
    +     if (!(obj instanceof PTableStats)) {
    +       return false;
    +     }
    +     PTableStats pTableStats = (PTableStats) obj;
    +     // Compare the max keys, min keys and guide posts per region
    +     return byteMapsEqual(pTableStats.getMaxKey(), this.getMaxKey())
    +         && byteMapsEqual(pTableStats.getMinKey(), this.getMinKey())
    +         && byteMapsEqual(pTableStats.getGuidePosts(), this.getGuidePosts());
    +   }
    + 
    +   // The maps are keyed by region name, so a direct lookup replaces the
    +   // nested scan over both entry sets
    +   private static boolean byteMapsEqual(Map<String, byte[]> that, Map<String, byte[]> other) {
    +     if (that.size() != other.size()) {
    +       return false;
    +     }
    +     for (Entry<String, byte[]> thatEntry : that.entrySet()) {
    +       byte[] otherValue = other.get(thatEntry.getKey());
    +       if (otherValue == null || !Bytes.equals(thatEntry.getValue(), otherValue)) {
    +         return false;
    +       }
    +     }
    +     return true;
    +   }
     
    -
    -/**
    - * Implementation for PTableStats.
    - */
    -public class PTableStatsImpl implements PTableStats {
    -
    -    // The map for guide posts should be immutable. We only take the current snapshot from outside
    -    // method call and store it.
    -    private Map<String, byte[][]> regionGuidePosts;
    -
    -    public PTableStatsImpl() { }
    -
    -    public PTableStatsImpl(Map<String, byte[][]> stats) {
    -        regionGuidePosts = ImmutableMap.copyOf(stats);
    -    }
    -
    -    @Override
    -    public byte[][] getRegionGuidePosts(HRegionInfo region) {
    -        return regionGuidePosts.get(region.getRegionNameAsString());
    -    }
    -
    -    @Override
    -    public Map<String, byte[][]> getGuidePosts(){
    -      if(regionGuidePosts != null) {
    -        return ImmutableMap.copyOf(regionGuidePosts);
    -      }
    -      
    -      return null;
    -    }
    +   @Override
    +   public int hashCode() {
    +     int hash = 1;
    --- End diff --
    
    Yes, I too felt it was expensive. But is it OK if we just depend on the size of the collected stats list? The idea is that once the size of the stats changes, we consider that something has changed.
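
    For what it's worth, a minimal sketch of that size-based idea (the method name is hypothetical, not from the patch):

        // Cheap change signal based only on the sizes of the stats maps.
        // It misses updates that replace values in place, so it can only be
        // a heuristic for deciding whether to invalidate the cached stats.
        boolean statsMayHaveChanged(PTableStats oldStats, PTableStats newStats) {
            return oldStats.getGuidePosts().size() != newStats.getGuidePosts().size()
                || oldStats.getMinKey().size() != newStats.getMinKey().size()
                || oldStats.getMaxKey().size() != newStats.getMaxKey().size();
        }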



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16699479
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
    @@ -1594,4 +1775,46 @@ public boolean filterRowKey(byte[] buffer, int offset, int length) {
                 return cmp != 0;
             }
         }
    +    /**
    +     * A thread (Chore) that keeps reading the updated stats and invalidates the Meta cache in case
    +     * there is a change in the stats
    +     */
    +    private static class StatisticsUpdator extends Chore {
    --- End diff --
    
    See the caveat above about aging the PTable out of the cache. Also, the manual call to ANALYZE will invalidate the table after the stats have been updated, so the next SELECT would pick up the new stats.
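
    To make the pattern concrete, a hedged sketch of the general Chore shape under discussion; StatsReader and TableCache are hypothetical stand-ins for whatever MetaDataEndpointImpl actually exposes:

        import org.apache.hadoop.hbase.Chore;
        import org.apache.hadoop.hbase.Stoppable;

        interface StatsReader { Object readCurrentStats(); }              // hypothetical
        interface TableCache { Object cachedStats(); void invalidate(); } // hypothetical

        class StatsInvalidatorChore extends Chore {
            private final StatsReader reader;
            private final TableCache cache;

            StatsInvalidatorChore(int periodMs, Stoppable stopper, StatsReader reader, TableCache cache) {
                super("StatsInvalidatorChore", periodMs, stopper);
                this.reader = reader;
                this.cache = cache;
            }

            @Override
            protected void chore() {
                // Drop the cached PTable when the freshly read stats differ
                // from the cached snapshot, so the next lookup rebuilds it
                if (!reader.readCurrentStats().equals(cache.cachedStats())) {
                    cache.invalidate();
                }
            }
        }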



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16794419
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java ---
    @@ -461,6 +471,57 @@ public MutationState createTable(CreateTableStatement statement, byte[][] splits
             MutationPlan plan = compiler.compile(Collections.singletonList(tableRef), emptyCF, null, null, tableRef.getTimeStamp());
             return connection.getQueryServices().updateData(plan);
         }
    +    
    +    public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    +        String tableName = updateStatisticsStmt.getTable().getName().getTableName();
    +        HTableInterface ht = null;
    +        byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
    +        if (connection.getTenantId() != null) {
    +            tenIdBytes = connection.getTenantId().getBytes();
    --- End diff --
    
    That's fine, but just make sure you always get the physical name from PTable so this won't break if an attempt is made to analyze a view.
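
    A small sketch of that suggestion, resolving the physical name through the PTable; the metadata-cache lookup shown is an assumption about the surrounding code:

        // Resolve the PTable first and use its physical name, so that
        // analyzing a view targets the underlying physical HBase table
        PTable table = connection.getMetaDataCache()
            .getTable(new PTableKey(connection.getTenantId(), fullTableName));
        byte[] physicalName = table.getPhysicalName().getBytes();
        HTableInterface ht = connection.getQueryServices().getTable(physicalName);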



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16698623
  
    --- Diff: phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseParallelIteratorsRegionSplitterIT.java ---
    @@ -86,10 +87,14 @@ protected void initTableValues(long ts) throws Exception {
             conn.close();
         }
     
    -    protected static TableRef getTableRef(Connection conn, long ts) throws SQLException {
    +    protected static TableRef getTableRef(Connection conn, long ts, PTable pTable) throws SQLException {
             PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
    -        TableRef table = new TableRef(null,pconn.getMetaDataCache().getTable(new PTableKey(pconn.getTenantId(), STABLE_NAME)),ts, false);
    -        return table;
    +        if (pTable == null) {
    --- End diff --
    
    Maybe not. Will try removing it.



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696763
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java ---
    @@ -0,0 +1,295 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stat;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.Coprocessor;
    +import org.apache.hadoop.hbase.CoprocessorEnvironment;
    +import org.apache.hadoop.hbase.HTableDescriptor;
    +import org.apache.hadoop.hbase.KeyValue;
    +import org.apache.hadoop.hbase.KeyValueUtil;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
    +import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    +import org.apache.hadoop.hbase.protobuf.ResponseConverter;
    +import org.apache.hadoop.hbase.regionserver.HRegion;
    +import org.apache.hadoop.hbase.regionserver.InternalScanner;
    +import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
    +import org.apache.hadoop.hbase.regionserver.RegionScanner;
    +import org.apache.hadoop.hbase.regionserver.ScanType;
    +import org.apache.hadoop.hbase.regionserver.Store;
    +import org.apache.hadoop.hbase.regionserver.StoreScanner;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectRequest;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse.Builder;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectService;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
    +import org.apache.phoenix.schema.PDataType;
    +import org.apache.phoenix.schema.PhoenixArray;
    +import org.apache.phoenix.util.SchemaUtil;
    +
    +import com.google.protobuf.RpcCallback;
    +import com.google.protobuf.RpcController;
    +import com.google.protobuf.Service;
    +
    +/**
    + * An endpoint implementation that collects the stats for a given region,
    + * grouped per column family. It is also a RegionObserver that collects the
    + * same information on compaction. Invoking this endpoint populates the
    + * Phoenix stats table with the max key, min key and guide posts for the
    + * region. The stats can then be consumed through the stats associated with
    + * every PTable and used to parallelize queries
    + */
    +public class StatisticsCollector extends BaseRegionObserver implements CoprocessorService,
    +    Coprocessor, StatisticsTracker, StatCollectService.Interface {
    +
    +  public static void addToTable(HTableDescriptor desc) throws IOException {
    +    desc.addCoprocessor(StatisticsCollector.class.getName());
    +  }
    +
    +  private byte[] min;
    +  private byte[] max;
    +  private long guidepostDepth;
    +  private long byteCount = 0;
    +  private List<byte[]> guidePosts = new ArrayList<byte[]>();
    +  private Set<byte[]> familyMap = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    +  private RegionCoprocessorEnvironment env;
    +  protected StatisticsTable stats;
    +  private static final Log LOG = LogFactory.getLog(StatCollectService.class);
    +
    +  @Override
    +  public void collectStat(RpcController controller, StatCollectRequest request,
    +      RpcCallback<StatCollectResponse> done) {
    +    HRegion region = env.getRegion();
    +    if (familyMap != null) {
    +      familyMap.clear();
    +    }
    +    Scan scan = new Scan();
    +    // We should have a mechanism to set the caching here.
    +    // TODO : Should the scan object be coming from the endpoint?
    +    scan.setCaching(1000);
    +    // TODO : Should we scan fully, or should we do as StatsManagerImpl does and
    +    // read only the first and last rows after setting scan.setReserved(true)?
    +    // But doing so we cannot get the guideposts
    +    RegionScanner scanner = null;
    +    int count = 0;
    +    try {
    +      scanner = region.getScanner(scan);
    +      List<Cell> results = new ArrayList<Cell>();
    +      boolean hasMore = true;
    +      while (hasMore) {
    +        // TODO : Are we getting duplicates here? Need to avoid that
    +        hasMore = scanner.next(results);
    +        updateStat(results);
    +        count += results.size();
    +        results.clear();
    +      }
    +    } catch (IOException e) {
    +      LOG.error(e);
    +      ResponseConverter.setControllerException(controller, e);
    +    } finally {
    +      if (scanner != null) {
    +        try {
    +          scanner.close();
    +          IOException toThrow = null;
    +          try {
    +            // update the statistics table
    +            for (byte[] fam : familyMap) {
    +              byte[] tableKey = SchemaUtil.getTableKey(request.getTenantIdBytes().toByteArray(),
    +                  request.getSchemaNameBytes().toByteArray(), region.getRegionInfo().getTable()
    +                      .getName());
    +              stats.updateStats(new StatisticsWriter(tableKey,
    +                  Bytes.toBytes(region.getRegionInfo().getRegionNameAsString())), this, fam);
    +            }
    +          } catch (IOException e) {
    +            LOG.error("Failed to update statistics table!", e);
    +            toThrow = e;
    +          }
    +        } catch (IOException e) {
    +          LOG.error(e);
    +          ResponseConverter.setControllerException(controller, e);
    +        }
    +      }
    +    }
    +    Builder newBuilder = StatCollectResponse.newBuilder();
    +    newBuilder.setRowsScanned(count);
    +    StatCollectResponse result = newBuilder.build();
    +    done.run(result);
    +  }
    +
    +  /**
    +   * Update the current statistics based on the latest batch of key-values from
    +   * the underlying scanner
    +   * 
    +   * @param results
    +   *          next batch of {@link KeyValue}s
    +   */
    +  protected void updateStat(final List<Cell> results) {
    +    for (Cell c : results) {
    +      updateStatistic(KeyValueUtil.ensureKeyValue(c));
    +    }
    +  }
    +
    +  @Override
    +  public Service getService() {
    +    return StatCollectorProtos.StatCollectService.newReflectiveService(this);
    +  }
    +
    +  @Override
    +  public void start(CoprocessorEnvironment env) throws IOException {
    +    if (env instanceof RegionCoprocessorEnvironment) {
    +      this.env = (RegionCoprocessorEnvironment) env;
    +    } else {
    +      throw new CoprocessorException("Must be loaded on a table region!");
    +    }
    +    HTableDescriptor desc = ((RegionCoprocessorEnvironment) env).getRegion().getTableDesc();
    +    // Get the stats table associated with the current table on which the CP is
    +    // triggered
    +    stats = StatisticsTable.getStatisticsTableForCoprocessor(env, desc.getName());
    +    guidepostDepth = env.getConfiguration().getLong(
    +        StatisticsConstants.HISTOGRAM_BYTE_DEPTH_CONF_KEY,
    +        StatisticsConstants.HISTOGRAM_DEFAULT_BYTE_DEPTH);
    +  }
    +
    +  @Override
    +  public void stop(CoprocessorEnvironment arg0) throws IOException {
    +    stats.close();
    +  }
    +
    +  @Override
    +  public InternalScanner preCompactScannerOpen(ObserverContext<RegionCoprocessorEnvironment> c,
    +      Store store, List<? extends KeyValueScanner> scanners, ScanType scanType, long earliestPutTs,
    +      InternalScanner s) throws IOException {
    +    // TODO : Major concern here is that we cannot get the complete name of the table with schema and tenantId.
    --- End diff --
    
    The schemaName and tableName are derived from the HTable name, which you can get from the RegionCoprocessorEnvironment. We have util methods in SchemaUtil that'll give you back the schemaName and tableName given the byte[] or String of a full HTable name. Is this still an issue?
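
    Roughly what that looks like inside the hook (variable names are illustrative):

        // Derive the schema and table name from the region's HTable name
        String fullName = c.getEnvironment().getRegion().getRegionInfo()
            .getTable().getNameAsString();
        String schemaName = SchemaUtil.getSchemaNameFromFullName(fullName);
        String tableName = SchemaUtil.getTableNameFromFullName(fullName);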



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16765332
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java ---
    @@ -0,0 +1,295 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.schema.stat;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hbase.Cell;
    +import org.apache.hadoop.hbase.Coprocessor;
    +import org.apache.hadoop.hbase.CoprocessorEnvironment;
    +import org.apache.hadoop.hbase.HTableDescriptor;
    +import org.apache.hadoop.hbase.KeyValue;
    +import org.apache.hadoop.hbase.KeyValueUtil;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
    +import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
    +import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    +import org.apache.hadoop.hbase.protobuf.ResponseConverter;
    +import org.apache.hadoop.hbase.regionserver.HRegion;
    +import org.apache.hadoop.hbase.regionserver.InternalScanner;
    +import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
    +import org.apache.hadoop.hbase.regionserver.RegionScanner;
    +import org.apache.hadoop.hbase.regionserver.ScanType;
    +import org.apache.hadoop.hbase.regionserver.Store;
    +import org.apache.hadoop.hbase.regionserver.StoreScanner;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectRequest;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectResponse.Builder;
    +import org.apache.phoenix.coprocessor.generated.StatCollectorProtos.StatCollectService;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
    +import org.apache.phoenix.schema.PDataType;
    +import org.apache.phoenix.schema.PhoenixArray;
    +import org.apache.phoenix.util.SchemaUtil;
    +
    +import com.google.protobuf.RpcCallback;
    +import com.google.protobuf.RpcController;
    +import com.google.protobuf.Service;
    +
    +/**
    + * An endpoint implementation that collects the stats for a given region,
    + * grouped per column family. It is also a RegionObserver that collects the
    + * same information on compaction. Invoking this endpoint populates the
    + * Phoenix stats table with the max key, min key and guide posts for the
    + * region. The stats can then be consumed through the stats associated with
    + * every PTable and used to parallelize queries
    + */
    +public class StatisticsCollector extends BaseRegionObserver implements CoprocessorService,
    +    Coprocessor, StatisticsTracker, StatCollectService.Interface {
    +
    +  public static void addToTable(HTableDescriptor desc) throws IOException {
    +    desc.addCoprocessor(StatisticsCollector.class.getName());
    +  }
    +
    +  private byte[] min;
    +  private byte[] max;
    +  private long guidepostDepth;
    +  private long byteCount = 0;
    +  private List<byte[]> guidePosts = new ArrayList<byte[]>();
    +  private Set<byte[]> familyMap = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    +  private RegionCoprocessorEnvironment env;
    +  protected StatisticsTable stats;
    +  private static final Log LOG = LogFactory.getLog(StatCollectService.class);
    +
    +  @Override
    +  public void collectStat(RpcController controller, StatCollectRequest request,
    +      RpcCallback<StatCollectResponse> done) {
    +    HRegion region = env.getRegion();
    +    if (familyMap != null) {
    +      familyMap.clear();
    +    }
    +    Scan scan = new Scan();
    +    // We should have a mechanism to set the caching here.
    +    // TODO : Should the scan object be coming from the endpoint?
    --- End diff --
    
    Done



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16696186
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/schema/stat/PTableStatsImpl.java ---
    @@ -16,40 +16,249 @@
      * limitations under the License.
      */
     package org.apache.phoenix.schema.stat;
    -
    + import java.util.Collection;
    +import java.util.HashMap;
     import java.util.Map;
    +import java.util.Map.Entry;
    +import java.util.Set;
     
     import org.apache.hadoop.hbase.HRegionInfo;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.phoenix.hbase.index.util.ImmutableBytesPtr;
     
     import com.google.common.collect.ImmutableMap;
    + 
    + /**
    +  * Implementation for PTableStats.
    +  */
    + public class PTableStatsImpl implements PTableStats {
    + 
    +   public static final PTableStats NO_STATS = new PTableStatsImpl();
    +   // The map for guide posts should be immutable. We only store the
    +   // current snapshot taken at the time of the method call.
    +   private Map<String, byte[]> regionGuidePosts = new HashMap<String, byte[]>();
    +   // TODO : see if a thread safe map would be needed here
    +   private Map<String, byte[]> minKey = new HashMap<String, byte[]>();
    +   private Map<String, byte[]> maxKey = new HashMap<String, byte[]>();
    + 
    +   public PTableStatsImpl() {
    +     this(new HashMap<String, byte[]>(), new HashMap<String, byte[]>(),
    +         new HashMap<String, byte[]>());
    +   }
    + 
    +   public PTableStatsImpl(Map<String, byte[]> guidePosts,
    +       Map<String, byte[]> minKey, Map<String, byte[]> maxKey) {
    +     this.regionGuidePosts = guidePosts;
    +     this.minKey = minKey;
    +     this.maxKey = maxKey;
    +   }
    + 
    +   @Override
    +   public byte[] getRegionGuidePostsOfARegion(HRegionInfo region) {
    +     return regionGuidePosts.get(Bytes.toString(region.getRegionName()));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getGuidePosts() {
    +     if (regionGuidePosts != null) {
    +       return ImmutableMap.copyOf(regionGuidePosts);
    +     }
    + 
    +     return null;
    +   }
    + 
    +   @Override
    +   public byte[] getMaxKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.maxKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public byte[] getMinKeyOfARegion(HRegionInfo region) {
    +     return Bytes.copy(this.minKey.get(Bytes.toString(region.getRegionName())));
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMaxKey() {
    +     if (maxKey != null) {
    +       return ImmutableMap.copyOf(maxKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public Map<String, byte[]> getMinKey() {
    +     if (minKey != null) {
    +       return ImmutableMap.copyOf(minKey);
    +     }
    +     return null;
    +   }
    + 
    +   @Override
    +   public void setMinKey(HRegionInfo region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(HRegionInfo region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(region.getRegionNameAsString(), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(HRegionInfo region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(region.getRegionNameAsString(),
    +         ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMinKey(byte[] region, byte[] minKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(minKey, offset, length);
    +     this.minKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setMaxKey(byte[] region, byte[] maxKey, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(maxKey, offset, length);
    +     this.maxKey.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public void setGuidePosts(byte[] region, byte[] guidePosts, int offset, int length) {
    +     ImmutableBytesPtr ptr = new ImmutableBytesPtr(guidePosts, offset, length);
    +     this.regionGuidePosts.put(Bytes.toString(region), ImmutableBytesPtr.copyBytesIfNecessary(ptr));
    +   }
    + 
    +   @Override
    +   public boolean equals(Object obj) {
    +     if (!(obj instanceof PTableStats)) {
    +       return false;
    +     }
    +     PTableStats pTableStats = (PTableStats) obj;
    +     // Compare the max keys, min keys and guide posts per region
    +     return byteMapsEqual(pTableStats.getMaxKey(), this.getMaxKey())
    +         && byteMapsEqual(pTableStats.getMinKey(), this.getMinKey())
    +         && byteMapsEqual(pTableStats.getGuidePosts(), this.getGuidePosts());
    +   }
    + 
    +   // The maps are keyed by region name, so a direct lookup replaces the
    +   // nested scan over both entry sets
    +   private static boolean byteMapsEqual(Map<String, byte[]> that, Map<String, byte[]> other) {
    +     if (that.size() != other.size()) {
    +       return false;
    +     }
    +     for (Entry<String, byte[]> thatEntry : that.entrySet()) {
    +       byte[] otherValue = other.get(thatEntry.getKey());
    +       if (otherValue == null || !Bytes.equals(thatEntry.getValue(), otherValue)) {
    +         return false;
    +       }
    +     }
    +     return true;
    +   }
     
    -
    -/**
    - * Implementation for PTableStats.
    - */
    -public class PTableStatsImpl implements PTableStats {
    -
    -    // The map for guide posts should be immutable. We only take the current snapshot from outside
    -    // method call and store it.
    -    private Map<String, byte[][]> regionGuidePosts;
    -
    -    public PTableStatsImpl() { }
    -
    -    public PTableStatsImpl(Map<String, byte[][]> stats) {
    -        regionGuidePosts = ImmutableMap.copyOf(stats);
    -    }
    -
    -    @Override
    -    public byte[][] getRegionGuidePosts(HRegionInfo region) {
    -        return regionGuidePosts.get(region.getRegionNameAsString());
    -    }
    -
    -    @Override
    -    public Map<String, byte[][]> getGuidePosts(){
    -      if(regionGuidePosts != null) {
    -        return ImmutableMap.copyOf(regionGuidePosts);
    -      }
    -      
    -      return null;
    -    }
    +   @Override
    +   public int hashCode() {
    +     int hash = 1;
    --- End diff --
    
    Looks somewhat expensive to calculate. PTableStats should be immutable, so instead calculate this in a static method and set during construction.
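
    Something along these lines, a sketch of the construction-time computation; the order-independent fold is an assumption to keep HashMap iteration order out of the result, and it presumes the setters go away so the maps really are immutable:

        private final int hashCode;

        public PTableStatsImpl(Map<String, byte[]> guidePosts,
                Map<String, byte[]> minKey, Map<String, byte[]> maxKey) {
            this.regionGuidePosts = ImmutableMap.copyOf(guidePosts);
            this.minKey = ImmutableMap.copyOf(minKey);
            this.maxKey = ImmutableMap.copyOf(maxKey);
            this.hashCode = computeHashCode(this.regionGuidePosts, this.minKey, this.maxKey);
        }

        private static int computeHashCode(Map<String, byte[]>... maps) {
            int hash = 1;
            for (Map<String, byte[]> map : maps) {
                int mapHash = 0;
                for (Map.Entry<String, byte[]> e : map.entrySet()) {
                    // XOR-and-sum keeps the result independent of iteration order
                    mapHash += e.getKey().hashCode() ^ Bytes.hashCode(e.getValue());
                }
                hash = 31 * hash + mapHash;
            }
            return hash;
        }

        @Override
        public int hashCode() {
            return hashCode; // precomputed once, O(1) at call sites
        }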



[GitHub] phoenix pull request: Phoenix 180

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r16720836
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -228,10 +271,10 @@ public boolean apply(HRegionLocation location) {
     
         @Override
         public int getSplitsPerRegion(int numRegions) {
    --- End diff --
    
    The only caveat here is that for local indexes we shouldn't ever fall back on Bytes.split(); if we get rid of that fallback here, no special case will be needed for local indexes.
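
    In other words, something like this guard (a sketch; isLocalIndex and the region variables are stand-ins):

        byte[][] boundaries;
        if (guidePosts.length > 0) {
            // Stats-driven splits always win when guide posts are available
            boundaries = guidePosts;
        } else if (!isLocalIndex) {
            // Evenly divide the region's key range; this fallback is never
            // appropriate for local index regions
            boundaries = Bytes.split(regionInfo.getStartKey(), regionInfo.getEndKey(),
                splitsPerRegion - 1);
        } else {
            boundaries = new byte[0][]; // local index: scan the region as one range
        }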



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17067688
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    Oh, you have suggested the same, James :). Refreshing the git page.



[GitHub] phoenix pull request: Phoenix 180

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/8#discussion_r17063004
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
    @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
             //    split each region in s splits such that:
             //    s = max(x) where s * x < t
             //
    -        // The idea is to align splits with region boundaries. If rows are not evenly
    -        // distributed across regions, using this scheme compensates for regions that
    -        // have more rows than others, by applying tighter splits and therefore spawning
    -        // off more scans over the overloaded regions.
    -        int splitsPerRegion = getSplitsPerRegion(regions.size());
             // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
             // that we keep each region server busy for each query.
    -        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
    +        int splitsPerRegion = getSplitsPerRegion(regions.size());
    +        ListMultimap<HRegionLocation,KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
             if (splitsPerRegion == 1) {
                 for (HRegionLocation region : regions) {
    --- End diff --
    
    @JamesRTaylor - So for a single guide post the ranges will be from the empty byte array to the guide post, and from that guide post to the empty byte array. Now when I intersect these with the available regions the result becomes different, and so I am not able to use that split. I will check once again and get back on that.
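
    For reference, how a single guide post g carves the key space and then gets clipped to a region (a sketch using Phoenix's KeyRange; regionInfo is a stand-in for the region being intersected):

        // One guide post splits the full key space into two ranges, each of
        // which is then clipped to the region's boundaries
        KeyRange below = KeyRange.getKeyRange(KeyRange.UNBOUND, false, g, false);
        KeyRange above = KeyRange.getKeyRange(g, true, KeyRange.UNBOUND, false);
        KeyRange regionRange = KeyRange.getKeyRange(
            regionInfo.getStartKey(), true, regionInfo.getEndKey(), false);
        KeyRange first = below.intersect(regionRange);  // may be KeyRange.EMPTY_RANGE
        KeyRange second = above.intersect(regionRange);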

