Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2014/10/08 07:42:34 UTC

[jira] [Created] (PHOENIX-1333) Store statistics guideposts as VARBINARY

James Taylor created PHOENIX-1333:
-------------------------------------

             Summary: Store statistics guideposts as VARBINARY
                 Key: PHOENIX-1333
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1333
             Project: Phoenix
          Issue Type: Bug
            Reporter: James Taylor
            Assignee: ramkrishna.s.vasudevan
            Priority: Critical


There's a potential problem with storing the guideposts as a VARBINARY ARRAY, as pointed out by PHOENIX-1329. We'd run into it when collecting stats for a table with a trailing VARBINARY row key column whose value contains embedded null bytes. Because of this, we're better off storing guideposts as VARBINARY and serializing/deserializing in the following manner:
<byte length as vint><bytes><byte length as vint><bytes>...
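A minimal sketch of that layout, for illustration only. The class GuidePostEncoder and its methods are hypothetical and not part of Phoenix. To stay self-contained it uses a simple 7-bits-per-byte unsigned varint, which is an assumption: the "vint" in this issue may well mean Hadoop's own vint encoding, whose byte layout differs.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper, not Phoenix code: packs guideposts as
// <byte length as varint><bytes><byte length as varint><bytes>...
final class GuidePostEncoder {

    // Assumed varint format: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Each guidepost is written with an explicit length prefix, so embedded
    // null bytes in a guidepost cannot be mistaken for a separator.
    static byte[] encode(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] guidePost : guidePosts) {
            writeVarInt(out, guidePost.length);
            out.write(guidePost, 0, guidePost.length);
        }
        return out.toByteArray();
    }
}
```

Because every element carries its own length, a guidepost such as {0x61, 0x00, 0x62} round-trips unchanged, which is the case a null-separated array encoding cannot represent.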

We should also store the total number of guideposts as a separate KeyValue column. The schema of SYSTEM.STATS would then look like this instead:
{code}
    public static final String CREATE_STATS_TABLE_METADATA = 
            "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE + "\"(\n" +
            // PK columns
            PHYSICAL_NAME  + " VARCHAR NOT NULL," +
            COLUMN_FAMILY + " VARCHAR," +
            REGION_NAME + " VARCHAR," +
            // Non-PK (KeyValue) columns
            GUIDE_POSTS  + " VARBINARY," +
            GUIDE_POSTS_COUNT + " SMALLINT," +
            MIN_KEY + " VARBINARY," + 
            MAX_KEY + " VARBINARY," +
            LAST_STATS_UPDATE_TIME+ " DATE, "+
            "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
            + PHYSICAL_NAME + ","
            + COLUMN_FAMILY + ","+ REGION_NAME+"))\n" +
            // TODO: should we support versioned stats?
            // Install split policy to prevent a physical table's stats from being split across regions.
            HTableDescriptor.SPLIT_POLICY + "='" + MetaDataSplitPolicy.class.getName() + "'\n";
{code}

Then the serialization code in StatisticsTable.addStats() would need to change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the new format.

The deserialization code is isolated to StatisticsUtil.readStatistics(). It would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then deserialize the GUIDE_POSTS in the new format.
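The read side could be sketched roughly as follows. GuidePostDecoder is a hypothetical name, not the actual StatisticsUtil code, and the varint format (7 bits per byte, low-order group first) is an assumption made to keep the example self-contained rather than the encoding Phoenix necessarily uses. The count is used only as a sizing hint for the result list; the byte buffer itself determines how many guideposts are read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Phoenix code: unpacks
// <byte length as varint><bytes><byte length as varint><bytes>...
final class GuidePostDecoder {

    // guidePostCount stands in for the value read from GUIDE_POSTS_COUNT and
    // is only used to pre-size the list.
    static List<byte[]> decode(byte[] buf, int guidePostCount) {
        List<byte[]> result = new ArrayList<>(Math.max(guidePostCount, 0));
        int pos = 0;
        while (pos < buf.length) {
            // Read the length prefix (assumed varint format).
            int len = 0;
            int shift = 0;
            int b;
            do {
                if (pos >= buf.length || shift > 28) {
                    throw new IllegalArgumentException("malformed length prefix");
                }
                b = buf[pos++] & 0xFF;
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            // Reject lengths that run past the end of the buffer.
            if (len < 0 || len > buf.length - pos) {
                throw new IllegalArgumentException("guidepost length exceeds buffer");
            }
            result.add(Arrays.copyOfRange(buf, pos, pos + len));
            pos += len;
        }
        return result;
    }
}
```

Treating the count as a hint rather than a loop bound means a stale or missing GUIDE_POSTS_COUNT would cost only a list resize, not a wrong result; whether Phoenix should instead validate the count against the decoded size is a design choice this sketch leaves open.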



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)