Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2014/10/08 08:46:34 UTC

[jira] [Commented] (PHOENIX-1333) Store statistics guideposts as VARBINARY

    [ https://issues.apache.org/jira/browse/PHOENIX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163167#comment-14163167 ] 

James Taylor commented on PHOENIX-1333:
---------------------------------------

One more minor but important addition to the SYSTEM.STATS schema: include a column that captures the value of the guidepost width (i.e. phoenix.stats.guidepost.width) as a BIGINT. It's probably easiest to capture this per region, like we're doing with the other values. Make sure this gets serialized into the PStats and makes its way into the PTable and PColumnFamily as well. The reason is that the config value may change, but it's important that we capture what it was when we ran the stats (so we can use it for costing).
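
To illustrate why the width needs to be stored with the stats rather than read from the current config, here is a minimal sketch. The class name RegionStatsSketch and the method estimatedBytes() are hypothetical and are not part of Phoenix; the sketch also assumes the simplest possible cost model, in which each guidepost stands for roughly one guidepost width of data.

```java
// Hypothetical holder for per-region stats; not an actual Phoenix class.
final class RegionStatsSketch {
    // Guidepost width in bytes as configured when the stats were collected
    // (assumption: this is the value the new BIGINT column would hold).
    final long guidePostWidth;
    // Number of guideposts recorded for the region.
    final int guidePostCount;

    RegionStatsSketch(long guidePostWidth, int guidePostCount) {
        if (guidePostWidth <= 0 || guidePostCount < 0) {
            throw new IllegalArgumentException("invalid stats values");
        }
        this.guidePostWidth = guidePostWidth;
        this.guidePostCount = guidePostCount;
    }

    // Rough size estimate using the stored width, so a later change to
    // phoenix.stats.guidepost.width does not skew the result.
    long estimatedBytes() {
        return Math.multiplyExact(guidePostWidth, (long) guidePostCount);
    }
}
```

If the estimate multiplied the guidepost count by the current config value instead, any change to the config after the stats run would silently scale every estimate.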

> Store statistics guideposts as VARBINARY
> ----------------------------------------
>
>                 Key: PHOENIX-1333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1333
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>
> There's a potential problem with storing the guideposts as a VARBINARY ARRAY, as pointed out by PHOENIX-1329. We'd run into this issue when collecting stats for a table with a trailing VARBINARY row key column whose value contains embedded null bytes. Because of this, we're better off storing guideposts as VARBINARY and serializing/deserializing in the following manner:
> <byte length as vint><bytes><byte length as vint><bytes>...
> We should also store as a separate KeyValue column the total number of guideposts. So the schema of SYSTEM.STATS would look like this now instead:
> {code}
>     public static final String CREATE_STATS_TABLE_METADATA = 
>             "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE + "\"(\n" +
>             // PK columns
>             PHYSICAL_NAME  + " VARCHAR NOT NULL," +
>             COLUMN_FAMILY + " VARCHAR," +
>             REGION_NAME + " VARCHAR," +
>             // Non-PK (KeyValue) columns
>             GUIDE_POSTS  + " VARBINARY," +
>             GUIDE_POSTS_COUNT + " SMALLINT," +
>             MIN_KEY + " VARBINARY," + 
>             MAX_KEY + " VARBINARY," +
>             LAST_STATS_UPDATE_TIME+ " DATE, "+
>             "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
>             + PHYSICAL_NAME + ","
>             + COLUMN_FAMILY + ","+ REGION_NAME+"))\n" +
>             // TODO: should we support versioned stats?
>             // Install split policy to prevent a physical table's stats from being split across regions.
>             HTableDescriptor.SPLIT_POLICY + "='" + MetaDataSplitPolicy.class.getName() + "'\n";
> {code}
> Then the serialization code in StatisticsTable.addStats() would need to change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the new format.
> The deserialization code is isolated to StatisticsUtil.readStatisitics(). It would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then deserialize the GUIDE_POSTS in the new format.
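
The length-prefixed layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Phoenix implementation: the class GuidePostCodec and its methods are hypothetical, and the hand-written base-128 varint is only a stand-in for whatever vint encoding the real code would use. The expectedCount argument plays the role of GUIDE_POSTS_COUNT and is used only for initial sizing.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for <byte length as vint><bytes>... ; not a Phoenix class.
final class GuidePostCodec {
    private GuidePostCodec() {}

    // Writes a non-negative int as a base-128 varint: 7 bits per byte, low bits
    // first, high bit set on every byte except the last.
    // Assumption: stand-in for the vint encoding the real code would use.
    private static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative length");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Concatenates the guideposts, each preceded by its length as a varint.
    static byte[] serialize(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] guidePost : guidePosts) {
            writeVInt(out, guidePost.length);
            out.write(guidePost, 0, guidePost.length);
        }
        return out.toByteArray();
    }

    // Reads guideposts back until the input is exhausted. expectedCount is
    // only a sizing hint for the result list.
    static List<byte[]> deserialize(byte[] data, int expectedCount) {
        List<byte[]> result = new ArrayList<>(Math.max(expectedCount, 0));
        int pos = 0;
        while (pos < data.length) {
            int length = 0;
            int shift = 0;
            while (true) {
                if (pos >= data.length || shift > 28) {
                    throw new IllegalArgumentException("malformed length prefix");
                }
                int b = data[pos++] & 0xFF;
                length |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
                shift += 7;
            }
            if (length < 0 || length > data.length - pos) {
                throw new IllegalArgumentException("truncated guidepost");
            }
            byte[] guidePost = new byte[length];
            System.arraycopy(data, pos, guidePost, 0, length);
            pos += length;
            result.add(guidePost);
        }
        return result;
    }
}
```

Because every value carries an explicit length, no byte value has to act as a separator, so a guidepost containing embedded null bytes round-trips unchanged.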


