Posted to dev@phoenix.apache.org by Maryann Xue <ma...@gmail.com> on 2016/02/12 20:23:07 UTC

Question about table stats

Hi,

I noticed something while feeding Phoenix table stats into the
Calcite-Phoenix cost calculation. When I ran the following code (a
slightly modified version of the existing StatisticsUtil method) to scan
the stats table for a specific column family and a specific start/stop key
range, I got guidepost rows that did not contain the rowCount or byteCount
cell, for every row in the specified range. I had explicitly added the
corresponding columns to the Scan (as shown below). Meanwhile, another
range of stats in the same table gave me the right result. Is this
expected behavior, or is it a bug?

    public static PTableStats readStatistics(HTableInterface statsHTable,
            byte[] tableNameBytes, ImmutableBytesPtr cf, byte[] startKey, byte[] stopKey,
            long clientTimeStamp)
            throws IOException {
        ImmutableBytesWritable ptr = new ImmutableBytesWritable();
        Scan s;
        if (cf == null) {
            s = MetaDataUtil.newTableRowsScan(tableNameBytes,
                    MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
        } else {
            s = MetaDataUtil.newTableRowsScan(
                    getAdjustedKey(startKey, tableNameBytes, cf, false),
                    getAdjustedKey(stopKey, tableNameBytes, cf, true),
                    MetaDataProtocol.MIN_TABLE_TIMESTAMP,
                    clientTimeStamp);
        }
        s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
                PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
        s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
                PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
        s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
                QueryConstants.EMPTY_COLUMN_BYTES);

        ResultScanner scanner = null;
        long timeStamp = MetaDataProtocol.MIN_TABLE_TIMESTAMP;
        TreeMap<byte[], GuidePostsInfoBuilder> guidePostsInfoWriterPerCf =
                new TreeMap<byte[], GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
        try {
            scanner = statsHTable.getScanner(s);
            Result result = null;
            while ((result = scanner.next()) != null) {
                CellScanner cellScanner = result.cellScanner();
                long rowCount = 0;
                long byteCount = 0;
                byte[] cfName = null;
                int tableNameLength;
                int cfOffset;
                int cfLength;
                boolean valuesSet = false;
                // Only the two cells with quals GUIDE_POSTS_ROW_COUNT_BYTES and
                // GUIDE_POSTS_BYTES would be retrieved
                while (cellScanner.advance()) {
                    Cell current = cellScanner.current();
                    if (!valuesSet) {
                        tableNameLength = tableNameBytes.length + 1;
                        cfOffset = current.getRowOffset() + tableNameLength;
                        cfLength = getVarCharLength(current.getRowArray(), cfOffset,
                                current.getRowLength() - tableNameLength);
                        ptr.set(current.getRowArray(), cfOffset, cfLength);
                        valuesSet = true;
                    }
                    cfName = ByteUtil.copyKeyBytesIfNecessary(ptr);
                    if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
                            current.getQualifierLength(),
                            PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
                            PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
                        rowCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
                                current.getValueOffset(), SortOrder.getDefault());
                    } else if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
                            current.getQualifierLength(),
                            PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
                            PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.length)) {
                        byteCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
                                current.getValueOffset(), SortOrder.getDefault());
                    }
                    if (current.getTimestamp() > timeStamp) {
                        timeStamp = current.getTimestamp();
                    }
                }
                if (cfName != null) {
                    byte[] newGPStartKey = getGuidePostsInfoFromRowKey(
                            tableNameBytes, cfName, result.getRow());
                    GuidePostsInfoBuilder guidePostsInfoWriter =
                            guidePostsInfoWriterPerCf.get(cfName);
                    if (guidePostsInfoWriter == null) {
                        guidePostsInfoWriter = new GuidePostsInfoBuilder();
                        guidePostsInfoWriterPerCf.put(cfName, guidePostsInfoWriter);
                    }
                    guidePostsInfoWriter.addGuidePosts(newGPStartKey, byteCount, rowCount);
                }
            }
            if (!guidePostsInfoWriterPerCf.isEmpty()) {
                return new PTableStatsImpl(
                        getGuidePostsPerCf(guidePostsInfoWriterPerCf), timeStamp);
            }
        } finally {
            if (scanner != null) {
                scanner.close();
            }
        }
        return PTableStats.EMPTY_STATS;
    }

Re: Question about table stats

Posted by James Taylor <ja...@apache.org>.
Thanks, Ankit.

Maryann - another option would be to use the stats stored on the PTable
to avoid a query. You could enhance the current mechanism in
BaseResultIterators that figures out the parallel scans by putting a bit
more structure around it. This information is created when you call
QueryPlan.iterator() - it could be refactored into a different method as
well if you think that's cleaner.

    James

On Mon, Feb 15, 2016 at 12:35 AM, Ankit Singhal <an...@gmail.com>
wrote:

> I think then we need to store rowCount and byteCount at guidePost level, so
> I have created a JIRA (PHOENIX-2683) and uploaded a patch for it.

Re: Question about table stats

Posted by Ankit Singhal <an...@gmail.com>.
I think then we need to store rowCount and byteCount at guidePost level, so
I have created a JIRA (PHOENIX-2683) and uploaded a patch for it.
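
To illustrate why per-guidepost counts matter for cost estimation, here is a
minimal sketch in plain Java. It is not Phoenix code: the class
GuidePostRangeEstimate, its Counts type, and the string keys are all
hypothetical, and it assumes each guidepost carries the rows and bytes
traversed since the previous guidepost. Under that assumption, an estimate
for a scan range is just the sum over the guideposts whose key falls in
[startKey, stopKey). If only the first guidepost of a region held the
counts, the same sum would come out as zero for most ranges.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical illustration only; none of these names exist in Phoenix. */
public class GuidePostRangeEstimate {

    /** Rows and bytes traversed since the previous guidepost (assumed semantics). */
    public static final class Counts {
        public final long rows;
        public final long bytes;

        public Counts(long rows, long bytes) {
            this.rows = rows;
            this.bytes = bytes;
        }
    }

    /** Unsigned lexicographic byte order (Arrays.compareUnsigned needs Java 9+). */
    private static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;

    /** Guideposts of a single column family, sorted by guidepost key. */
    private final TreeMap<byte[], Counts> guidePosts = new TreeMap<>(KEY_ORDER);

    public void addGuidePost(byte[] key, long rows, long bytes) {
        guidePosts.put(key, new Counts(rows, bytes));
    }

    /**
     * Sums the counts of every guidepost with startKey <= key < stopKey.
     * TreeMap.subMap throws IllegalArgumentException if startKey sorts after stopKey.
     */
    public Counts estimate(byte[] startKey, byte[] stopKey) {
        long rows = 0;
        long bytes = 0;
        for (Counts c : guidePosts.subMap(startKey, true, stopKey, false).values()) {
            rows += c.rows;
            bytes += c.bytes;
        }
        return new Counts(rows, bytes);
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GuidePostRangeEstimate stats = new GuidePostRangeEstimate();
        stats.addGuidePost(key("b"), 10, 100);
        stats.addGuidePost(key("d"), 20, 200);
        stats.addGuidePost(key("f"), 30, 300);

        // Guideposts "b" and "d" fall in ["b", "f"); "f" is excluded by the stop key.
        Counts c = stats.estimate(key("b"), key("f"));
        System.out.println("rows=" + c.rows + " bytes=" + c.bytes); // rows=30 bytes=300
    }
}
```

The granularity of such an estimate is one guidepost, so a range that
starts or stops between two guideposts is only approximated.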

On Sat, Feb 13, 2016 at 11:50 PM, James Taylor <ja...@apache.org>
wrote:

> The GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT should contain the number
> of bytes and number of rows which were traversed since the last guidepost.
> So given some start key and stop key from a scan and knowledge that a given
> column family is used in a query, you should be able to run a query like
> this:
>
> SELECT SUM(GUIDE_POSTS_WIDTH) bytes_traversed,
>     SUM(GUIDE_POSTS_ROW_COUNT) rows_traversed
> FROM SYSTEM.STATS
> WHERE COLUMN_FAMILY = :1
> AND GUIDE_POST_KEY >= :2
> AND GUIDE_POST_KEY < :3
>
> where :1 is the column family, :2 is the start row of the scan, and :3 is
> the stop row of the scan. The result of the query should tell you the
> bytes_traversed and the rows_traversed with a granularity of the
> phoenix.stats.guidepost.width config parameter.
>
> We could even run this across all column families being traversed, based on
> which ones are referenced and projected into the scan. Or we could use
> the "empty column family" (using SchemaUtil.getEmptyColumnFamily() as
> Maryann mentioned) which is the one that is typically projected. FWIW, the
> logic of which guideposts are used by a query is here:
> BaseResultIterators.getGuidePosts().
>
> Make sense? Is that the way it's working? If not, let's file a JIRA please.
>
> Thanks,
> James
>
> On Sat, Feb 13, 2016 at 10:15 AM, Maryann Xue <ma...@gmail.com>
> wrote:
>
> > Thank you, Ankit! I see what you mean. But I think what I queried was the
> > default CF. SchemaUtil.getEmptyColumnFamily(), is that correct? I'll try to
> > see if I can reproduce this.
> >
> > On Sat, Feb 13, 2016 at 8:07 AM, Ankit Singhal <ankitsinghal59@gmail.com>
> > wrote:
> >
> > > Yes James, the query is using guidePosts as per the cf used in the filter.
> > > But I think Maryann is expecting that rowcount and bytescount should be
> > > available at each guidePost key level, which we currently don't store.
> > > Currently, we can use metrics (like rowcount/bytecount) at cf level only,
> > > right?
> > >
> > > On Sat, Feb 13, 2016 at 11:34 AM, James Taylor <jamestaylor@apache.org>
> > > wrote:
> > >
> > > > We should have separate guideposts per cf, as the data distribution may be
> > > > different. We use the default cf if it's being filtered on, but otherwise
> > > > use a different cf.
> > > >
> > > > Is that how it works currently, Ankit?
> > > >
> > > > On Friday, February 12, 2016, Ankit Singhal <ankitsinghal59@gmail.com>
> > > > wrote:
> > > >
> > > > > But I think we need these metrics at cf level only, right, as per this
> > > > > comment:
> > > > >
> > > > > https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779
> > > > >
> > > > > That's why we serialize the aggregated value of the region at cf level
> > > > > in the first guide post only.
> > > > >
> > > > > Regards,
> > > > > Ankit Singhal
> > > > >
> > > > > On Sat, Feb 13, 2016 at 9:07 AM, Maryann Xue <maryann.xue@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks a lot for the answer, James! The data size has well exceeded the
> > > > > > guidepost width and the guideposts do exist but without a corresponding
> > > > > > "rowCount" or "byteCount" cell. I'll try doing a Phoenix query instead
> > > > > > and confirm that it is a bug.
> > > > > >
> > > > > > Thanks,
> > > > > > Maryann
> > > > > >
> > > > > > On Fri, Feb 12, 2016 at 10:21 PM, James Taylor <jamestaylor@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Maryann,
> > > > > > > If the amount of data in a region is less than the guidepost width,
> > > > > > > then it's possible you'd get no guideposts for that region. Do you
> > > > > > > think that's the case? If not, it sounds like there may be a bug.
> > > > > > >
> > > > > > > Assuming you're querying to get the stats information, I'd recommend
> > > > > > > doing a Phoenix query directly. The code you're emulating uses straight
> > > > > > > HBase APIs because it's called from the server-side. It'd be a
> > > > > > > one-liner as a Phoenix query.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
> > > > > > > On Fri, Feb 12, 2016 at 11:23 AM, Maryann Xue <
> > > maryann.xue@gmail.com
> > > > > <javascript:;>>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This was something I noticed when applying Phoenix table
> stats
> > > into
> > > > > > > > Calcite-Phoenix cost calculation: When executing the
> following
> > > code
> > > > > (a
> > > > > > > > slightly modified version of the existing StatisticsUtil
> > method)
> > > to
> > > > > > scan
> > > > > > > > stats table for a specific column-family and a specific
> > > start/stop
> > > > > key
> > > > > > > > range, I got guidepost rows that did not contain the rowCount
> > or
> > > > > > > byteCount
> > > > > > > > cell, for all rows in the specified range. Apparently, I had
> > set
> > > > the
> > > > > > > > corresponding columns in the Scan (as shown below).
> Meanwhile,
> > > > > another
> > > > > > > > range of stats in the same table gave me the right result. I
> am
> > > > > > wondering
> > > > > > > > if this is an expected behavior or it is a bug?
> > > > > > > >
> > > > > > > >     public static PTableStats readStatistics(HTableInterface
> > > > > > statsHTable,
> > > > > > > >
> > > > > > > >             byte[] tableNameBytes, ImmutableBytesPtr cf,
> byte[]
> > > > > > startKey,
> > > > > > > > byte[] stopKey,
> > > > > > > >
> > > > > > > >             long clientTimeStamp)
> > > > > > > >
> > > > > > > >             throws IOException {
> > > > > > > >
> > > > > > > >         ImmutableBytesWritable ptr = new
> > > ImmutableBytesWritable();
> > > > > > > >
> > > > > > > >         Scan s;
> > > > > > > >
> > > > > > > >         if (cf == null) {
> > > > > > > >
> > > > > > > >             s = MetaDataUtil.newTableRowsScan(tableNameBytes,
> > > > > > > > MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
> > > > > > > >
> > > > > > > >         } else {
> > > > > > > >
> > > > > > > >             s =
> > > > > MetaDataUtil.newTableRowsScan(getAdjustedKey(startKey,
> > > > > > > > tableNameBytes, cf, false),
> > > > > > > >
> > > > > > > >                     getAdjustedKey(stopKey, tableNameBytes,
> cf,
> > > > > true),
> > > > > > > > MetaDataProtocol.MIN_TABLE_TIMESTAMP,
> > > > > > > >
> > > > > > > >                     clientTimeStamp);
> > > > > > > >
> > > > > > > >         }
> > > > > > > >
> > > > > > > >
>  s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
> > > > > > > >
> > > > > > > >
>  s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
> > > > > > > >
> > > > > > > >
>  s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > > QueryConstants.EMPTY_COLUMN_BYTES);
> > > > > > > >
> > > > > > > >         ResultScanner scanner = null;
> > > > > > > >
> > > > > > > >         long timeStamp =
> MetaDataProtocol.MIN_TABLE_TIMESTAMP;
> > > > > > > >
> > > > > > > >         TreeMap<byte[], GuidePostsInfoBuilder>
> > > > > > guidePostsInfoWriterPerCf
> > > > > > > =
> > > > > > > > new TreeMap<byte[],
> > > GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
> > > > > > > >
> > > > > > > >         try {
> > > > > > > >
> > > > > > > >             scanner = statsHTable.getScanner(s);
> > > > > > > >
> > > > > > > >             Result result = null;
> > > > > > > >
> > > > > > > >             while ((result = scanner.next()) != null) {
> > > > > > > >
> > > > > > > >                 CellScanner cellScanner =
> result.cellScanner();
> > > > > > > >
> > > > > > > >                 long rowCount = 0;
> > > > > > > >
> > > > > > > >                 long byteCount = 0;
> > > > > > > >
> > > > > > > >                 byte[] cfName = null;
> > > > > > > >
> > > > > > > >                 int tableNameLength;
> > > > > > > >
> > > > > > > >                 int cfOffset;
> > > > > > > >
> > > > > > > >                 int cfLength;
> > > > > > > >
> > > > > > > >                 boolean valuesSet = false;
> > > > > > > >
> > > > > > > >                 // Only the two cells with quals
> > > > > > > > GUIDE_POSTS_ROW_COUNT_BYTES and GUIDE_POSTS_BYTES would be
> > > > retrieved
> > > > > > > >
> > > > > > > >                 while (cellScanner.advance()) {
> > > > > > > >
> > > > > > > >                     Cell current = cellScanner.current();
> > > > > > > >
> > > > > > > >                     if (!valuesSet) {
> > > > > > > >
> > > > > > > >                         tableNameLength =
> > tableNameBytes.length +
> > > > 1;
> > > > > > > >
> > > > > > > >                         cfOffset = current.getRowOffset() +
> > > > > > > > tableNameLength;
> > > > > > > >
> > > > > > > >                         cfLength =
> > > > > > > getVarCharLength(current.getRowArray(),
> > > > > > > > cfOffset,
> > > > > > > >
> > > > > > > >                                 current.getRowLength() -
> > > > > > > tableNameLength);
> > > > > > > >
> > > > > > > >                         ptr.set(current.getRowArray(),
> > cfOffset,
> > > > > > > cfLength);
> > > > > > > >
> > > > > > > >                         valuesSet = true;
> > > > > > > >
> > > > > > > >                     }
> > > > > > > >
> > > > > > > >                     cfName =
> > > ByteUtil.copyKeyBytesIfNecessary(ptr);
> > > > > > > >
> > > > > > > >                     if
> > (Bytes.equals(current.getQualifierArray(),
> > > > > > current
> > > > > > > > .getQualifierOffset(),
> > > > > > > >
> > > > > > > >                             current.getQualifierLength(),
> > > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
> > > > > > > >
> > > > > > > >                             PhoenixDatabaseMetaData.
> > > > > > > > GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
> > > > > > > >
> > > > > > > >                         rowCount =
> > > > > > PLong.INSTANCE.getCodec().decodeLong(
> > > > > > > > current.getValueArray(),
> > > > > > > >
> > > > > > > >                                 current.getValueOffset(),
> > > > > > > > SortOrder.getDefault());
> > > > > > > >
> > > > > > > >                     } else if
> > > > > > (Bytes.equals(current.getQualifierArray(),
> > > > > > > > current.getQualifierOffset(),
> > > > > > > >
> > > > > > > >                             current.getQualifierLength(),
> > > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
> > > > > > > >
> > > > > > > >
> > > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.
> > > > > > > > length)) {
> > > > > > > >
> > > > > > > >                         byteCount =
> > > > > > PLong.INSTANCE.getCodec().decodeLong(
> > > > > > > > current.getValueArray(),
> > > > > > > >
> > > > > > > >                                 current.getValueOffset(),
> > > > > > > > SortOrder.getDefault());
> > > > > > > >
> > > > > > > >                     }
> > > > > > > >
> > > > > > > >                     if (current.getTimestamp() > timeStamp) {
> > > > > > > >
> > > > > > > >                         timeStamp = current.getTimestamp();
> > > > > > > >
> > > > > > > >                     }
> > > > > > > >
> > > > > > > >                 }
> > > > > > > >
> > > > > > > >                 if (cfName != null) {
> > > > > > > >
> > > > > > > >                     byte[] newGPStartKey =
> > > > > getGuidePostsInfoFromRowKey(
> > > > > > > > tableNameBytes, cfName, result.getRow());
> > > > > > > >
> > > > > > > >                     GuidePostsInfoBuilder
> guidePostsInfoWriter
> > =
> > > > > > > > guidePostsInfoWriterPerCf.get(cfName);
> > > > > > > >
> > > > > > > >                     if (guidePostsInfoWriter == null) {
> > > > > > > >
> > > > > > > >                         guidePostsInfoWriter = new
> > > > > > > GuidePostsInfoBuilder();
> > > > > > > >
> > > > > > > >                         guidePostsInfoWriterPerCf.put(cfName,
> > > > > > > > guidePostsInfoWriter);
> > > > > > > >
> > > > > > > >                     }
> > > > > > > >
> > > > > > > >
> > > >  guidePostsInfoWriter.addGuidePosts(newGPStartKey,
> > > > > > > > byteCount, rowCount);
> > > > > > > >
> > > > > > > >                 }
> > > > > > > >
> > > > > > > >             }
> > > > > > > >
> > > > > > > >             if (!guidePostsInfoWriterPerCf.isEmpty()) {
> return
> > > new
> > > > > > > > PTableStatsImpl(
> > > > > > > >
> > > > > > > >
> > >  getGuidePostsPerCf(guidePostsInfoWriterPerCf),
> > > > > > > > timeStamp);
> > > > > > > > }
> > > > > > > >
> > > > > > > >         } finally {
> > > > > > > >
> > > > > > > >             if (scanner != null) {
> > > > > > > >
> > > > > > > >                 scanner.close();
> > > > > > > >
> > > > > > > >             }
> > > > > > > >
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         return PTableStats.EMPTY_STATS;
> > > > > > > >     }
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Question about table stats

Posted by James Taylor <ja...@apache.org>.
The GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT should contain the number
of bytes and number of rows which were traversed since the last guidepost.
So given some start key and stop key from a scan and knowledge that a given
column family is used in a query, you should be able to run a query like
this:

SELECT SUM(GUIDE_POSTS_WIDTH) bytes_traversed,
    SUM(GUIDE_POSTS_ROW_COUNT) rows_traversed
FROM SYSTEM.STATS
WHERE COLUMN_FAMILY = :1
AND GUIDE_POST_KEY >= :2
AND GUIDE_POST_KEY < :3

where :1 is the column family, :2 is the start row of the scan, and :3 is
the stop row of the scan. The result of the query should tell you the
bytes_traversed and the rows_traversed with a granularity of the
phoenix.stats.guidepost.width config parameter.
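
To make the semantics of that query concrete, here is a minimal sketch in plain Java of what the range sum computes. It is an in-memory stand-in only: the GuidePostRangeSum and GuidePost names, the sample keys, and the sample counts are all hypothetical and are not Phoenix classes or real stats. It assumes each guidepost row carries the bytes and rows traversed since the previous guidepost, and that keys compare as unsigned bytes in lexicographic order, as HBase row keys do. A real query against SYSTEM.STATS would presumably also need to be restricted to the table of interest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the guidepost rows for one column family.
// None of these types are Phoenix classes; they only illustrate the range sum.
final class GuidePostRangeSum {

    /** One guidepost row: its key plus the bytes and rows traversed since the previous guidepost. */
    static final class GuidePost {
        final byte[] key;
        final long width;     // stands in for GUIDE_POSTS_WIDTH
        final long rowCount;  // stands in for GUIDE_POSTS_ROW_COUNT

        GuidePost(byte[] key, long width, long rowCount) {
            this.key = key;
            this.width = width;
            this.rowCount = rowCount;
        }
    }

    /** Unsigned lexicographic comparison of two keys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Returns {bytesTraversed, rowsTraversed} over guideposts with startKey <= key < stopKey. */
    static long[] sumRange(List<GuidePost> posts, byte[] startKey, byte[] stopKey) {
        long bytes = 0;
        long rows = 0;
        for (GuidePost gp : posts) {
            if (compareKeys(gp.key, startKey) >= 0 && compareKeys(gp.key, stopKey) < 0) {
                bytes += gp.width;
                rows += gp.rowCount;
            }
        }
        return new long[] {bytes, rows};
    }

    public static void main(String[] args) {
        // Made-up sample guideposts, purely for illustration.
        List<GuidePost> posts = new ArrayList<>();
        posts.add(new GuidePost(new byte[] {'b'}, 100, 10));
        posts.add(new GuidePost(new byte[] {'d'}, 120, 12));
        posts.add(new GuidePost(new byte[] {'f'}, 90, 9));

        long[] totals = sumRange(posts, new byte[] {'c'}, new byte[] {'g'});
        System.out.println("bytes_traversed=" + totals[0] + " rows_traversed=" + totals[1]);
    }
}
```

Note that a guidepost row with no width or row-count cell would contribute zero to both sums in a model like this, which is why missing cells matter for any cost estimate built on top of it.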

We could even run this across all column families being traversed, based on
which ones are referenced and projected into the scan. Or we could use
the "empty column family" (using SchemaUtil.getEmptyColumnFamily() as
Maryann mentioned) which is the one that is typically projected. FWIW, the
logic of which guideposts are used by a query is here:
BaseResultIterators.getGuidePosts().

Make sense? Is that the way it's working? If not, let's file a JIRA please.

Thanks,
James

On Sat, Feb 13, 2016 at 10:15 AM, Maryann Xue <ma...@gmail.com> wrote:

> Thank you, Ankit! I see what you mean. But I think what I queried was the
> default CF. SchemaUtil.getEmptyColumnFamily(), is that correct? I'll try to
> see if I can reproduce this.

Re: Question about table stats

Posted by Maryann Xue <ma...@gmail.com>.
Thank you, Ankit! I see what you mean. But I think what I queried was the
default CF, i.e. SchemaUtil.getEmptyColumnFamily(). Is that correct? I'll try
to see if I can reproduce this.

On Sat, Feb 13, 2016 at 8:07 AM, Ankit Singhal <an...@gmail.com>
wrote:

> Yes James, Query is using guidePosts as per the cf used in filter.
> But I think Maryann is expecting that rowcount and bytescount should be
> available at each guidePost key level, which we currently don't store.
> currently, we can use metrics(like rowcount/bytecount) at cf level only
> right?

Re: Question about table stats

Posted by Ankit Singhal <an...@gmail.com>.
Yes James, the query uses the guideposts of the CF used in the filter.
But I think Maryann expects the row count and byte count to be available at
each guidepost key level, which we currently don't store. Currently, we can
use metrics (like row count/byte count) at the CF level only, right?

On Sat, Feb 13, 2016 at 11:34 AM, James Taylor <ja...@apache.org>
wrote:

> We should have separate guideposts per cf, as the data distribution may be
> different. We use the default cf if it's being filtered on, but otherwise
> use a different cf.
>
> Is that how it works currently, Ankit?
> > > > > PTableStatsImpl(
> > > > >
> > > > >                     getGuidePostsPerCf(guidePostsInfoWriterPerCf),
> > > > > timeStamp);
> > > > > }
> > > > >
> > > > >         } finally {
> > > > >
> > > > >             if (scanner != null) {
> > > > >
> > > > >                 scanner.close();
> > > > >
> > > > >             }
> > > > >
> > > > >         }
> > > > >
> > > > >         return PTableStats.EMPTY_STATS;
> > > > >     }
> > > > >
> > > >
> > >
> >
>

Re: Question about table stats

Posted by James Taylor <ja...@apache.org>.
We should have separate guideposts per cf, since the data distribution may
differ from one cf to another. We use the default cf's guideposts if the
query filters on it, and otherwise use those of a different cf.

Is that how it works currently, Ankit?
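
As a rough illustration of keeping guideposts separate per column family,
the sketch below groups guidepost keys in a map keyed by the cf name bytes.
The class and its methods (PerCfGuidePosts, add, countFor, bytes) are
hypothetical and not part of Phoenix. Arrays::compareUnsigned (Java 9+) is
used here as an assumed stand-in for the Bytes.BYTES_COMPARATOR that the
code in the original post passes to its TreeMap. An explicit comparator is
needed because byte[] has no natural ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not a Phoenix class: groups guidepost keys per column family.
class PerCfGuidePosts {
    // byte[] keys are ordered lexicographically as unsigned bytes, so two
    // distinct arrays with the same contents map to the same entry.
    private final TreeMap<byte[], List<byte[]>> guidePostsPerCf =
            new TreeMap<byte[], List<byte[]>>(Arrays::compareUnsigned);

    // Records one guidepost key under the given column family name.
    void add(byte[] cfName, byte[] guidePostKey) {
        guidePostsPerCf.computeIfAbsent(cfName, k -> new ArrayList<>()).add(guidePostKey);
    }

    // Number of guideposts recorded for a column family (0 if none).
    int countFor(byte[] cfName) {
        List<byte[]> posts = guidePostsPerCf.get(cfName);
        return posts == null ? 0 : posts.size();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

A planner could then pick the list for whichever cf it decides to use,
without the guideposts of one cf affecting another.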

On Friday, February 12, 2016, Ankit Singhal <an...@gmail.com>
wrote:

> but I think we need these metrics at cf only right as per this comment-
>
> https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779
>
>
> that's why we serialize aggregated value of region at cf level in first
> guide post only.
>
> Regards,
> Ankit Singhal

Re: Question about table stats

Posted by Ankit Singhal <an...@gmail.com>.
But I think we need these metrics at the cf level only, right? As per this comment:
https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779

That's why we serialize the aggregated values for a region at the cf level
in the first guidepost only.
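
A minimal sketch of what this could mean for a reader of the stats rows,
assuming (per the remark above) that only the first guidepost row of a
region carries the row-count and byte-count cells. RegionTotals and
GuidePostRow are hypothetical names, not Phoenix classes. The counts are
modelled as nullable so that an absent cell stays distinguishable from a
stored zero, which the reader code in the original post cannot do because
it initializes rowCount and byteCount to 0.

```java
import java.util.List;

// Hypothetical model, not Phoenix code.
class RegionTotals {
    // One stats row: a guidepost key plus optional aggregate cells.
    static final class GuidePostRow {
        final String guidePostKey;
        final Long rowCount;   // null when the row-count cell is absent
        final Long byteCount;  // null when the byte-count cell is absent

        GuidePostRow(String guidePostKey, Long rowCount, Long byteCount) {
            this.guidePostKey = guidePostKey;
            this.rowCount = rowCount;
            this.byteCount = byteCount;
        }
    }

    // Returns {total rows, total bytes, number of guidepost rows that carried counts}.
    static long[] total(List<GuidePostRow> rows) {
        long rowsTotal = 0;
        long bytesTotal = 0;
        long withCounts = 0;
        for (GuidePostRow r : rows) {
            if (r.rowCount == null || r.byteCount == null) {
                continue; // cells absent: treated as "not recorded", not as zero
            }
            rowsTotal += r.rowCount;
            bytesTotal += r.byteCount;
            withCounts++;
        }
        return new long[] {rowsTotal, bytesTotal, withCounts};
    }
}
```

Under that assumption, a scan range that happens to skip the first
guidepost of a region would see guidepost rows but no counts at all.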

Regards,
Ankit Singhal

On Sat, Feb 13, 2016 at 9:07 AM, Maryann Xue <ma...@gmail.com> wrote:

> Thanks a lot for the answer, James! The data size has well exceeded the
> guidepost width and the guideposts do exist but without corresponding
> "rowCount" or "byteCount" cell. I'll try doing a Phoenix query instead and
> confirm that it is a bug.
>
>
> Thanks,
> Maryann

Re: Question about table stats

Posted by Maryann Xue <ma...@gmail.com>.
Thanks a lot for the answer, James! The data size has well exceeded the
guidepost width and the guideposts do exist but without corresponding
"rowCount" or "byteCount" cell. I'll try doing a Phoenix query instead and
confirm that it is a bug.


Thanks,
Maryann

On Fri, Feb 12, 2016 at 10:21 PM, James Taylor <ja...@apache.org>
wrote:

> Hi Maryann,
> If the amount of data in a region is less than the guidepost width, then
> it's possible you'd get no guideposts for that region. Do you think that's
> the case? If not, it sounds like there may be a bug.
>
> Assuming you're querying to get the stats information, I'd recommend doing
> a Phoenix query directly. The code you're emulating uses straight HBase
> APIs because it's called from the server-side. It'd be a one liner as a
> Phoenix query.
>
> Thanks,
> James

Re: Question about table stats

Posted by James Taylor <ja...@apache.org>.
Hi Maryann,
If the amount of data in a region is less than the guidepost width, then
it's possible you'd get no guideposts for that region. Do you think that's
the case? If not, it sounds like there may be a bug.

Assuming you're querying to get the stats information, I'd recommend doing
a Phoenix query directly. The code you're emulating uses straight HBase
APIs because it's called from the server side. It'd be a one-liner as a
Phoenix query.
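
A sketch of what such a query might look like over plain JDBC. Only the
SYSTEM.STATS table name is taken as given. The column names are
assumptions: GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT are inferred from
the PhoenixDatabaseMetaData constants used in the original post, and
PHYSICAL_NAME and COLUMN_FAMILY should be checked against the stats table
schema of the Phoenix version in use. StatsQuerySketch and printStats are
hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of Phoenix.
class StatsQuerySketch {
    // Assumed column names; verify against the SYSTEM.STATS schema before use.
    static final String STATS_SQL =
            "SELECT COLUMN_FAMILY, GUIDE_POSTS_WIDTH, GUIDE_POSTS_ROW_COUNT"
            + " FROM SYSTEM.STATS"
            + " WHERE PHYSICAL_NAME = ? AND COLUMN_FAMILY = ?";

    // Prints one line per stats row and returns how many rows were seen.
    static int printStats(Connection conn, String physicalName, String cf) throws SQLException {
        int rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(STATS_SQL)) {
            ps.setString(1, physicalName);
            ps.setString(2, cf);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    long width = rs.getLong("GUIDE_POSTS_WIDTH");
                    boolean widthMissing = rs.wasNull();      // SQL NULL means no value stored
                    long rowCount = rs.getLong("GUIDE_POSTS_ROW_COUNT");
                    boolean rowCountMissing = rs.wasNull();
                    System.out.println(rs.getString("COLUMN_FAMILY")
                            + " width=" + (widthMissing ? "n/a" : String.valueOf(width))
                            + " rows=" + (rowCountMissing ? "n/a" : String.valueOf(rowCount)));
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

The wasNull() checks keep a missing value visible as "n/a" instead of
silently reporting it as 0.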

Thanks,
James

On Fri, Feb 12, 2016 at 11:23 AM, Maryann Xue <ma...@gmail.com> wrote:

> Hi,
>
> This was something I noticed when applying Phoenix table stats into
> Calcite-Phoenix cost calculation: When executing the following code (a
> slightly modified version of the existing StatisticsUtil method) to scan
> stats table for a specific column-family and a specific start/stop key
> range, I got guidepost rows that did not contain the rowCount or byteCount
> cell, for all rows in the specified range. Apparently, I had set the
> corresponding columns in the Scan (as shown below). Meanwhile, another
> range of stats in the same table gave me the right result. I am wondering
> if this is an expected behavior or it is a bug?
>
>     public static PTableStats readStatistics(HTableInterface statsHTable,
>             byte[] tableNameBytes, ImmutableBytesPtr cf, byte[] startKey, byte[] stopKey,
>             long clientTimeStamp)
>             throws IOException {
>         ImmutableBytesWritable ptr = new ImmutableBytesWritable();
>         Scan s;
>         if (cf == null) {
>             s = MetaDataUtil.newTableRowsScan(tableNameBytes, MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
>         } else {
>             s = MetaDataUtil.newTableRowsScan(getAdjustedKey(startKey, tableNameBytes, cf, false),
>                     getAdjustedKey(stopKey, tableNameBytes, cf, true), MetaDataProtocol.MIN_TABLE_TIMESTAMP,
>                     clientTimeStamp);
>         }
>         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES, PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
>         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES, PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
>         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES);
>         ResultScanner scanner = null;
>         long timeStamp = MetaDataProtocol.MIN_TABLE_TIMESTAMP;
>         TreeMap<byte[], GuidePostsInfoBuilder> guidePostsInfoWriterPerCf =
>                 new TreeMap<byte[], GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
>         try {
>             scanner = statsHTable.getScanner(s);
>             Result result = null;
>             while ((result = scanner.next()) != null) {
>                 CellScanner cellScanner = result.cellScanner();
>                 long rowCount = 0;
>                 long byteCount = 0;
>                 byte[] cfName = null;
>                 int tableNameLength;
>                 int cfOffset;
>                 int cfLength;
>                 boolean valuesSet = false;
>                 // Only the two cells with quals GUIDE_POSTS_ROW_COUNT_BYTES and GUIDE_POSTS_BYTES would be retrieved
>                 while (cellScanner.advance()) {
>                     Cell current = cellScanner.current();
>                     if (!valuesSet) {
>                         tableNameLength = tableNameBytes.length + 1;
>                         cfOffset = current.getRowOffset() + tableNameLength;
>                         cfLength = getVarCharLength(current.getRowArray(), cfOffset,
>                                 current.getRowLength() - tableNameLength);
>                         ptr.set(current.getRowArray(), cfOffset, cfLength);
>                         valuesSet = true;
>                     }
>                     cfName = ByteUtil.copyKeyBytesIfNecessary(ptr);
>                     if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
>                             current.getQualifierLength(), PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
>                             PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
>                         rowCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
>                                 current.getValueOffset(), SortOrder.getDefault());
>                     } else if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
>                             current.getQualifierLength(), PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
>                             PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.length)) {
>                         byteCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
>                                 current.getValueOffset(), SortOrder.getDefault());
>                     }
>                     if (current.getTimestamp() > timeStamp) {
>                         timeStamp = current.getTimestamp();
>                     }
>                 }
>                 if (cfName != null) {
>                     byte[] newGPStartKey = getGuidePostsInfoFromRowKey(tableNameBytes, cfName, result.getRow());
>                     GuidePostsInfoBuilder guidePostsInfoWriter = guidePostsInfoWriterPerCf.get(cfName);
>                     if (guidePostsInfoWriter == null) {
>                         guidePostsInfoWriter = new GuidePostsInfoBuilder();
>                         guidePostsInfoWriterPerCf.put(cfName, guidePostsInfoWriter);
>                     }
>                     guidePostsInfoWriter.addGuidePosts(newGPStartKey, byteCount, rowCount);
>                 }
>             }
>             if (!guidePostsInfoWriterPerCf.isEmpty()) {
>                 return new PTableStatsImpl(getGuidePostsPerCf(guidePostsInfoWriterPerCf), timeStamp);
>             }
>         } finally {
>             if (scanner != null) {
>                 scanner.close();
>             }
>         }
>         return PTableStats.EMPTY_STATS;
>     }
>