Posted to dev@phoenix.apache.org by William <yh...@163.com> on 2016/01/07 08:27:27 UTC

Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?

Hi all: 


I am optimizing select performance these days. My scenario is very simple:
  * the table schema will never change. If it ever has to change (only adding columns is allowed), all connections will be re-opened.
  * simple selects and upserts only: no joins, no order by, no group by, etc.
  * very high performance is required: almost the same as the native HBase API. (I've almost achieved this; the only remaining problem is the table metadata update.)


For this case, I modified QueryCompiler and FromCompiler so that the table metadata is now updated only once. (Originally the metadata was updated on every select/upsert, and each update takes about 0.4 ms.)
Then I ran into another problem: the metadata is still updated in BaseQueryPlan#iterator(), which calls context.getCurrentTime() to do so.


I single-stepped through it and found the following (see the sketch below):
  *  in BaseColumnResolver#createTableRef(), since updateCache() is not called, the timestamp keeps its initial value of -1;
  *  so in StatementContext#getCurrentTime(), this.getCurrentTable().getTimeStamp() returns -1, and it falls back to client.getCurrentTime(), which updates the table metadata.
I realized that when the metadata is updated on every statement, the TableRef object always has a valid timestamp, so StatementContext#getCurrentTime() never needs to update the metadata.
For a select, the TableRef's timestamp comes from 'long currentTime = EnvironmentEdgeManager.currentTimeMillis();' in MetaDataEndpointImpl.java, which simply returns the HBase region server's current time.
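
Roughly, the logic I'm describing looks like this. This is only a standalone sketch with stand-in types to show the flow, not the actual Phoenix source:

// Sketch of the flow described above, using stand-in types rather than the real
// Phoenix classes: the current time is only fetched from the server (which also
// refreshes the table metadata) when the cached TableRef timestamp is still the
// unresolved sentinel value -1.
public class CurrentTimeSketch {

    interface TableRefLike { long getTimeStamp(); }          // stand-in for TableRef
    interface MetaDataClientLike { long getCurrentTime(); }  // stand-in for the metadata client

    static final long UNSET_TIMESTAMP = -1L;

    static long getCurrentTime(TableRefLike table, MetaDataClientLike client) {
        long ts = table.getTimeStamp();
        if (ts != UNSET_TIMESTAMP) {
            return ts;  // timestamp already resolved by updateCache(): no server round trip
        }
        // updateCache() was skipped, so the timestamp is still -1 and we fall back
        // to the server call, which refreshes the table metadata as a side effect.
        return client.getCurrentTime();
    }

    public static void main(String[] args) {
        MetaDataClientLike server = System::currentTimeMillis;             // pretend server call
        System.out.println(getCurrentTime(() -> 1452152847000L, server));  // cached timestamp wins
        System.out.println(getCurrentTime(() -> UNSET_TIMESTAMP, server)); // falls back to the "server"
    }
}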


I guess that Phoenix does this to avoid reading rows inserted AFTER the select is compiled, because new rows might have a different schema. Is that right?


So, here's my question:
Given that I never change the table schema, is it safe for me to ignore this 'currentTime' and just use [0, Long.MAX_VALUE] as the scan's time range?
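
At the raw HBase client level, the change I have in mind amounts to nothing more than the following. This is illustrative only; the real change is in BaseQueryPlan.java:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;

// Illustrative only: what the proposed time range looks like on a plain HBase scan.
public class FullTimeRangeScan {
    static Scan fullRange() throws IOException {
        Scan scan = new Scan();
        // Accept cells with any write timestamp instead of capping the range at the
        // region server time observed when the query was compiled.
        scan.setTimeRange(0L, Long.MAX_VALUE);
        return scan;
    }
}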


Modified code attached: BaseQueryPlan.java


-William

Re: Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?

Posted by James Taylor <ja...@apache.org>.
Hi William,
Yes, that should be fine. I've already made this possible for tables in the
system schema in master and the 4.x branches for the 4.7.0 release; see the
commit on PHOENIX-2519 and the UpdateCacheIT.testUpdateCacheForNonTxnSystemTable
test. There's not much work left to complete it (see PHOENIX-2520): we just need
a new table property that declares how often you want to ping the server for
metadata updates. Would you be up for contributing a patch for that one?
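
Just to sketch the idea, usage might eventually look something like the following; the property name and value here are placeholders for this example only, not something that exists today:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example of the future table property (name and semantics are
// placeholders): refresh cached metadata at most every 15 minutes instead of
// on every statement.
public class MetadataPingFrequencyExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS T (K VARCHAR PRIMARY KEY, V VARCHAR) "
                    + "UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}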

Thanks,
James


Re: Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?

Posted by William <yh...@163.com>.
Sorry, the attachment failed.
PS: I did not call scan.setTimeRange(0, max); I just assigned Long.MAX_VALUE to scn.
The modified code (from BaseQueryPlan.java):



// Get the time range of row_timestamp column
TimeRange rowTimestampRange = context.getScanRanges().getRowTimestampRange();

// Get the already existing time range on the scan.
TimeRange scanTimeRange = scan.getTimeRange();

Long scn = connection.getSCN();
if (scn == null) {
    //scn = context.getCurrentTime();
    scn = Long.MAX_VALUE;                // I add this line
}

try {
    TimeRange timeRangeToUse = ScanUtil.intersectTimeRange(rowTimestampRange, scanTimeRange, scn);
    if (timeRangeToUse == null) {
        return ResultIterator.EMPTY_ITERATOR;
    }
    scan.setTimeRange(timeRangeToUse.getMin(), timeRangeToUse.getMax());
} catch (IOException e) {
    throw new RuntimeException(e);
}