Posted to dev@phoenix.apache.org by William <yh...@163.com> on 2016/01/07 08:27:27 UTC
Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?
Hi all:
I am optimizing select performance these days. My scenario is very simple:
* The table schema will never change. If it has to change (only adding columns is allowed), all connections will be re-opened.
* Only simple selects and upserts: no join, no order by, no group by, etc.
* Very high performance is required: almost the same as the HBase native API. (I've almost achieved this; the only remaining problem is the table metadata update.)
For this case, I modified QueryCompiler and FromCompiler so that table metadata is updated only once. (Originally, metadata was updated on every select/upsert, and each update takes about 0.4 ms.)
Then I encountered another problem: metadata is still updated in BaseQueryPlan#iterator(), which calls context.getCurrentTime().
I single-stepped it and found that:
* In BaseColumnResolver#createTableRef(), since updateCache() is no longer called, timeStamp keeps its initial value of -1.
* So in StatementContext#getCurrentTime(), this.getCurrentTable().getTimeStamp() returns -1, and it then calls client.getCurrentTime(), which updates the table metadata.
I realized that in the original code, where metadata is updated every time, the TableRef object always has a valid timestamp, so StatementContext#getCurrentTime() never has to update metadata itself.
For select, TableRef's timestamp comes from 'long currentTime = EnvironmentEdgeManager.currentTimeMillis();' in MetaDataEndpointImpl.java, which just returns the HBase region server's current time.
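The fallback described above can be sketched roughly as follows. This is only an illustration of the control flow, not Phoenix's actual code: the class CurrentTimeSketch, its methods, and the call counter are hypothetical names made up for this example, and the server clock is simulated with a LongSupplier.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the timestamp lookup; not Phoenix's real classes.
final class CurrentTimeSketch {
    // Sentinel meaning "no timestamp was recorded for the table".
    static final long UNSET_TIMESTAMP = -1L;

    // Counts simulated server round trips so the extra cost is visible.
    static int serverCalls = 0;

    // Simulates the expensive path: asking the server for its current time.
    static long fetchServerTime(LongSupplier serverClock) {
        serverCalls++;
        return serverClock.getAsLong();
    }

    // Uses the table's timestamp when it is set; otherwise asks the server.
    static long resolveCurrentTime(long tableTimestamp, LongSupplier serverClock) {
        if (tableTimestamp != UNSET_TIMESTAMP) {
            return tableTimestamp;
        }
        return fetchServerTime(serverClock);
    }

    public static void main(String[] args) {
        LongSupplier clock = () -> 1_452_154_047_000L; // simulated server time
        // Timestamp already known: no server call is made.
        System.out.println(resolveCurrentTime(1_452_154_000_000L, clock));
        // Timestamp is -1: one server call is made.
        System.out.println(resolveCurrentTime(UNSET_TIMESTAMP, clock));
        System.out.println("server calls: " + serverCalls);
    }
}
```

In this sketch, skipping the metadata update at compile time corresponds to always passing UNSET_TIMESTAMP, so every query pays for the server call later instead.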
I guess that Phoenix does this to avoid reading rows inserted AFTER the select is compiled, because NEW rows might have a different schema. Is that right?
So, here's my question:
If I never change the table schema, is it safe for me to ignore this 'currentTime' and just use [0, Long.MAX_VALUE] for the scan's time range?
modified code: BaseQueryPlan.java
-William
Re: Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?
Posted by James Taylor <ja...@apache.org>.
Hi William,
Yes, that should be fine. I've made this possible for tables in the system
schema in master and 4.x already for the 4.7.0 release. See commit on
PHOENIX-2519 and UpdateCacheIT.testUpdateCacheForNonTxnSystemTable test. To
complete it, there's not much work left. See PHOENIX-2520. Just need a new
table property that declares how often you want to ping the server for
metadata updates. Would you be up for contributing a patch for that one?
Thanks,
James
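One way to picture the "how often to ping the server" idea from the reply above is a small client-side check like the one below. It is a sketch under assumptions, not the PHOENIX-2520 design: the class MetadataRefreshSketch, its field names, and the millisecond unit are all hypothetical.

```java
// Hypothetical sketch of a refresh-frequency check; names are illustrative only.
final class MetadataRefreshSketch {
    private final long refreshFrequencyMs; // assumed unit: milliseconds
    private long lastRefreshMs;
    private boolean everRefreshed = false;

    MetadataRefreshSketch(long refreshFrequencyMs) {
        this.refreshFrequencyMs = refreshFrequencyMs;
    }

    // Returns true, and records the refresh time, when metadata should be re-fetched.
    boolean shouldRefresh(long nowMs) {
        if (!everRefreshed || nowMs - lastRefreshMs >= refreshFrequencyMs) {
            everRefreshed = true;
            lastRefreshMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MetadataRefreshSketch sketch = new MetadataRefreshSketch(1000L);
        long[] queryTimes = {0L, 999L, 1000L, 1500L};
        for (long t : queryTimes) {
            System.out.println("t=" + t + " refresh=" + sketch.shouldRefresh(t));
        }
    }
}
```

With a very large frequency, only the first query would trigger a refresh, which is close to the "update table meta only once" behavior described in the original question.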
On Wed, Jan 6, 2016 at 11:34 PM, William <yh...@163.com> wrote:
> Sorry, the attachment failed.
> PS: I did not call scan.setTimeRange(0, max) directly; I just assigned
> Long.MAX_VALUE to scn.
> The modified code:
>
>     // Get the time range of row_timestamp column
>     TimeRange rowTimestampRange = context.getScanRanges().getRowTimestampRange();
>     // Get the already existing time range on the scan.
>     TimeRange scanTimeRange = scan.getTimeRange();
>     Long scn = connection.getSCN();
>     if (scn == null) {
>         //scn = context.getCurrentTime();
>         scn = Long.MAX_VALUE; // I add this line
>     }
>     try {
>         TimeRange timeRangeToUse = ScanUtil.intersectTimeRange(rowTimestampRange, scanTimeRange, scn);
>         if (timeRangeToUse == null) {
>             return ResultIterator.EMPTY_ITERATOR;
>         }
>         scan.setTimeRange(timeRangeToUse.getMin(), timeRangeToUse.getMax());
>     } catch (IOException e) {
>         throw new RuntimeException(e);
>     }
>
Re: Can I do scan.setTimeRange(0, Long.MAX_VALUE) instead of (0, current_server_time)?
Posted by William <yh...@163.com>.
Sorry, the attachment failed.
PS: I did not call scan.setTimeRange(0, max) directly; I just assigned Long.MAX_VALUE to scn.
The modified code:
    // Get the time range of row_timestamp column
    TimeRange rowTimestampRange = context.getScanRanges().getRowTimestampRange();
    // Get the already existing time range on the scan.
    TimeRange scanTimeRange = scan.getTimeRange();
    Long scn = connection.getSCN();
    if (scn == null) {
        //scn = context.getCurrentTime();
        scn = Long.MAX_VALUE; // I add this line
    }
    try {
        TimeRange timeRangeToUse = ScanUtil.intersectTimeRange(rowTimestampRange, scanTimeRange, scn);
        if (timeRangeToUse == null) {
            return ResultIterator.EMPTY_ITERATOR;
        }
        scan.setTimeRange(timeRangeToUse.getMin(), timeRangeToUse.getMax());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
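To see why assigning Long.MAX_VALUE to scn amounts to not capping the time range, here is a simplified stand-in for the intersection step. It is not the implementation of ScanUtil.intersectTimeRange: the class TimeRangeSketch and its intersect method are hypothetical, and ranges are modeled as half-open [min, max) pairs of longs.

```java
import java.util.Arrays;

// Hypothetical, simplified model of intersecting time ranges with an upper bound.
final class TimeRangeSketch {
    // Intersects [minA, maxA) with [minB, maxB), then caps the result at upperBound.
    // Returns {min, max}, or null when the resulting range is empty.
    static long[] intersect(long minA, long maxA, long minB, long maxB, long upperBound) {
        long min = Math.max(minA, minB);
        long max = Math.min(Math.min(maxA, maxB), upperBound);
        if (min >= max) {
            return null; // nothing can match, so the scan can be skipped
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        // Upper bound of Long.MAX_VALUE leaves the full range untouched.
        System.out.println(Arrays.toString(
                intersect(0L, Long.MAX_VALUE, 0L, Long.MAX_VALUE, Long.MAX_VALUE)));
        // A finite upper bound (for example a server timestamp) truncates the range.
        System.out.println(Arrays.toString(
                intersect(0L, Long.MAX_VALUE, 0L, Long.MAX_VALUE, 1000L)));
        // A range that starts after the upper bound becomes empty (null).
        System.out.println(Arrays.toString(
                intersect(2000L, 3000L, 0L, Long.MAX_VALUE, 1000L)));
    }
}
```

In this model, the null result plays the role of the empty-iterator early return in the snippet above.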