Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/09/18 00:00:05 UTC

[jira] [Commented] (PHOENIX-914) Native HBase timestamp support to optimize date range queries in Phoenix

    [ https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804607#comment-14804607 ] 

James Taylor commented on PHOENIX-914:
--------------------------------------

Patch is looking good. It'd be good to add some tests around a DESC ROW_TIMESTAMP column. Also, make sure to allow UNSIGNED DATE/TIME types as well for ROW_TIMESTAMP.

I think the following code in UpsertCompiler isn't quite right: you'd want to pass in column.getSortOrder() instead of SortOrder.ASC, since that is the sort order the column's bytes are currently stored in.
{code}
             PColumn column = table.getColumns().get(columnIndexes[i]);
             if (SchemaUtil.isPKColumn(column)) {
                 pkValues[pkSlotIndex[i]] = value;
+                if (SchemaUtil.getPKPosition(table, column) == table.getRowTimestampColPos()) {
+                    // TODO: check what to do with the sort order here. Are the bytes already inverted if DESC? Do we need to de-invert them again?
+                    // Should we do something like this:
+                    // byte[] tempByteValue = Arrays.copyOf(byteValue, byteValue.length);
+                    // byteValue = SortOrder.invert(byteValue, 0, tempByteValue, 0, byteValue.length);
+                    rowTimestamp = PLong.INSTANCE.getCodec().decodeLong(value, 0, SortOrder.ASC);
+                }
{code}
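To make the sort-order concern concrete, here is a minimal standalone sketch of why decoding with the wrong SortOrder yields garbage for a DESC column. The helpers below only mimic Phoenix's PLong codec and SortOrder.invert for illustration; they are not the real Phoenix classes.

```java
// Sketch: Phoenix-style long encoding and the effect of sort order on decode.
// encodeAsc/invert/decode are simplified stand-ins for Phoenix internals.
public class SortOrderDecodeSketch {

    // Phoenix stores longs big-endian with the sign bit flipped so that
    // byte-wise comparison matches numeric order (ASC encoding).
    static byte[] encodeAsc(long v) {
        long flipped = v ^ Long.MIN_VALUE;
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return b;
    }

    // DESC columns store the one's complement of every byte.
    static byte[] invert(byte[] src) {
        byte[] out = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = (byte) ~src[i];
        }
        return out;
    }

    // Decoding must de-invert first when the column is DESC. Always decoding
    // with ASC (as the patch does) reads inverted bytes for a DESC column.
    static long decode(byte[] b, boolean desc) {
        byte[] bytes = desc ? invert(b) : b;
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFFL);
        }
        return v ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long ts = 1442534405000L;
        byte[] asc = encodeAsc(ts);
        byte[] desc = invert(asc);
        System.out.println(decode(asc, false) == ts);  // true
        System.out.println(decode(desc, true) == ts);  // true
        System.out.println(decode(desc, false) == ts); // false: wrong sort order
    }
}
```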

This function can be simpler:
{code}
+    private static LiteralParseNode getNodeForRowTimestampColumn(PColumn col) {
+        // TODO: what should we really be adding for the long value? 
+        // Might be better to re-factor this constant so it can be used in mutation state as well
+        // to check whether the value is really set or not.
+        if (col.getDataType() == PLong.INSTANCE) {
+            return new LiteralParseNode(-1, PLong.INSTANCE);
+        }
+        if (col.getDataType() == PDate.INSTANCE) {
+            return new LiteralParseNode(new Date(-1), PDate.INSTANCE);
+        }
+        if (col.getDataType() == PTimestamp.INSTANCE) {
+            return new LiteralParseNode(new Timestamp(-1), PTimestamp.INSTANCE);
+        }
+        if (col.getDataType() == PTime.INSTANCE) {
+            return new LiteralParseNode(new Time(-1), PTime.INSTANCE);
+        }
+        throw new IllegalArgumentException();
+    }
+    
{code}
Can be something like this instead:
{code}
    PDataType type = col.getDataType();
    if (type == PLong.INSTANCE || type.isCoercibleTo(PTimestamp.INSTANCE)) {
        return new LiteralParseNode(-1L, type);
    }
    throw new IllegalArgumentException();
{code}
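The shape of that simplification is easy to test in isolation. Below is a runnable sketch with a stand-in enum for the type hierarchy; SqlType and its isCoercibleTo are simplified placeholders, not the real PDataType API.

```java
// Sketch: collapsing four type-specific branches into one coercibility check.
// SqlType only models what this example needs; in Phoenix the real check is
// PDataType.isCoercibleTo(PTimestamp.INSTANCE).
public class RowTimestampNodeSketch {

    enum SqlType {
        LONG, DATE, TIME, TIMESTAMP, VARCHAR;

        // Stand-in: the temporal types are coercible to TIMESTAMP.
        boolean isCoercibleTo(SqlType other) {
            if (this == other) return true;
            return (this == DATE || this == TIME) && other == TIMESTAMP;
        }
    }

    // One branch replaces the four: any LONG or temporal ROW_TIMESTAMP column
    // gets the same -1L "unset" sentinel; anything else is rejected.
    static long sentinelFor(SqlType type) {
        if (type == SqlType.LONG || type.isCoercibleTo(SqlType.TIMESTAMP)) {
            return -1L;
        }
        throw new IllegalArgumentException("not a valid ROW_TIMESTAMP type: " + type);
    }
}
```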

For UPSERT SELECT, how about passing the rowTimestamp column's PK slot position instead of a boolean as the indicator that we're using this optimization? Then we don't need to search for it again.

Don't forget the upgrade code in ConnectionQueryServicesImpl for the new IS_ROW_TIMESTAMP_INDEX column.

Minor nit: remove commented out code prior to final patch.



> Native HBase timestamp support to optimize date range queries in Phoenix 
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-914
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-914
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-914.patch, wip.patch
>
>
> For many applications one of the columns of a table can be (and must be) naturally mapped 
> to the HBase timestamp. What this gives us is an optimization in StoreScanner, where HFiles with timestamps outside the range of
> a Scan operator will be skipped. Let us say that we have time-series data (EVENTS) and a custom compaction that creates 
> a series of HFiles with contiguous, non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID  VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>     DATA VARCHAR,
>     SVALUE VARCHAR,
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Note the TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the hint to Phoenix that the column
> TIME must be mapped to the HBase timestamp. 
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 7*24*3600000; (this may not be correct SQL syntax, of course)
> These types of queries will be optimized efficiently if:
> 1. Phoenix maps the TIME column to the HBase timestamp
> 2. Phoenix is smart enough to map a WHERE clause on the TIME attribute to the Scan time range 
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.getConnection(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case, it may not be efficient from a performance point of view, because for every INSERT/UPSERT 
> a new Connection object and a new Statement are created; besides this, we still need optimization 2 (see above). 
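The StoreScanner optimization the description asks for boils down to an interval-overlap check per HFile. A minimal sketch in plain Java, with no HBase dependency; HFileRange and the prune method are invented names for illustration, not HBase APIs (the real mechanism is Scan.setTimeRange plus per-HFile time-range metadata):

```java
// Sketch: skip any HFile whose [minTs, maxTs] range does not overlap the
// scan's time range. Names here are illustrative only.
import java.util.ArrayList;
import java.util.List;

public class TimeRangePruningSketch {

    static final class HFileRange {
        final String name;
        final long minTs, maxTs; // inclusive timestamp range covered by the file
        HFileRange(String name, long minTs, long maxTs) {
            this.name = name; this.minTs = minTs; this.maxTs = maxTs;
        }
    }

    // Keep only files whose timestamp range intersects [scanMin, scanMax].
    static List<HFileRange> prune(List<HFileRange> files, long scanMin, long scanMax) {
        List<HFileRange> kept = new ArrayList<>();
        for (HFileRange f : files) {
            if (f.maxTs >= scanMin && f.minTs <= scanMax) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 1442534405000L;
        long weekAgo = now - 7L * 24 * 3600 * 1000; // "last 7 days" from the query
        List<HFileRange> files = new ArrayList<>();
        files.add(new HFileRange("hfile-old", 0L, weekAgo - 1));
        files.add(new HFileRange("hfile-recent", weekAgo + 1, now));
        // With non-overlapping per-file ranges, only the recent file is scanned.
        System.out.println(prune(files, weekAgo, now).size()); // prints 1
    }
}
```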



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)