You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Roc Chu <ch...@gmail.com> on 2015/12/11 10:32:04 UTC

weird result I got when I try row_timestamp feature

Hi all

I try to use phoenix 4.6 to deal with time series data,

this is my table schema

create table log_data_46201512(
server_timestamp timestamp not null,
app_id varchar,
client_timestamp timestamp,
client_ip INTEGER not null,
cluster_id varchar,
host_id varchar,
device_uuid varchar,
url varchar,
api varchar,
sdk varchar,
device varchar,
market_id varchar,
language varchar,
response_time INTEGER,
response_code INTEGER,
bytes_read INTEGER,
bytes_uploaded INTEGER,
description varchar,
CONSTRAINT  my_pk PRIMARY KEY (
server_timestamp, app_id, client_ip,
cluster_id,host_id,api
)
)salt_buckets=128,compression='snappy'

I inserted some test data into this table.
the server_timestamp in my test data had a few minutes behind the current
time.

after finished the data load process, I did 2 queries,
as below

---------------
0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
'MINUTE', 10) min from log_data_46201512 where server_timestamp >
to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') and server_timestamp
< to_date('2015-12-11 07:30:00', 'yyyy-MM-dd HH:mm:ss') group by min;

+------------------------------------------+-------------------------+
|                   CONT                   |           MIN           |
+------------------------------------------+-------------------------+
| 790615                                   | 2015-12-11 06:20:00.000 |
| 40                                       | 2015-12-11 06:30:00.000 |
| 45                                       | 2015-12-11 06:40:00.000 |
| 25                                       | 2015-12-11 06:50:00.000 |
| 20                                       | 2015-12-11 07:00:00.000 |
+------------------------------------------+-------------------------+

-------------------------------------

0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
'MINUTE', 10) min from log_data_46201512 where server_timestamp >
to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') group by min;

+------------------------------------------+-------------------------+
|                   CONT                   |           MIN           |
+------------------------------------------+-------------------------+
| 992508                                   | 2015-12-11 06:20:00.000 |
| 40                                       | 2015-12-11 06:30:00.000 |
| 45                                       | 2015-12-11 06:40:00.000 |
| 25                                       | 2015-12-11 06:50:00.000 |
| 31                                       | 2015-12-11 07:00:00.000 |
| 27                                       | 2015-12-11 07:10:00.000 |
| 35                                       | 2015-12-11 07:20:00.000 |
| 31                                       | 2015-12-11 07:30:00.000 |
| 43                                       | 2015-12-11 07:40:00.000 |
| 7                                        | 2015-12-11 07:50:00.000 |
+------------------------------------------+-------------------------+


I am confused about this. why I got different cont in first row(MIN=2015-12-11
06:20:00.000) in my result ?
did I do something wrong to cause that?

can someone help me out? Thanks in advance.


I use the phoenix-4.6.0-HBase-0.98

Greetings

Re: weird result I got when I try row_timestamp feature

Posted by Roc Chu <ch...@gmail.com>.
sorry for bothering. I did some test today, and find out why it got the
wrong result.
It was my mistake.
I use a process as data loader , and when I upgraded my server to 4.6.0, I
forgot the data loader still use the 4.4.0 phoenix.

4.4.0 phoenix client did not reset timestamp in hbase cell, so when I query
data from phoenix with ROW TIMESTAMP FILTER, it always got wrong result.


On Sat, Dec 12, 2015 at 2:34 AM, Samarth Jain <sa...@apache.org> wrote:

> Hi Roc,
>
> FWIW, looking at your schema, it doesn't look like you are using the
> ROW_TIMESTAMP feature. The constraint part of your DDL needs to be changed
> like this:
>
> CONSTRAINT  my_pk PRIMARY KEY (
> server_timestamp ROW_TIMESTAMP,
> app_id, client_ip,
> cluster_id,host_id,api
> )
>
> For the issue of getting different counts, mind filing a JIRA? It would be
> ideal if you could come up with a minimum set of data needed to reproduce
> this issue.
>
> Thanks,
> Samarth
>
>
>
>
>
>
>
> On Fri, Dec 11, 2015 at 1:32 AM, Roc Chu <ch...@gmail.com> wrote:
>
>> Hi all
>>
>> I try to use phoenix 4.6 to deal with time series data,
>>
>> this is my table schema
>>
>> create table log_data_46201512(
>> server_timestamp timestamp not null,
>> app_id varchar,
>> client_timestamp timestamp,
>> client_ip INTEGER not null,
>> cluster_id varchar,
>> host_id varchar,
>> device_uuid varchar,
>> url varchar,
>> api varchar,
>> sdk varchar,
>> device varchar,
>> market_id varchar,
>> language varchar,
>> response_time INTEGER,
>> response_code INTEGER,
>> bytes_read INTEGER,
>> bytes_uploaded INTEGER,
>> description varchar,
>> CONSTRAINT  my_pk PRIMARY KEY (
>> server_timestamp, app_id, client_ip,
>> cluster_id,host_id,api
>> )
>> )salt_buckets=128,compression='snappy'
>>
>> I inserted some test data into this table.
>> the server_timestamp in my test data had a few minutes behind the current
>> time.
>>
>> after finished the data load process, I did 2 queries,
>> as below
>>
>> ---------------
>> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
>> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
>> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') and server_timestamp
>> < to_date('2015-12-11 07:30:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>>
>> +------------------------------------------+-------------------------+
>> |                   CONT                   |           MIN           |
>> +------------------------------------------+-------------------------+
>> | 790615                                   | 2015-12-11 06:20:00.000 |
>> | 40                                       | 2015-12-11 06:30:00.000 |
>> | 45                                       | 2015-12-11 06:40:00.000 |
>> | 25                                       | 2015-12-11 06:50:00.000 |
>> | 20                                       | 2015-12-11 07:00:00.000 |
>> +------------------------------------------+-------------------------+
>>
>> -------------------------------------
>>
>> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
>> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
>> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>>
>> +------------------------------------------+-------------------------+
>> |                   CONT                   |           MIN           |
>> +------------------------------------------+-------------------------+
>> | 992508                                   | 2015-12-11 06:20:00.000 |
>> | 40                                       | 2015-12-11 06:30:00.000 |
>> | 45                                       | 2015-12-11 06:40:00.000 |
>> | 25                                       | 2015-12-11 06:50:00.000 |
>> | 31                                       | 2015-12-11 07:00:00.000 |
>> | 27                                       | 2015-12-11 07:10:00.000 |
>> | 35                                       | 2015-12-11 07:20:00.000 |
>> | 31                                       | 2015-12-11 07:30:00.000 |
>> | 43                                       | 2015-12-11 07:40:00.000 |
>> | 7                                        | 2015-12-11 07:50:00.000 |
>> +------------------------------------------+-------------------------+
>>
>>
>> I am confused about this. why I got different cont in first row(MIN=2015-12-11
>> 06:20:00.000) in my result ?
>> did I do something wrong to cause that?
>>
>> can someone help me out? Thanks in advance.
>>
>>
>> I use the phoenix-4.6.0-HBase-0.98
>>
>> Greetings
>>
>>
>

Re: weird result I got when I try row_timestamp feature

Posted by Roc Chu <ch...@gmail.com>.
Thanks for your reply.
In my test table, I used the row_timestamp feature and I lost ROW_TIMESTAMP
key word when I copied schema to this mail,

when I finished data load, I executed that query several times, and got
different cont,
it seemed that the result affected by where the data is, in hbase memstore
or in hfile.
I will do some more test. If I could reproduce with a much less test data,
I will make a issue  in JIRA




On Sat, Dec 12, 2015 at 2:34 AM, Samarth Jain <sa...@apache.org> wrote:

> Hi Roc,
>
> FWIW, looking at your schema, it doesn't look like you are using the
> ROW_TIMESTAMP feature. The constraint part of your DDL needs to be changed
> like this:
>
> CONSTRAINT  my_pk PRIMARY KEY (
> server_timestamp ROW_TIMESTAMP,
> app_id, client_ip,
> cluster_id,host_id,api
> )
>
> For the issue of getting different counts, mind filing a JIRA? It would be
> ideal if you could come up with a minimum set of data needed to reproduce
> this issue.
>
> Thanks,
> Samarth
>
>
>
>
>
>
>
> On Fri, Dec 11, 2015 at 1:32 AM, Roc Chu <ch...@gmail.com> wrote:
>
>> Hi all
>>
>> I try to use phoenix 4.6 to deal with time series data,
>>
>> this is my table schema
>>
>> create table log_data_46201512(
>> server_timestamp timestamp not null,
>> app_id varchar,
>> client_timestamp timestamp,
>> client_ip INTEGER not null,
>> cluster_id varchar,
>> host_id varchar,
>> device_uuid varchar,
>> url varchar,
>> api varchar,
>> sdk varchar,
>> device varchar,
>> market_id varchar,
>> language varchar,
>> response_time INTEGER,
>> response_code INTEGER,
>> bytes_read INTEGER,
>> bytes_uploaded INTEGER,
>> description varchar,
>> CONSTRAINT  my_pk PRIMARY KEY (
>> server_timestamp, app_id, client_ip,
>> cluster_id,host_id,api
>> )
>> )salt_buckets=128,compression='snappy'
>>
>> I inserted some test data into this table.
>> the server_timestamp in my test data had a few minutes behind the current
>> time.
>>
>> after finished the data load process, I did 2 queries,
>> as below
>>
>> ---------------
>> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
>> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
>> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') and server_timestamp
>> < to_date('2015-12-11 07:30:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>>
>> +------------------------------------------+-------------------------+
>> |                   CONT                   |           MIN           |
>> +------------------------------------------+-------------------------+
>> | 790615                                   | 2015-12-11 06:20:00.000 |
>> | 40                                       | 2015-12-11 06:30:00.000 |
>> | 45                                       | 2015-12-11 06:40:00.000 |
>> | 25                                       | 2015-12-11 06:50:00.000 |
>> | 20                                       | 2015-12-11 07:00:00.000 |
>> +------------------------------------------+-------------------------+
>>
>> -------------------------------------
>>
>> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
>> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
>> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>>
>> +------------------------------------------+-------------------------+
>> |                   CONT                   |           MIN           |
>> +------------------------------------------+-------------------------+
>> | 992508                                   | 2015-12-11 06:20:00.000 |
>> | 40                                       | 2015-12-11 06:30:00.000 |
>> | 45                                       | 2015-12-11 06:40:00.000 |
>> | 25                                       | 2015-12-11 06:50:00.000 |
>> | 31                                       | 2015-12-11 07:00:00.000 |
>> | 27                                       | 2015-12-11 07:10:00.000 |
>> | 35                                       | 2015-12-11 07:20:00.000 |
>> | 31                                       | 2015-12-11 07:30:00.000 |
>> | 43                                       | 2015-12-11 07:40:00.000 |
>> | 7                                        | 2015-12-11 07:50:00.000 |
>> +------------------------------------------+-------------------------+
>>
>>
>> I am confused about this. why I got different cont in first row(MIN=2015-12-11
>> 06:20:00.000) in my result ?
>> did I do something wrong to cause that?
>>
>> can someone help me out? Thanks in advance.
>>
>>
>> I use the phoenix-4.6.0-HBase-0.98
>>
>> Greetings
>>
>>
>

Re: weird result I got when I try row_timestamp feature

Posted by Samarth Jain <sa...@apache.org>.
Hi Roc,

FWIW, looking at your schema, it doesn't look like you are using the
ROW_TIMESTAMP feature. The constraint part of your DDL needs to be changed
like this:

CONSTRAINT  my_pk PRIMARY KEY (
server_timestamp ROW_TIMESTAMP,
app_id, client_ip,
cluster_id,host_id,api
)

For the issue of getting different counts, mind filing a JIRA? It would be
ideal if you could come up with a minimum set of data needed to reproduce
this issue.

Thanks,
Samarth







On Fri, Dec 11, 2015 at 1:32 AM, Roc Chu <ch...@gmail.com> wrote:

> Hi all
>
> I try to use phoenix 4.6 to deal with time series data,
>
> this is my table schema
>
> create table log_data_46201512(
> server_timestamp timestamp not null,
> app_id varchar,
> client_timestamp timestamp,
> client_ip INTEGER not null,
> cluster_id varchar,
> host_id varchar,
> device_uuid varchar,
> url varchar,
> api varchar,
> sdk varchar,
> device varchar,
> market_id varchar,
> language varchar,
> response_time INTEGER,
> response_code INTEGER,
> bytes_read INTEGER,
> bytes_uploaded INTEGER,
> description varchar,
> CONSTRAINT  my_pk PRIMARY KEY (
> server_timestamp, app_id, client_ip,
> cluster_id,host_id,api
> )
> )salt_buckets=128,compression='snappy'
>
> I inserted some test data into this table.
> the server_timestamp in my test data had a few minutes behind the current
> time.
>
> after finished the data load process, I did 2 queries,
> as below
>
> ---------------
> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') and server_timestamp
> < to_date('2015-12-11 07:30:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>
> +------------------------------------------+-------------------------+
> |                   CONT                   |           MIN           |
> +------------------------------------------+-------------------------+
> | 790615                                   | 2015-12-11 06:20:00.000 |
> | 40                                       | 2015-12-11 06:30:00.000 |
> | 45                                       | 2015-12-11 06:40:00.000 |
> | 25                                       | 2015-12-11 06:50:00.000 |
> | 20                                       | 2015-12-11 07:00:00.000 |
> +------------------------------------------+-------------------------+
>
> -------------------------------------
>
> 0: jdbc:phoenix:ROCVM7501> select count(*) cont, round(server_timestamp,
> 'MINUTE', 10) min from log_data_46201512 where server_timestamp >
> to_date('2015-12-11 06:15:00', 'yyyy-MM-dd HH:mm:ss') group by min;
>
> +------------------------------------------+-------------------------+
> |                   CONT                   |           MIN           |
> +------------------------------------------+-------------------------+
> | 992508                                   | 2015-12-11 06:20:00.000 |
> | 40                                       | 2015-12-11 06:30:00.000 |
> | 45                                       | 2015-12-11 06:40:00.000 |
> | 25                                       | 2015-12-11 06:50:00.000 |
> | 31                                       | 2015-12-11 07:00:00.000 |
> | 27                                       | 2015-12-11 07:10:00.000 |
> | 35                                       | 2015-12-11 07:20:00.000 |
> | 31                                       | 2015-12-11 07:30:00.000 |
> | 43                                       | 2015-12-11 07:40:00.000 |
> | 7                                        | 2015-12-11 07:50:00.000 |
> +------------------------------------------+-------------------------+
>
>
> I am confused about this. why I got different cont in first row(MIN=2015-12-11
> 06:20:00.000) in my result ?
> did I do something wrong to cause that?
>
> can someone help me out? Thanks in advance.
>
>
> I use the phoenix-4.6.0-HBase-0.98
>
> Greetings
>
>