Posted to user@phoenix.apache.org by Marek Wiewiorka <ma...@gmail.com> on 2015/04/06 18:07:58 UTC

group by problem

Hi All,
I came across a weird situation while running a query with group by.
I executed 2 queries:

1)
select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;

| 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
| 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |


2)

select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;

| 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |


The only difference is that in the first case I included 2 ids (4720, 6854)
and in the second only 1 (4720).
The result for hu_ho_id=4720 should be the same in both cases, but it isn't
(0.456 vs 8.406).
The second result (8.406) is correct.

Am I doing something wrong?

Regards,
Marek
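[Editorial note: whatever the cause, the invariant being violated here is easy to state: a per-key aggregate must not depend on which *other* keys appear in the IN filter. A small Python sketch of what GROUP BY TRUNC(hu_ts,'DAY'), hu_ho_id, hu_stream_id should guarantee — the row values are made up for illustration, not taken from the real SE_DWH.HOMES_USAGE_HOUR data:]

```python
from collections import defaultdict
from datetime import datetime

# Illustrative rows (hu_ts, hu_ho_id, hu_stream_id, hu_usage);
# values are invented for the sketch.
rows = [
    ("2015-03-19 22:59:59", 4720, 0, 0.456),
    ("2015-03-19 23:59:59", 4720, 0, 7.95),
    ("2015-03-19 22:59:59", 6854, 0, 2.421),
    ("2015-03-19 23:59:59", 6854, 0, 2.167),
]

def group_by_day(rows, ids):
    """Mimic GROUP BY TRUNC(ts,'DAY'), id, stream with WHERE id IN ids."""
    totals = defaultdict(float)
    for ts, ho_id, stream_id, usage in rows:
        if ho_id in ids:
            day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
            totals[(day, ho_id, stream_id)] += usage
    return dict(totals)

both = group_by_day(rows, {4720, 6854})   # query 1: two ids in the filter
one  = group_by_day(rows, {4720})         # query 2: one id in the filter

# The aggregate for a given (day, id, stream) key must not depend on which
# other ids were in the IN list -- exactly the invariant the Phoenix
# results above violate.
key = (datetime(2015, 3, 19).date(), 4720, 0)
assert both[key] == one[key]
```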

Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
Hi James - I'm sorry for the delay; I had to troubleshoot another, more
urgent issue.
I will try to open a JIRA with a sample CSV today.

Thanks!
Marek


2015-04-08 23:08 GMT+02:00 James Taylor <ja...@apache.org>:

> Any JIRA yet, Marek? It'd be good to get to the bottom of this. Your
> work around is not going to perform nearly as well as using TRUNC on
> the date.
> Thanks,
> James
>
> On Tue, Apr 7, 2015 at 8:53 AM, James Taylor <ja...@apache.org>
> wrote:
> > Yes, please open a JIRA and attach that CSV (or ideally the smallest
> subset
> > that exhibits the problem).
> > Thanks,
> > James
> >
> >
> > On Tuesday, April 7, 2015, Marek Wiewiorka <ma...@gmail.com>
> > wrote:
> >>
> >> Hi James - shall I still open a JIRA for that?
> >> Thanks!
> >> Marek
> >>
> >> 2015-04-06 22:48 GMT+02:00 Marek Wiewiorka <ma...@gmail.com>:
> >>>
> >>> psql from a csv file:
> >>> ./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR
> >>> /mnt/spark/export/usage_convert.txt/usage_merged.csv
> >>>
> >>> Here is a sample:
> >>> 2015-03-19 23:59:59,6854,0,2.167
> >>> 2015-03-19 22:59:59,6854,0,2.421
> >>> 2015-03-19 21:59:59,6854,0,2.738
> >>> 2015-03-19 20:59:59,6854,0,0.9490000000000001
> >>> 2015-03-19 19:59:59,6854,0,0.748
> >>> 2015-03-19 18:59:59,6854,0,2.76
> >>> 2015-03-19 17:59:59,6854,0,1.801
> >>> 2015-03-19 16:59:59,6854,0,0.661
> >>> 2015-03-19 15:59:59,6854,0,1.082
> >>> 2015-03-19 14:59:59,6854,0,1.303
> >>>
> >>>
> >>> M.
> >>>
> >>> 2015-04-06 22:38 GMT+02:00 James Taylor <ja...@apache.org>:
> >>>>
> >>>> How did you input the data?
> >>>>
> >>>> On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka
> >>>> <ma...@gmail.com> wrote:
> >>>> > Oh I'm sorry I forgot to attach them:
> >>>> > DDL of my table:
> >>>> > create table se_dwh.homes_usage_hour (hu_ts time not null, hu_ho_id
> >>>> > integer not null, hu_stream_id integer not null, hu_usage double
> >>>> > constraint pk PRIMARY KEY(hu_ts,hu_ho_id,hu_stream_id));
> >>>> >
> >>>> > Phoenix: 4.3.0
> >>>> >
> >>>> > Thanks,
> >>>> > Marek
> >>>> >
> >>>> >
> >>>> > 2015-04-06 22:25 GMT+02:00 James Taylor <ja...@apache.org>:
> >>>> >>
> >>>> >> Hi Marek,
> >>>> >> How did you input the data and what does your CREATE TABLE/VIEW
> >>>> >> statement look like? What version of Phoenix and HBase are you
> using?
> >>>> >> Thanks,
> >>>> >> James
> >>>> >>
> >>>> >> On Monday, April 6, 2015, Marek Wiewiorka <marek.wiewiorka@gmail.com> wrote:
> >>>> >>>
> >>>> >>> Hi James - sure here is the result of your query (limited to 5
> >>>> >>> rows):
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>> +-------------------------+----------+--------------+--------------------+
> >>>> >>> |     TO_CHAR(HU_TS)      | HU_HO_ID | HU_STREAM_ID |      HU_USAGE      |
> >>>> >>> +-------------------------+----------+--------------+--------------------+
> >>>> >>> | 2015-03-19 00:59:59.000 | 4720     | 0            | 0.287              |
> >>>> >>> | 2015-03-19 00:59:59.000 | 6854     | 0            | 3.6189999999999998 |
> >>>> >>> | 2015-03-19 01:59:59.000 | 4720     | 0            | 0.323              |
> >>>> >>> | 2015-03-19 01:59:59.000 | 6854     | 0            | 2.556              |
> >>>> >>> | 2015-03-19 02:59:59.000 | 4720     | 0            | 0.37               |
> >>>> >>> +-------------------------+----------+--------------+--------------------+
> >>>> >>>
> >>>> >>> Ok - I will try to prepare a sample csv with DDL/DML.
> >>>> >>>
> >>>> >>> Thanks again,
> >>>> >>> Marek
> >>>> >>>
> >>>> >>>
> >>>> >>> 2015-04-06 22:06 GMT+02:00 James Taylor <ja...@apache.org>:
> >>>> >>>>
> >>>> >>>> Hi Marek,
> >>>> >>>> Thanks for the additional information. If you could answer my
> >>>> >>>> earlier
> >>>> >>>> questions, that would be helpful.
> >>>> >>>>
> >>>> >>>> If you can't repro with a simple test case, then how about
> >>>> >>>> attaching a
> >>>> >>>> csv dump of some of your data (the smallest amount of data that
> >>>> >>>> repros
> >>>> >>>> the issue) to a JIRA along with the CREATE TABLE statement and
> the
> >>>> >>>> query?
> >>>> >>>>
> >>>> >>>> Thanks,
> >>>> >>>> James
> >>>> >>>>
> >>>> >>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
> >>>> >>>> <ma...@gmail.com> wrote:
> >>>> >>>> > Hi Guys - thanks for your messages.
> >>>> >>>> >
> >>>> >>>> > I did another round of testing and found that if I use to_char
> >>>> >>>> > instead of the trunc function I'm getting the expected result:
> >>>> >>>> >
> >>>> >>>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
> >>>> >>>> > where
> >>>> >>>> > id in
> >>>> >>>> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > +-------------+------+--------+--------------------+
> >>>> >>>> > | TO_CHAR(DT) |  ID  | STREAM |     SUM(USAGE)     |
> >>>> >>>> > +-------------+------+--------+--------------------+
> >>>> >>>> > | 2015-03-19  | 4720 | 0      | 8.405999999999999  |
> >>>> >>>> > | 2015-03-19  | 6854 | 0      | 28.339000000000006 |
> >>>> >>>> > +-------------+------+--------+--------------------+
> >>>> >>>> >
> >>>> >>>> > the same query with trunc returns 48 rows (24 rows per id)
> >>>> >>>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
> >>>> >>>> > where id
> >>>> >>>> > in
> >>>> >>>> > (4720,6854) group by trunc(dt,'DAY'),stream,id;
> >>>> >>>> > e.g.:
> >>>> >>>> > ....
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.406               |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 0.9490000000000001  |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.332               |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.738               |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.33499999999999996 |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.421               |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
> >>>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
> >>>> >>>> > +-------------------------+------+---+---------------------+
> >>>> >>>> >
> >>>> >>>> > so for some reason grouping by trunc is not working... but
> >>>> >>>> > using to_char is.
> >>>> >>>> > This applies to the table loaded using psql from a csv file.
> >>>> >>>> >
> >>>> >>>> > When I tried to create a sample table and populate it with
> >>>> >>>> > upserts, everything worked as expected in both cases:
> >>>> >>>> >
> >>>> >>>> > create table usage (dt time not null, id integer not null,
> >>>> >>>> > stream integer not null, usage double
> >>>> >>>> > constraint pk PRIMARY KEY(dt,id,stream));
> >>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
> >>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
> >>>> >>>> >
> >>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
> >>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
> >>>> >>>> >
> >>>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
> >>>> >>>> > trunc(dt,'DAY')
> >>>> >>>> > as
> >>>> >>>> > day,id,stream,sum(usage) from usage where id in (100,200) group
> >>>> >>>> > by
> >>>> >>>> > trunc(dt,'DAY'),stream,id;
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > +-------------------------+-----+--------+------------+
> >>>> >>>> > |           DAY           | ID  | STREAM | SUM(USAGE) |
> >>>> >>>> > +-------------------------+-----+--------+------------+
> >>>> >>>> > | 2015-04-01 00:00:00.000 | 100 | 0      | 5.0        |
> >>>> >>>> > | 2015-04-01 00:00:00.000 | 200 | 0      | 10.0       |
> >>>> >>>> > +-------------------------+-----+--------+------------+
> >>>> >>>> >
> >>>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
> >>>> >>>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where
> id
> >>>> >>>> > in
> >>>> >>>> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > +-------------+-----+--------+------------+
> >>>> >>>> > | TO_CHAR(DT) | ID  | STREAM | SUM(USAGE) |
> >>>> >>>> > +-------------+-----+--------+------------+
> >>>> >>>> > | 2015-04-01  | 100 | 0      | 5.0        |
> >>>> >>>> > | 2015-04-01  | 200 | 0      | 10.0       |
> >>>> >>>> > +-------------+-----+--------+------------+
> >>>> >>>> > 2 rows selected (1.49 seconds)
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > Shall I open a jira for that?
> >>>> >>>> >
> >>>> >>>> > Regards,
> >>>> >>>> > Marek
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>> > 2015-04-06 20:16 GMT+02:00 James Taylor <jamestaylor@apache.org>:
> >>>> >>>> >>
> >>>> >>>> >> Hi Marek,
> >>>> >>>> >> How did you input the data and what does your CREATE
> TABLE/VIEW
> >>>> >>>> >> statement look like? What version of Phoenix and HBase are you
> >>>> >>>> >> using?
> >>>> >>>> >>
> >>>> >>>> >> Also, would you mind running the following query and letting
> us
> >>>> >>>> >> know
> >>>> >>>> >> the
> >>>> >>>> >> output?
> >>>> >>>> >>
> >>>> >>>> >> select to_char(hu_ts,'yyyy-MM-dd
> >>>> >>>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
> >>>> >>>> >> from se_dwh.homes_usage_hour
> >>>> >>>> >> where hu_ho_id in (4720,6854);
> >>>> >>>> >>
> >>>> >>>> >> Thanks,
> >>>> >>>> >> James
> >>>> >>>> >>
> >>>> >>>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid
> >>>> >>>> >> <ga...@gmail.com>
> >>>> >>>> >> wrote:
> >>>> >>>> >> > That certainly looks like a bug. Would it be possible to
> make
> >>>> >>>> >> > a
> >>>> >>>> >> > small
> >>>> >>>> >> > reproducible test case and if possible, log this in the
> >>>> >>>> >> > Phoenix
> >>>> >>>> >> > JIRA
> >>>> >>>> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
> >>>> >>>> >> >
> >>>> >>>> >> > - Gabriel
> >>>> >>>> >> >
> >>>> >>>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
> >>>> >>>> >> > <ma...@gmail.com>
> >>>> >>>> >> > wrote:
> >>>> >>>> >
> >>>> >>>> >
> >>>> >>>
> >>>> >>>
> >>>> >
> >>>
> >>>
> >>
> >
>
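[Editorial note: the ragged decimal tails that appear throughout the thread (0.45599999999999996, 8.405999999999999, 28.339000000000006) are ordinary IEEE-754 artifacts of summing DOUBLE columns, not themselves the bug; the bug is the *grouping*. A quick illustration in Python, using the id 6854 readings from the CSV sample quoted above:]

```python
# Most decimal fractions have no exact binary representation, so sums of
# doubles pick up tiny errors in the last bits and print with long tails.
import math

print(0.1 + 0.2)  # 0.30000000000000004, the classic example

# Summing the hourly readings for id 6854 from the CSV sample in the thread:
usage_6854 = [2.167, 2.421, 2.738, 0.9490000000000001, 0.748,
              2.76, 1.801, 0.661, 1.082, 1.303]
total = sum(usage_6854)
print(total)  # decimally this is 16.63, but the repr may carry a tail

# Comparisons should therefore use a tolerance, not string equality:
assert math.isclose(total, 16.63, rel_tol=1e-12)
```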

Re: group by problem

Posted by James Taylor <ja...@apache.org>.
Any JIRA yet, Marek? It'd be good to get to the bottom of this. Your
workaround is not going to perform nearly as well as using TRUNC on
the date.
Thanks,
James
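[Editorial note: the workaround and the TRUNC form should induce the same partitioning of rows into day buckets, which is why the two GROUP BY queries are expected to agree. A plain-Python sanity check of that equivalence (no Phoenix involved; timestamps are illustrative):]

```python
from datetime import datetime

ts_strings = ["2015-03-19 23:59:59", "2015-03-19 00:59:59", "2015-03-20 01:30:00"]
ts = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in ts_strings]

# TO_CHAR(dt,'yyyy-MM-dd')-style key: format the timestamp as a date string.
to_char_keys = [t.strftime("%Y-%m-%d") for t in ts]

# TRUNC(dt,'DAY')-style key: zero out the time-of-day component.
trunc_keys = [t.replace(hour=0, minute=0, second=0, microsecond=0) for t in ts]

# Two timestamps fall in the same group under one scheme iff they do under
# the other, so the two GROUP BY forms must produce identical aggregates.
for i in range(len(ts)):
    for j in range(len(ts)):
        assert (to_char_keys[i] == to_char_keys[j]) == (trunc_keys[i] == trunc_keys[j])
```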


Re: group by problem

Posted by James Taylor <ja...@apache.org>.
Yes, please open a JIRA and attach that CSV (or ideally the smallest subset
that exhibits the problem).
Thanks,
James

On Tuesday, April 7, 2015, Marek Wiewiorka <ma...@gmail.com>
wrote:

> Hi James - shall I still open a JIRA for that?
> Thanks!
> Marek
>
> 2015-04-06 22:48 GMT+02:00 Marek Wiewiorka <marek.wiewiorka@gmail.com
> <javascript:_e(%7B%7D,'cvml','marek.wiewiorka@gmail.com');>>:
>
>> psql from a csv file:
>> ./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR
>> /mnt/spark/export/usage_convert.txt/usage_merged.csv
>>
>> Here is a sample:
>> 2015-03-19 23:59:59,6854,0,2.167
>> 2015-03-19 22:59:59,6854,0,2.421
>> 2015-03-19 21:59:59,6854,0,2.738
>> 2015-03-19 20:59:59,6854,0,0.9490000000000001
>> 2015-03-19 19:59:59,6854,0,0.748
>> 2015-03-19 18:59:59,6854,0,2.76
>> 2015-03-19 17:59:59,6854,0,1.801
>> 2015-03-19 16:59:59,6854,0,0.661
>> 2015-03-19 15:59:59,6854,0,1.082
>> 2015-03-19 14:59:59,6854,0,1.303
>>
>>
>> M.
>>
>> 2015-04-06 22:38 GMT+02:00 James Taylor <jamestaylor@apache.org>:
>>
>>> How did you input the data?
>>>
>>> On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka
>>> <marek.wiewiorka@gmail.com> wrote:
>>> > Oh I'm sorry I forgot to attach them:
>>> > DDL of my table:
>>> > create table se_dwh.homes_usage_hour (hu_ts time not null, hu_ho_id integer
>>> > not null, hu_stream_id integer not null, hu_usage double constraint pk
>>> > PRIMARY KEY(hu_ts,hu_ho_id,hu_stream_id));
>>> >
>>> > Phoenix: 4.3.0
>>> >
>>> > Thanks,
>>> > Marek
>>> >
>>> >
>>> > 2015-04-06 22:25 GMT+02:00 James Taylor <jamestaylor@apache.org>:
>>> >>
>>> >> Hi Marek,
>>> >> How did you input the data and what does your CREATE TABLE/VIEW
>>> >> statement look like? What version of Phoenix and HBase are you using?
>>> >> Thanks,
>>> >> James
>>> >>
>>> >> On Monday, April 6, 2015, Marek Wiewiorka <marek.wiewiorka@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hi James - sure here is the result of your query (limited to 5 rows):
>>> >>>
>>> >>> +-------------------------+----------+--------------+--------------------+
>>> >>> |     TO_CHAR(HU_TS)      | HU_HO_ID | HU_STREAM_ID |      HU_USAGE      |
>>> >>> +-------------------------+----------+--------------+--------------------+
>>> >>> | 2015-03-19 00:59:59.000 | 4720     | 0            | 0.287              |
>>> >>> | 2015-03-19 00:59:59.000 | 6854     | 0            | 3.6189999999999998 |
>>> >>> | 2015-03-19 01:59:59.000 | 4720     | 0            | 0.323              |
>>> >>> | 2015-03-19 01:59:59.000 | 6854     | 0            | 2.556              |
>>> >>> | 2015-03-19 02:59:59.000 | 4720     | 0            | 0.37               |
>>> >>> +-------------------------+----------+--------------+--------------------+
>>> >>>
>>> >>> Ok - I will try to prepare a sample csv with DDL/DML.
>>> >>>
>>> >>> Thanks again,
>>> >>> Marek
>>> >>>
>>> >>>
>>> >>> 2015-04-06 22:06 GMT+02:00 James Taylor <jamestaylor@apache.org>:
>>> >>>>
>>> >>>> Hi Marek,
>>> >>>> Thanks for the additional information. If you could answer my
>>> earlier
>>> >>>> questions, that would be helpful.
>>> >>>>
>>> >>>> If you can't repro with a simple test case, then how about attaching a
>>> >>>> csv dump of some of your data (the smallest amount of data that repros
>>> >>>> the issue) to a JIRA along with the CREATE TABLE statement and the
>>> >>>> query?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> James
>>> >>>>
>>> >>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
>>> >>>> <marek.wiewiorka@gmail.com> wrote:
>>> >>>> > Hi Guys - thanks for your messages.
>>> >>>> >
>>> >>>> > I did another round of testing I found that if I use to_char
>>> instead
>>> >>>> > of
>>> >>>> > trunc function I'm getting the expected result:
>>> >>>> >
>>> >>>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>>> where
>>> >>>> > id in
>>> >>>> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > |               TO_CHAR(DT)                |                    ID
>>> >>>> > |                  STREAM                  |
>>> SUM(USAGE)
>>> >>>> > |
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > | 2015-03-19                               | 4720
>>> >>>> > | 0                                        | 8.405999999999999
>>> >>>> > |
>>> >>>> > | 2015-03-19                               | 6854
>>> >>>> > | 0                                        | 28.339000000000006
>>> >>>> > |
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> >
>>> >>>> > The same query with trunc returns 48 rows (24 rows per id):
>>> >>>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id
>>> >>>> > in (4720,6854) group by trunc(dt,'DAY'),stream,id;
>>> >>>> > e.g.:
>>> >>>> > ....
>>> >>>> > | 2015-03-19 00:00:00.000 | 4720
>>>    |
>>> >>>> > 0
>>> >>>> > | 0.406                                    |
>>> >>>> > | 2015-03-19 00:00:00.000 | 6854
>>>    |
>>> >>>> > 0
>>> >>>> > | 0.9490000000000001                       |
>>> >>>> > | 2015-03-19 00:00:00.000 | 4720
>>>    |
>>> >>>> > 0
>>> >>>> > | 0.332                                    |
>>> >>>> > | 2015-03-19 00:00:00.000 | 6854
>>>    |
>>> >>>> > 0
>>> >>>> > | 2.738                                    |
>>> >>>> > | 2015-03-19 00:00:00.000 | 4720
>>>    |
>>> >>>> > 0
>>> >>>> > | 0.33499999999999996                      |
>>> >>>> > | 2015-03-19 00:00:00.000 | 6854
>>>    |
>>> >>>> > 0
>>> >>>> > | 2.421                                    |
>>> >>>> > | 2015-03-19 00:00:00.000 | 4720
>>>    |
>>> >>>> > 0
>>> >>>> > | 0.45599999999999996                      |
>>> >>>> > | 2015-03-19 00:00:00.000 | 6854
>>>    |
>>> >>>> > 0
>>> >>>> > | 2.167                                    |
>>> >>>> >
>>> >>>> >
>>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> >
>>> >>>> > So for some reason grouping by trunc is not working... but using
>>> >>>> > to_char is. This applies to a table loaded using psql from a csv file.
>>> >>>> >
>>> >>>> > When I tried to create a sample table and populate it with upserts,
>>> >>>> > everything worked as expected in both cases:
>>> >>>> >
>>> >>>> > create table usage (dt time not null ,id integer not null,stream
>>> >>>> > integer not
>>> >>>> > null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>>> >>>> >
>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>>> >>>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>>> >>>> >
>>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY')
>>> >>>> > as day,id,stream,sum(usage) from usage where id in (100,200) group by
>>> >>>> > trunc(dt,'DAY'),stream,id;
>>> >>>> >
>>> >>>> >
>>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > |           DAY           |                    ID
>>>     |
>>> >>>> > STREAM                  |                SUM(USAGE)
>>>   |
>>> >>>> >
>>> >>>> >
>>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > | 2015-04-01 00:00:00.000 | 100
>>>     |
>>> >>>> > 0
>>> >>>> > | 5.0                                      |
>>> >>>> > | 2015-04-01 00:00:00.000 | 200
>>>     |
>>> >>>> > 0
>>> >>>> > | 10.0                                     |
>>> >>>> >
>>> >>>> >
>>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> >
>>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>>> >>>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
>>> >>>> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > |               TO_CHAR(DT)                |                    ID
>>> >>>> > |                  STREAM                  |
>>> SUM(USAGE)
>>> >>>> > |
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > | 2015-04-01                               | 100
>>> >>>> > | 0                                        | 5.0
>>> >>>> > |
>>> >>>> > | 2015-04-01                               | 200
>>> >>>> > | 0                                        | 10.0
>>> >>>> > |
>>> >>>> >
>>> >>>> >
>>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>>> >>>> > 2 rows selected (1.49 seconds)
>>> >>>> >
>>> >>>> >
>>> >>>> > Shall I open a jira for that?
>>> >>>> >
>>> >>>> > Regards,
>>> >>>> > Marek
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> > 2015-04-06 20:16 GMT+02:00 James Taylor <jamestaylor@apache.org>:
>>> >>>> >>
>>> >>>> >> Hi Marek,
>>> >>>> >> How did you input the data and what does your CREATE TABLE/VIEW
>>> >>>> >> statement look like? What version of Phoenix and HBase are you using?
>>> >>>> >>
>>> >>>> >> Also, would you mind running the following query and letting us know
>>> >>>> >> the output?
>>> >>>> >>
>>> >>>> >> select to_char(hu_ts,'yyyy-MM-dd
>>> >>>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>>> >>>> >> from se_dwh.homes_usage_hour
>>> >>>> >> where hu_ho_id in (4720,6854);
>>> >>>> >>
>>> >>>> >> Thanks,
>>> >>>> >> James
>>> >>>> >>
>>> >>>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid
>>> >>>> >> <gabriel.reid@gmail.com>
>>> >>>> >> wrote:
>>> >>>> >> > That certainly looks like a bug. Would it be possible to make a
>>> >>>> >> > small
>>> >>>> >> > reproducible test case and if possible, log this in the Phoenix
>>> >>>> >> > JIRA
>>> >>>> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
>>> >>>> >> >
>>> >>>> >> > - Gabriel
>>> >>>> >> >
>>> >>>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>>> >>>> >> > <marek.wiewiorka@gmail.com>
>>> >>>> >> > wrote:
>>> >>>> >> >>
>>> >>>> >> >> Hi All,
>>> >>>> >> >> I came across a weird situation while running a query with group by.
>>> >>>> >> >> I executed 2 queries:
>>> >>>> >> >>
>>> >>>> >> >> 1)
>>> >>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>>> >>>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>>> >>>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>> >>>> >> >>
>>> >>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
>>> >>>> >> >> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>>> >>>> >> >>
>>> >>>> >> >> 2)
>>> >>>> >> >>
>>> >>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>>> >>>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>>> >>>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>> >>>> >> >>
>>> >>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >> The only difference is that in the first case I included 2
>>> >>>> >> >> ids (4720,6854) and in the other one only 1 (4720).
>>> >>>> >> >> The result for hu_ho_id=4720 should be the same in both cases,
>>> >>>> >> >> but it isn't (0.45 vs 8.4). The second result (8.4) is correct.
>>> >>>> >> >>
>>> >>>> >> >> Am I doing something wrong?
>>> >>>> >> >>
>>> >>>> >> >> Regards,
>>> >>>> >> >> Marek
>>> >>>> >> >>
>>> >>>> >> >
>>> >>>> >
>>> >>>> >
>>> >>>
>>> >>>
>>> >
>>>
>>
>>
>

Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
Hi James - shall I still open a JIRA for that?
Thanks!
Marek


Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
psql from a csv file:
./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR
/mnt/spark/export/usage_convert.txt/usage_merged.csv

Here is a sample:
2015-03-19 23:59:59,6854,0,2.167
2015-03-19 22:59:59,6854,0,2.421
2015-03-19 21:59:59,6854,0,2.738
2015-03-19 20:59:59,6854,0,0.9490000000000001
2015-03-19 19:59:59,6854,0,0.748
2015-03-19 18:59:59,6854,0,2.76
2015-03-19 17:59:59,6854,0,1.801
2015-03-19 16:59:59,6854,0,0.661
2015-03-19 15:59:59,6854,0,1.082
2015-03-19 14:59:59,6854,0,1.303
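As a sanity check on what the day-level aggregation of these sample rows should look like, here is a minimal Python sketch (the column order — timestamp, home id, stream id, usage — is assumed from the DDL earlier in the thread; truncating the timestamp string to its date part mimics GROUP BY TRUNC(hu_ts,'DAY'), hu_ho_id, hu_stream_id):

```python
from collections import defaultdict

# The sample CSV rows above: (timestamp, home id, stream id, usage).
rows = [
    ("2015-03-19 23:59:59", 6854, 0, 2.167),
    ("2015-03-19 22:59:59", 6854, 0, 2.421),
    ("2015-03-19 21:59:59", 6854, 0, 2.738),
    ("2015-03-19 20:59:59", 6854, 0, 0.9490000000000001),
    ("2015-03-19 19:59:59", 6854, 0, 0.748),
    ("2015-03-19 18:59:59", 6854, 0, 2.76),
    ("2015-03-19 17:59:59", 6854, 0, 1.801),
    ("2015-03-19 16:59:59", 6854, 0, 0.661),
    ("2015-03-19 15:59:59", 6854, 0, 1.082),
    ("2015-03-19 14:59:59", 6854, 0, 1.303),
]

# Group by (day, home id, stream id), mirroring the GROUP BY on
# TRUNC(hu_ts,'DAY'), hu_ho_id, hu_stream_id in the problem query.
totals = defaultdict(float)
for ts, home_id, stream_id, usage in rows:
    day = ts.split(" ")[0]  # truncate the timestamp to its date part
    totals[(day, home_id, stream_id)] += usage

for key, total in sorted(totals.items()):
    print(key, total)
```

A correct day-level GROUP BY should collapse all ten rows into a single group for (2015-03-19, 6854, 0), rather than returning one output row per input hour as the trunc query does.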


M.

2015-04-06 22:38 GMT+02:00 James Taylor <ja...@apache.org>:

> How did you input the data?
>
> On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka
> <ma...@gmail.com> wrote:
> > Oh I'm sorry I forgot to attach them:
> > DDL of my table:
> > create table se_dwh.homes_usage_hour (hu_ts time not null ,hu_ho_id
> integer
> > not null ,hu_stream_id integer not null, hu_usage double constraint pk
> > PRIMARY KEY(hu_ts,hu_ho_id,hu_stream_id) );
> >
> > Phoenix: 4.3.0
> >
> > Thanks,
> > Marek
> >
> >
> > 2015-04-06 22:25 GMT+02:00 James Taylor <ja...@apache.org>:
> >>
> >> Hi Marek,
> >> How did you input the data and what does your CREATE TABLE/VIEW
> >> statement look like? What version of Phoenix and HBase are you using?
> >> Thanks,
> >> James
> >>
> >> On Monday, April 6, 2015, Marek Wiewiorka <ma...@gmail.com>
> >> wrote:
> >>>
> >>> Hi James - sure here is the result of your query (limited to 5 rows):
> >>>
> >>>
> >>>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>> |              TO_CHAR(HU_TS)              |                 HU_HO_ID
> >>> |               HU_STREAM_ID               |                 HU_USAGE
> >>> |
> >>>
> >>>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>> | 2015-03-19 00:59:59.000                  | 4720
> >>> | 0                                        | 0.287
> >>> |
> >>> | 2015-03-19 00:59:59.000                  | 6854
> >>> | 0                                        | 3.6189999999999998
> >>> |
> >>> | 2015-03-19 01:59:59.000                  | 4720
> >>> | 0                                        | 0.323
> >>> |
> >>> | 2015-03-19 01:59:59.000                  | 6854
> >>> | 0                                        | 2.556
> >>> |
> >>> | 2015-03-19 02:59:59.000                  | 4720
> >>> | 0                                        | 0.37
> >>> |
> >>>
> >>>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>>
> >>> Ok - I will try to prepare a sample csv with DDL/DML.
> >>>
> >>> Thanks again,
> >>> Marek
> >>>
> >>>
> >>> 2015-04-06 22:06 GMT+02:00 James Taylor <ja...@apache.org>:
> >>>>
> >>>> Hi Marek,
> >>>> Thanks for the additional information. If you could answer my earlier
> >>>> questions, that would be helpful.
> >>>>
> >>>> If you can't repro with a simple test case, then how about attaching a
> >>>> csv dump of some of your data (the smallest amount of data that repros
> >>>> the issue) to a JIRA along with the CREATE TABLE statement and the
> >>>> query?
> >>>>
> >>>> Thanks,
> >>>> James
> >>>>
> >>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
> >>>> <ma...@gmail.com> wrote:
> >>>> > Hi Guys - thanks for your messages.
> >>>> >
> >>>> > I did another round of testing I found that if I use to_char instead
> >>>> > of
> >>>> > trunc function I'm getting the expected result:
> >>>> >
> >>>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
> where
> >>>> > id in
> >>>> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >>>> >
> >>>> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>>> > |               TO_CHAR(DT)                |                    ID
> >>>> > |                  STREAM                  |
> SUM(USAGE)
> >>>> > |
> >>>> >
> >>>> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>>> > | 2015-03-19                               | 4720
> >>>> > | 0                                        | 8.405999999999999
> >>>> > |
> >>>> > | 2015-03-19                               | 6854
> >>>> > | 0                                        | 28.339000000000006
> >>>> > |
> >>>> >
> >>>> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >>>> >
> >>>> > the same query with trunc returns 48 rows (24 rows per id)
> >>>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where
> id
> >>>> > in
> >>>> > (4720,6854) group by trunc(dt,'DAY'),stream,id;
> >>>> > e.g.:
> >>>> > ....
> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.406               |
> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 0.9490000000000001  |
> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.332               |
> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.738               |
> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.33499999999999996 |
> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.421               |
> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
> >>>> > +-------------------------+------+---+---------------------+
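(Editor's note: the long fractional tails in SUM(USAGE) above, such as 0.45599999999999996 and 8.405999999999999, are not part of the grouping bug; they are ordinary IEEE-754 double behavior when summing decimal fractions. A quick Python illustration:)

```python
# Binary doubles cannot represent most decimal fractions exactly,
# so repeated addition accumulates visible rounding tails.
readings = [0.1, 0.1, 0.1]  # illustrative values, not the real usage data
total = sum(readings)
print(repr(total))       # 0.30000000000000004, not 0.3
print(repr(0.1 + 0.2))   # 0.30000000000000004
```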
> >>>> >
> >>>> > So for some reason grouping by trunc is not working, but grouping by
> >>>> > to_char is. This applies to a table loaded with psql from a CSV file.
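(Editor's note: the equivalence the two GROUP BY keys should exhibit can be sketched outside Phoenix. With illustrative rows, invented here rather than taken from the real table, grouping by the day-truncated timestamp and by the formatted date string must produce identical group sums:)

```python
from collections import defaultdict
from datetime import datetime

# Illustrative sample rows: (ts, id, stream, usage); not the real table data.
rows = [
    (datetime(2015, 3, 19, 0, 59, 59), 4720, 0, 0.287),
    (datetime(2015, 3, 19, 1, 59, 59), 4720, 0, 0.323),
    (datetime(2015, 3, 19, 2, 59, 59), 6854, 0, 2.556),
]

def aggregate(key_fn):
    """Sum usage per (day-key, id, stream), like the GROUP BY queries."""
    sums = defaultdict(float)
    for ts, id_, stream, usage in rows:
        sums[(key_fn(ts), id_, stream)] += usage
    return sums

# TRUNC(ts,'DAY') analogue: zero out the time-of-day components.
by_trunc = aggregate(lambda ts: ts.replace(hour=0, minute=0,
                                           second=0, microsecond=0))
# TO_CHAR(ts,'yyyy-MM-dd') analogue: format to a date string.
by_char = aggregate(lambda ts: ts.strftime('%Y-%m-%d'))

# Same groups, same sums -- this is what the trunc and to_char
# versions of the query should agree on.
assert sorted(by_trunc.values()) == sorted(by_char.values())
```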
> >>>> >
> >>>> > When I tried to create a sample table and populate it with upserts,
> >>>> > everything worked as expected in both cases:
> >>>> >
> >>>> > create table usage (dt time not null, id integer not null, stream integer
> >>>> > not null, usage double constraint pk PRIMARY KEY (dt, id, stream));
> >>>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
> >>>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
> >>>> >
> >>>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
> >>>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
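(Editor's note: since the failure only reproduces with CSV-loaded data, a minimal repro CSV for the JIRA could be generated along these lines. The file name, row values, and the psql.py command shape shown in the comment are assumptions, not details from the thread:)

```python
import csv
from datetime import datetime, timedelta

# Write 24 hourly rows for each of two ids on one day, mirroring the
# sample table layout (dt, id, stream, usage). Values are invented.
start = datetime(2015, 3, 19, 0, 59, 59)
with open('usage_sample.csv', 'w', newline='') as f:
    w = csv.writer(f)
    for hour in range(24):
        ts = (start + timedelta(hours=hour)).strftime('%Y-%m-%d %H:%M:%S')
        w.writerow([ts, 4720, 0, round(0.3 + 0.01 * hour, 3)])
        w.writerow([ts, 6854, 0, round(2.0 + 0.05 * hour, 3)])

# The file could then be bulk-loaded with Phoenix's psql tool, e.g.:
#   psql.py -t USAGE <zookeeper-quorum> usage_sample.csv
# (command shape is an assumption; check the Phoenix bulk-load docs)
```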
> >>>> >
> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
> >>>> > day,id,stream,sum(usage) from usage where id in (100,200) group by
> >>>> > trunc(dt,'DAY'),stream,id;
> >>>> >
> >>>> >
> >>>> > +-------------------------+-----+--------+------------+
> >>>> > |           DAY           | ID  | STREAM | SUM(USAGE) |
> >>>> > +-------------------------+-----+--------+------------+
> >>>> > | 2015-04-01 00:00:00.000 | 100 | 0      | 5.0        |
> >>>> > | 2015-04-01 00:00:00.000 | 200 | 0      | 10.0       |
> >>>> > +-------------------------+-----+--------+------------+
> >>>> >
> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
> >>>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
> >>>> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >>>> >
> >>>> >
> >>>> > +-------------+-----+--------+------------+
> >>>> > | TO_CHAR(DT) | ID  | STREAM | SUM(USAGE) |
> >>>> > +-------------+-----+--------+------------+
> >>>> > | 2015-04-01  | 100 | 0      | 5.0        |
> >>>> > | 2015-04-01  | 200 | 0      | 10.0       |
> >>>> > +-------------+-----+--------+------------+
> >>>> > 2 rows selected (1.49 seconds)
> >>>> >
> >>>> >
> >>>> > Shall I open a jira for that?
> >>>> >
> >>>> > Regards,
> >>>> > Marek
> >>>> >
> >>>> > 2015-04-06 20:16 GMT+02:00 James Taylor <ja...@apache.org>:
> >>>> >>
> >>>> >> Hi Marek,
> >>>> >> How did you input the data and what does your CREATE TABLE/VIEW
> >>>> >> statement look like? What version of Phoenix and HBase are you
> using?
> >>>> >>
> >>>> >> Also, would you mind running the following query and letting us know
> >>>> >> the output?
> >>>> >>
> >>>> >> select to_char(hu_ts,'yyyy-MM-dd
> >>>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
> >>>> >> from se_dwh.homes_usage_hour
> >>>> >> where hu_ho_id in (4720,6854);
> >>>> >>
> >>>> >> Thanks,
> >>>> >> James
> >>>> >>
> >>>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid
> >>>> >> <ga...@gmail.com>
> >>>> >> wrote:
> >>>> >> > That certainly looks like a bug. Would it be possible to make a
> >>>> >> > small
> >>>> >> > reproducible test case and if possible, log this in the Phoenix
> >>>> >> > JIRA
> >>>> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
> >>>> >> >
> >>>> >> > - Gabriel
> >>>> >> >

Re: group by problem

Posted by James Taylor <ja...@apache.org>.
How did you input the data?

On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka
<ma...@gmail.com> wrote:
> Oh I'm sorry I forgot to attach them:
> DDL of my table:
> create table se_dwh.homes_usage_hour (hu_ts time not null ,hu_ho_id integer
> not null ,hu_stream_id integer not null, hu_usage double constraint pk
> PRIMARY KEY(hu_ts,hu_ho_id,hu_stream_id) );
>
> Phoenix: 4.3.0
>
> Thanks,
> Marek
>
>

Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
Oh, I'm sorry, I forgot to attach them:
DDL of my table:
create table se_dwh.homes_usage_hour (hu_ts time not null ,hu_ho_id integer
not null ,hu_stream_id integer not null, hu_usage double constraint pk
PRIMARY KEY(hu_ts,hu_ho_id,hu_stream_id) );
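(Editor's note: given this DDL, TRUNC(hu_ts,'DAY') should collapse all 24 hourly rows of a day onto midnight, so each (day, hu_ho_id, hu_stream_id) combination yields exactly one group. A Python analogue of the intended semantics, not of Phoenix's implementation:)

```python
from datetime import datetime

def trunc_day(ts: datetime) -> datetime:
    """Analogue of Phoenix TRUNC(ts, 'DAY'): round down to midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Every hourly reading of 2015-03-19 should land on the same group key:
keys = {trunc_day(datetime(2015, 3, 19, h, 59, 59)) for h in range(24)}
print(keys)  # a single key: midnight on 2015-03-19
```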

Phoenix: 4.3.0

Thanks,
Marek


2015-04-06 22:25 GMT+02:00 James Taylor <ja...@apache.org>:

> Hi Marek,
> How did you input the data and what does your CREATE TABLE/VIEW
> statement look like? What version of Phoenix and HBase are you using?
> Thanks,
> James
>

Re: group by problem

Posted by James Taylor <ja...@apache.org>.
Hi Marek,
How did you input the data and what does your CREATE TABLE/VIEW
statement look like? What version of Phoenix and HBase are you using?
Thanks,
James

On Monday, April 6, 2015, Marek Wiewiorka <ma...@gmail.com> wrote:

> Hi James - sure here is the result of your query (limited to 5 rows):
>
>
> +-------------------------+----------+--------------+--------------------+
> |     TO_CHAR(HU_TS)      | HU_HO_ID | HU_STREAM_ID |      HU_USAGE      |
> +-------------------------+----------+--------------+--------------------+
> | 2015-03-19 00:59:59.000 | 4720     | 0            | 0.287              |
> | 2015-03-19 00:59:59.000 | 6854     | 0            | 3.6189999999999998 |
> | 2015-03-19 01:59:59.000 | 4720     | 0            | 0.323              |
> | 2015-03-19 01:59:59.000 | 6854     | 0            | 2.556              |
> | 2015-03-19 02:59:59.000 | 4720     | 0            | 0.37               |
> +-------------------------+----------+--------------+--------------------+
>
> Ok - I will try to prepare a sample csv with DDL/DML.
>
> Thanks again,
> Marek
>
>
> 2015-04-06 22:06 GMT+02:00 James Taylor <jamestaylor@apache.org
> <javascript:_e(%7B%7D,'cvml','jamestaylor@apache.org');>>:
>
>> Hi Marek,
>> Thanks for the additional information. If you could answer my earlier
>> questions, that would be helpful.
>>
>> If you can't repro with a simple test case, then how about attaching a
>> csv dump of some of your data (the smallest amount of data that repros
>> the issue) to a JIRA along with the CREATE TABLE statement and the
>> query?
>>
>> Thanks,
>> James
>>
>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
>> <marek.wiewiorka@gmail.com
>> <javascript:_e(%7B%7D,'cvml','marek.wiewiorka@gmail.com');>> wrote:
>> > Hi Guys - thanks for your messages.
>> >
>> > I did another round of testing I found that if I use to_char instead of
>> > trunc function I'm getting the expected result:
>> >
>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where
>> id in
>> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |               TO_CHAR(DT)                |                    ID
>> > |                  STREAM                  |                SUM(USAGE)
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-03-19                               | 4720
>> > | 0                                        | 8.405999999999999
>> > |
>> > | 2015-03-19                               | 6854
>> > | 0                                        | 28.339000000000006
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > the same query with trunc returns 48 rows (24 rows per id)
>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id
>> in
>> > (4720,6854) group by trunc(dt,'DAY'),stream,id;
>> > e.g.:
>> > ....
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.406                                    |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 0.9490000000000001                       |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.332                                    |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.738                                    |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.33499999999999996                      |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.421                                    |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.45599999999999996                      |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.167                                    |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > so for some reason grouping by trunc is not working...but using to_char
>> is.
>> > This applies for table loaded using psql from csv file.
>> >
>> > When tried to create a sample table and populate it with upserts
>> everything
>> > worked as expected in both cases:
>> >
>> > create table usage (dt time not null ,id integer not null,stream
>> integer not
>> > null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>> >
>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>> >
>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY')
>> as
>> > day,id,stream,sum(usage) from usage where id in (100,200) group by
>> > trunc(dt,'DAY'),stream,id;
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |           DAY           |                    ID                    |
>> > STREAM                  |                SUM(USAGE)                |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-04-01 00:00:00.000 | 100                                      | 0
>> > | 5.0                                      |
>> > | 2015-04-01 00:00:00.000 | 200                                      | 0
>> > | 10.0                                     |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
>> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |               TO_CHAR(DT)                |                    ID
>> > |                  STREAM                  |                SUM(USAGE)
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-04-01                               | 100
>> > | 0                                        | 5.0
>> > |
>> > | 2015-04-01                               | 200
>> > | 0                                        | 10.0
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > 2 rows selected (1.49 seconds)
>> >
>> >
>> > Shall I open a jira for that?
>> >
>> > Regards,
>> > Marek
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > 2015-04-06 20:16 GMT+02:00 James Taylor <jamestaylor@apache.org>:
>> >>
>> >> Hi Marek,
>> >> How did you input the data and what does your CREATE TABLE/VIEW
>> >> statement look like? What version of Phoenix and HBase are you using?
>> >>
>> >> Also, would you mind running the following query and letting us know
>> the
>> >> output?
>> >>
>> >> select to_char(hu_ts,'yyyy-MM-dd
>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>> >> from se_dwh.homes_usage_hour
>> >> where hu_ho_id in (4720,6854);
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <gabriel.reid@gmail.com>
>> >> wrote:
>> >> > That certainly looks like a bug. Would it be possible to make a small
>> >> > reproducible test case and if possible, log this in the Phoenix JIRA
>> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
>> >> >
>> >> > - Gabriel
>> >> >
>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>> >> > <marek.wiewiorka@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi All,
>> >> >> I came across a weird situation while running a query with group by.
>> >> >> I executed 2 queries:
>> >> >>
>> >> >> 1)
>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >> >>
>> >> >> | 2015-03-19 00:00:00.000 | 4720
>>  |
>> >> >> 0
>> >> >> | 0.45599999999999996                      |
>> >> >> | 2015-03-19 00:00:00.000 | 6854
>>  |
>> >> >> 0
>> >> >> | 2.167                                    |
>> >> >>
>> >> >>
>> >> >> 2)
>> >> >>
>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >> >>
>> >> >> | 2015-03-19 00:00:00.000 | 4720
>>  |
>> >> >> 0
>> >> >> | 8.405999999999999                        |
>> >> >>
>> >> >>
>> >> >> The only difference is that in the first case I included 2
>> >> >> ids(4720,6854)
>> >> >> and in the other one only 1 (4720).
>> >> >> The result for hu_ho_id=4720 should be the same in both case but it
>> >> >> isn't
>> >> >> (0.45 vs 8.4).
>> >> >> The second result(8.4) is correct.
>> >> >>
>> >> >> Am I doing something wrong?
>> >> >>
>> >> >> Regards,
>> >> >> Marek
>> >> >>
>> >> >
>> >
>> >
>>
>
>

Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
Hi James - sure here is the result of your query (limited to 5 rows):

+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|              TO_CHAR(HU_TS)              |                 HU_HO_ID                 |               HU_STREAM_ID               |                 HU_USAGE                 |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-03-19 00:59:59.000                  | 4720                                     | 0                                        | 0.287                                    |
| 2015-03-19 00:59:59.000                  | 6854                                     | 0                                        | 3.6189999999999998                       |
| 2015-03-19 01:59:59.000                  | 4720                                     | 0                                        | 0.323                                    |
| 2015-03-19 01:59:59.000                  | 6854                                     | 0                                        | 2.556                                    |
| 2015-03-19 02:59:59.000                  | 4720                                     | 0                                        | 0.37                                     |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

Ok - I will try to prepare a sample csv with DDL/DML.
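Since the issue seems tied to CSV-loaded data, a hypothetical sketch of generating such a sample CSV in Python follows. The column order (timestamp, home id, stream id, usage) mirrors the columns used in the queries above, but the function name and usage values are made up for illustration:

```python
import csv
import io
from datetime import datetime, timedelta

def make_sample_rows(start, hours, home_ids):
    """Yield one (timestamp, home_id, stream_id, usage) reading per hour per id."""
    for h in range(hours):
        ts = start + timedelta(hours=h)
        for i, home_id in enumerate(home_ids):
            # Made-up usage values; only the shape of the data matters here.
            usage = round(0.3 + 0.1 * h + 0.05 * i, 3)
            yield (ts.strftime("%Y-%m-%d %H:%M:%S"), home_id, 0, usage)

# Write the rows in CSV form (to a buffer here; a file works the same way).
buf = io.StringIO()
csv.writer(buf).writerows(
    make_sample_rows(datetime(2015, 3, 19, 0, 59, 59), 3, [4720, 6854])
)
print(buf.getvalue())
```

A file produced this way could then be loaded with Phoenix's psql tool against the table DDL attached to the JIRA.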

Thanks again,
Marek


2015-04-06 22:06 GMT+02:00 James Taylor <ja...@apache.org>:

> Hi Marek,
> Thanks for the additional information. If you could answer my earlier
> questions, that would be helpful.
>
> If you can't repro with a simple test case, then how about attaching a
> csv dump of some of your data (the smallest amount of data that repros
> the issue) to a JIRA along with the CREATE TABLE statement and the
> query?
>
> Thanks,
> James
>
> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
> <ma...@gmail.com> wrote:
> > Hi Guys - thanks for your messages.
> >
> > I did another round of testing I found that if I use to_char instead of
> > trunc function I'm getting the expected result:
> >
> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id
> in
> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > |               TO_CHAR(DT)                |                    ID
> > |                  STREAM                  |                SUM(USAGE)
> > |
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > | 2015-03-19                               | 4720
> > | 0                                        | 8.405999999999999
> > |
> > | 2015-03-19                               | 6854
> > | 0                                        | 28.339000000000006
> > |
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >
> > the same query with trunc returns 48 rows (24 rows per id)
> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id in
> > (4720,6854) group by trunc(dt,'DAY'),stream,id;
> > e.g.:
> > ....
> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
> > | 0.406                                    |
> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
> > | 0.9490000000000001                       |
> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
> > | 0.332                                    |
> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
> > | 2.738                                    |
> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
> > | 0.33499999999999996                      |
> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
> > | 2.421                                    |
> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
> > | 0.45599999999999996                      |
> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
> > | 2.167                                    |
> >
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >
> > so for some reason grouping by trunc is not working...but using to_char
> is.
> > This applies for table loaded using psql from csv file.
> >
> > When tried to create a sample table and populate it with upserts
> everything
> > worked as expected in both cases:
> >
> > create table usage (dt time not null ,id integer not null,stream integer
> not
> > null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
> >
> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
> >
> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
> > day,id,stream,sum(usage) from usage where id in (100,200) group by
> > trunc(dt,'DAY'),stream,id;
> >
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > |           DAY           |                    ID                    |
> > STREAM                  |                SUM(USAGE)                |
> >
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > | 2015-04-01 00:00:00.000 | 100                                      | 0
> > | 5.0                                      |
> > | 2015-04-01 00:00:00.000 | 200                                      | 0
> > | 10.0                                     |
> >
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> >
> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > |               TO_CHAR(DT)                |                    ID
> > |                  STREAM                  |                SUM(USAGE)
> > |
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > | 2015-04-01                               | 100
> > | 0                                        | 5.0
> > |
> > | 2015-04-01                               | 200
> > | 0                                        | 10.0
> > |
> >
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> > 2 rows selected (1.49 seconds)
> >
> >
> > Shall I open a jira for that?
> >
> > Regards,
> > Marek
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2015-04-06 20:16 GMT+02:00 James Taylor <ja...@apache.org>:
> >>
> >> Hi Marek,
> >> How did you input the data and what does your CREATE TABLE/VIEW
> >> statement look like? What version of Phoenix and HBase are you using?
> >>
> >> Also, would you mind running the following query and letting us know the
> >> output?
> >>
> >> select to_char(hu_ts,'yyyy-MM-dd
> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
> >> from se_dwh.homes_usage_hour
> >> where hu_ho_id in (4720,6854);
> >>
> >> Thanks,
> >> James
> >>
> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <ga...@gmail.com>
> >> wrote:
> >> > That certainly looks like a bug. Would it be possible to make a small
> >> > reproducible test case and if possible, log this in the Phoenix JIRA
> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
> >> >
> >> > - Gabriel
> >> >
> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
> >> > <ma...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi All,
> >> >> I came across a weird situation while running a query with group by.
> >> >> I executed 2 queries:
> >> >>
> >> >> 1)
> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >> >>
> >> >> | 2015-03-19 00:00:00.000 | 4720
>  |
> >> >> 0
> >> >> | 0.45599999999999996                      |
> >> >> | 2015-03-19 00:00:00.000 | 6854
>  |
> >> >> 0
> >> >> | 2.167                                    |
> >> >>
> >> >>
> >> >> 2)
> >> >>
> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >> >>
> >> >> | 2015-03-19 00:00:00.000 | 4720
>  |
> >> >> 0
> >> >> | 8.405999999999999                        |
> >> >>
> >> >>
> >> >> The only difference is that in the first case I included 2
> >> >> ids(4720,6854)
> >> >> and in the other one only 1 (4720).
> >> >> The result for hu_ho_id=4720 should be the same in both case but it
> >> >> isn't
> >> >> (0.45 vs 8.4).
> >> >> The second result(8.4) is correct.
> >> >>
> >> >> Am I doing something wrong?
> >> >>
> >> >> Regards,
> >> >> Marek
> >> >>
> >> >
> >
> >
>

Re: group by problem

Posted by James Taylor <ja...@apache.org>.
Hi Marek,
Thanks for the additional information. If you could answer my earlier
questions, that would be helpful.

If you can't repro with a simple test case, then how about attaching a
csv dump of some of your data (the smallest amount of data that repros
the issue) to a JIRA along with the CREATE TABLE statement and the
query?

Thanks,
James

On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
<ma...@gmail.com> wrote:
> Hi Guys - thanks for your messages.
>
> I did another round of testing I found that if I use to_char instead of
> trunc function I'm getting the expected result:
>
> select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
> (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> |               TO_CHAR(DT)                |                    ID
> |                  STREAM                  |                SUM(USAGE)
> |
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> | 2015-03-19                               | 4720
> | 0                                        | 8.405999999999999
> |
> | 2015-03-19                               | 6854
> | 0                                        | 28.339000000000006
> |
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>
> the same query with trunc returns 48 rows (24 rows per id)
> select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id in
> (4720,6854) group by trunc(dt,'DAY'),stream,id;
> e.g.:
> ....
> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> | 0.406                                    |
> | 2015-03-19 00:00:00.000 | 6854                                     | 0
> | 0.9490000000000001                       |
> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> | 0.332                                    |
> | 2015-03-19 00:00:00.000 | 6854                                     | 0
> | 2.738                                    |
> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> | 0.33499999999999996                      |
> | 2015-03-19 00:00:00.000 | 6854                                     | 0
> | 2.421                                    |
> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> | 0.45599999999999996                      |
> | 2015-03-19 00:00:00.000 | 6854                                     | 0
> | 2.167                                    |
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>
> so for some reason grouping by trunc is not working...but using to_char is.
> This applies for table loaded using psql from csv file.
>
> When tried to create a sample table and populate it with upserts everything
> worked as expected in both cases:
>
> create table usage (dt time not null ,id integer not null,stream integer not
> null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
> UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
> UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>
> UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
> UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>
> 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
> day,id,stream,sum(usage) from usage where id in (100,200) group by
> trunc(dt,'DAY'),stream,id;
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> |           DAY           |                    ID                    |
> STREAM                  |                SUM(USAGE)                |
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> | 2015-04-01 00:00:00.000 | 100                                      | 0
> | 5.0                                      |
> | 2015-04-01 00:00:00.000 | 200                                      | 0
> | 10.0                                     |
> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>
> 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
> to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
> (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> |               TO_CHAR(DT)                |                    ID
> |                  STREAM                  |                SUM(USAGE)
> |
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> | 2015-04-01                               | 100
> | 0                                        | 5.0
> |
> | 2015-04-01                               | 200
> | 0                                        | 10.0
> |
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> 2 rows selected (1.49 seconds)
>
>
> Shall I open a jira for that?
>
> Regards,
> Marek
>
>
>
>
>
>
>
>
>
>
>
> 2015-04-06 20:16 GMT+02:00 James Taylor <ja...@apache.org>:
>>
>> Hi Marek,
>> How did you input the data and what does your CREATE TABLE/VIEW
>> statement look like? What version of Phoenix and HBase are you using?
>>
>> Also, would you mind running the following query and letting us know the
>> output?
>>
>> select to_char(hu_ts,'yyyy-MM-dd
>> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>> from se_dwh.homes_usage_hour
>> where hu_ho_id in (4720,6854);
>>
>> Thanks,
>> James
>>
>> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <ga...@gmail.com>
>> wrote:
>> > That certainly looks like a bug. Would it be possible to make a small
>> > reproducible test case and if possible, log this in the Phoenix JIRA
>> > (https://issues.apache.org/jira/browse/PHOENIX) ?
>> >
>> > - Gabriel
>> >
>> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>> > <ma...@gmail.com>
>> > wrote:
>> >>
>> >> Hi All,
>> >> I came across a weird situation while running a query with group by.
>> >> I executed 2 queries:
>> >>
>> >> 1)
>> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >>
>> >> | 2015-03-19 00:00:00.000 | 4720                                     |
>> >> 0
>> >> | 0.45599999999999996                      |
>> >> | 2015-03-19 00:00:00.000 | 6854                                     |
>> >> 0
>> >> | 2.167                                    |
>> >>
>> >>
>> >> 2)
>> >>
>> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >>
>> >> | 2015-03-19 00:00:00.000 | 4720                                     |
>> >> 0
>> >> | 8.405999999999999                        |
>> >>
>> >>
>> >> The only difference is that in the first case I included 2
>> >> ids(4720,6854)
>> >> and in the other one only 1 (4720).
>> >> The result for hu_ho_id=4720 should be the same in both case but it
>> >> isn't
>> >> (0.45 vs 8.4).
>> >> The second result(8.4) is correct.
>> >>
>> >> Am I doing something wrong?
>> >>
>> >> Regards,
>> >> Marek
>> >>
>> >
>
>

Re: group by problem

Posted by Marek Wiewiorka <ma...@gmail.com>.
Hi Guys - thanks for your messages.

I did another round of testing and found that if I use to_char instead of the
trunc function, I get the expected result:

select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
(4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|               TO_CHAR(DT)                |                    ID                    |                  STREAM                  |                SUM(USAGE)                |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-03-19                               | 4720                                     | 0                                        | 8.405999999999999                        |
| 2015-03-19                               | 6854                                     | 0                                        | 28.339000000000006                       |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

The same query with trunc returns 48 rows (24 rows per id):
select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id in
(4720,6854) group by trunc(dt,'DAY'),stream,id;
e.g.:
....
| 2015-03-19 00:00:00.000 | 4720                                     | 0                                        | 0.406                                    |
| 2015-03-19 00:00:00.000 | 6854                                     | 0                                        | 0.9490000000000001                       |
| 2015-03-19 00:00:00.000 | 4720                                     | 0                                        | 0.332                                    |
| 2015-03-19 00:00:00.000 | 6854                                     | 0                                        | 2.738                                    |
| 2015-03-19 00:00:00.000 | 4720                                     | 0                                        | 0.33499999999999996                      |
| 2015-03-19 00:00:00.000 | 6854                                     | 0                                        | 2.421                                    |
| 2015-03-19 00:00:00.000 | 4720                                     | 0                                        | 0.45599999999999996                      |
| 2015-03-19 00:00:00.000 | 6854                                     | 0                                        | 2.167                                    |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

So for some reason grouping by trunc is not working... but using to_char is.
This applies to a table loaded with psql from a CSV file.

When I tried to create a sample table and populate it with upserts, everything
worked as expected in both cases:

create table usage (dt time not null, id integer not null, stream integer
not null, usage double constraint pk PRIMARY KEY (dt, id, stream));
UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);

UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);

0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
day,id,stream,sum(usage) from usage where id in (100,200) group by
trunc(dt,'DAY'),stream,id;
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|           DAY           |                    ID                    |                  STREAM                  |                SUM(USAGE)                |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-04-01 00:00:00.000 | 100                                      | 0                                        | 5.0                                      |
| 2015-04-01 00:00:00.000 | 200                                      | 0                                        | 10.0                                     |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
(100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|               TO_CHAR(DT)                |                    ID                    |                  STREAM                  |                SUM(USAGE)                |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-04-01                               | 100                                      | 0                                        | 5.0                                      |
| 2015-04-01                               | 200                                      | 0                                        | 10.0                                     |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
2 rows selected (1.49 seconds)
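For reference, the grouping semantics both queries should agree on can be sketched outside Phoenix in plain Python. The lambdas below are my stand-ins for TRUNC(dt,'DAY') and TO_CHAR(dt,'yyyy-MM-dd'), and the readings are just the four upserted rows:

```python
from collections import defaultdict
from datetime import datetime

# The four upserted rows: (dt, id, stream, usage).
readings = [
    (datetime(2015, 4, 1, 10, 0), 100, 0, 2.0),
    (datetime(2015, 4, 1, 11, 0), 100, 0, 3.0),
    (datetime(2015, 4, 1, 12, 0), 200, 0, 4.0),
    (datetime(2015, 4, 1, 13, 0), 200, 0, 6.0),
]

def group_sum(day_key):
    """GROUP BY day_key(dt), stream, id with SUM(usage)."""
    sums = defaultdict(float)
    for dt, row_id, stream, usage in readings:
        sums[(day_key(dt), stream, row_id)] += usage
    return dict(sums)

# Stand-in for TRUNC(dt, 'DAY'): zero out the time-of-day portion.
trunc_day = lambda dt: dt.replace(hour=0, minute=0, second=0, microsecond=0)
# Stand-in for TO_CHAR(dt, 'yyyy-MM-dd'): format to a day string.
fmt_day = lambda dt: dt.strftime("%Y-%m-%d")

by_trunc = group_sum(trunc_day)
by_fmt = group_sum(fmt_day)

# Either keying collapses each (day, stream, id) to one group with the same sum.
assert len(by_trunc) == len(by_fmt) == 2
```

Either keying should yield the same two groups with sums 5.0 and 10.0, which is exactly what the upsert-populated table does; only the CSV-loaded table diverges under TRUNC.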


Shall I open a jira for that?

Regards,
Marek











2015-04-06 20:16 GMT+02:00 James Taylor <ja...@apache.org>:

> Hi Marek,
> How did you input the data and what does your CREATE TABLE/VIEW
> statement look like? What version of Phoenix and HBase are you using?
>
> Also, would you mind running the following query and letting us know the
> output?
>
> select to_char(hu_ts,'yyyy-MM-dd
> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
> from se_dwh.homes_usage_hour
> where hu_ho_id in (4720,6854);
>
> Thanks,
> James
>
> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <ga...@gmail.com>
> wrote:
> > That certainly looks like a bug. Would it be possible to make a small
> > reproducible test case and if possible, log this in the Phoenix JIRA
> > (https://issues.apache.org/jira/browse/PHOENIX) ?
> >
> > - Gabriel
> >
> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka <
> marek.wiewiorka@gmail.com>
> > wrote:
> >>
> >> Hi All,
> >> I came across a weird situation while running a query with group by.
> >> I executed 2 queries:
> >>
> >> 1)
> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >>
> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
> >> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
> >>
> >>
> >> 2)
> >>
> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >>
> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
> >>
> >>
> >> The only difference is that in the first case I included 2 ids
> >> (4720, 6854) and in the other one only 1 (4720).
> >> The result for hu_ho_id=4720 should be the same in both cases, but it
> >> isn't (0.45 vs 8.4).
> >> The second result (8.4) is correct.
> >>
> >> Am I doing something wrong?
> >>
> >> Regards,
> >> Marek
> >>
> >
>

Re: group by problem

Posted by James Taylor <ja...@apache.org>.
Hi Marek,
How did you input the data and what does your CREATE TABLE/VIEW
statement look like? What version of Phoenix and HBase are you using?

Also, would you mind running the following query and letting us know the output?

select to_char(hu_ts,'yyyy-MM-dd HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
from se_dwh.homes_usage_hour
where hu_ho_id in (4720,6854);

Thanks,
James

On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <ga...@gmail.com> wrote:
> That certainly looks like a bug. Would it be possible to make a small
> reproducible test case and if possible, log this in the Phoenix JIRA
> (https://issues.apache.org/jira/browse/PHOENIX) ?
>
> - Gabriel
>
> On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka <ma...@gmail.com>
> wrote:
>>
>> Hi All,
>> I came across a weird situation while running a query with group by.
>> I executed 2 queries:
>>
>> 1)
>> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>>
>> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
>> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>>
>>
>> 2)
>>
>> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>>
>> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
>>
>>
>> The only difference is that in the first case I included 2 ids (4720, 6854)
>> and in the other one only 1 (4720).
>> The result for hu_ho_id=4720 should be the same in both cases, but it isn't
>> (0.45 vs 8.4).
>> The second result (8.4) is correct.
>>
>> Am I doing something wrong?
>>
>> Regards,
>> Marek
>>
>

Re: group by problem

Posted by Gabriel Reid <ga...@gmail.com>.
That certainly looks like a bug. Would it be possible to make a small
reproducible test case and if possible, log this in the Phoenix JIRA (
https://issues.apache.org/jira/browse/PHOENIX) ?

- Gabriel

On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka <ma...@gmail.com>
wrote:

> Hi All,
> I came across a weird situation while running a query with group by.
> I executed 2 queries:
>
> 1)
> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>
> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>
>
> 2)
>
> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>
> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
>
>
> The only difference is that in the first case I included 2 ids (4720, 6854)
> and in the other one only 1 (4720).
> The result for hu_ho_id=4720 should be the same in both cases, but it isn't
> (0.45 vs 8.4).
> The second result (8.4) is correct.
>
> Am I doing something wrong?
>
> Regards,
> Marek
>
>