You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Paul O'Riordan <re...@gmail.com> on 2016/11/10 19:34:16 UTC

Performance issue

Hi,

I'm using phoenix-4.7.0-HBase-1.2-client on AWS EMR 5.1 and seeing a
strange issue. I've created the following table with approximately 300
columns (omitted below).

CREATE TABLE EventLog(
    AssetID integer NOT NULL,
    time timestamp NOT NULL,
    c.Col0 varchar(10) NULL,
    c.Col1 decimal(9, 6) NULL,
    c.Col2 decimal(9, 6) NULL,
    c.Col3 decimal(9, 6) NULL,
    c.Col4 decimal(9, 3) NULL,
    c.Col5 decimal(9, 3) NULL,
    c.Col6 tinyint NULL,
    c.Col7 boolean NULL,
    c.Col8 boolean NULL,
    c.Col9 boolean NULL,
    c.Col10 boolean NULL,
    c.Col11 boolean NULL,
    ....
    c.Col300 boolean NULL,
    constraint pk primary key (AssetID, time)
) immutable_rows=true, salt_buckets=4, compression='snappy';

I've populated this table with a small amount of data ~ 3m records (1.1 GB
on disk). Attempting to select the entire 'c' column family through Phoenix
is extremely slow compared to HBase queries on the same table. I'm doing a
simple rowkey lookup as below and selecting the entire 'c' column family.
The query is consistently taking > 13 seconds to return the entire column
family. Even with an empty table.

If I limit the query to a small number of columns, it will complete in ~
250ms.

0: jdbc:phoenix:10.27.80.116:2181> select /*+ SMALL */ c.* from EventLog
where AssetID = 1 and time = to_timestamp('2016-09-08 09:35:52.050');
...
1 row selected (14.029 seconds)

Plan:

0: jdbc:phoenix:10.27.80.116:2181> explain select /*+ SMALL */ c.* from
EventLog where AssetID = 1 and timestamp = to_timestamp('2016-09-08
09:35:52.050');

+-----------------------------------------------------------
--------------------------------+
| PLAN
                 |
+-----------------------------------------------------------
--------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SMALL POINT LOOKUP ON 1 KEY
OVER EVENTLOG       |
+-----------------------------------------------------------
--------------------------------+

Executing a similar query in the HBase shell doesn't have the same issue.

hbase(main):008:0> get 'EVENTLOG', "\x00\x80\x00\x00\x01\x80\x00\
x01W\x095\xD4\xE4\x00\x00\x00\x00"
...
C:COL97 timestamp=1478740732629, value=\x00
C:COL98 timestamp=1478740732629, value=\x00
C:COL99 timestamp=1478740732629, value=\x00

326 row(s) in 0.1150 seconds


I realise that Phoenix-4.7.0-HBase-1.2-client seems to be an Amazon fork of
Phoenix, but any thoughts on where my problem could be?


Cheers,


Paul

Re: Performance issue

Posted by James Taylor <ja...@apache.org>.
Hi Paul,
A couple of things to try:
- Try separately with the SEEK_TO_COLUMN and NO_SEEK_TO_COLUMN hint: SELECT
/*+ SEEK_TO_COLUMN SMALL */ and then SELECT /*+ NO_SEEK_TO_COLUMN SMALL */
- Try SELECT * instead of SELECT c.*

Also, any difference between the first time you run it and subsequent times?

Please file a JIRA as well.

Thanks,
James

On Thu, Nov 10, 2016 at 11:34 AM, Paul O'Riordan <re...@gmail.com>
wrote:

> Hi,
>
> I'm using phoenix-4.7.0-HBase-1.2-client on AWS EMR 5.1 and seeing a
> strange issue. I've created the following table with approximately 300
> columns (omitted below).
>
> CREATE TABLE EventLog(
>     AssetID integer NOT NULL,
>     time timestamp NOT NULL,
>     c.Col0 varchar(10) NULL,
>     c.Col1 decimal(9, 6) NULL,
>     c.Col2 decimal(9, 6) NULL,
>     c.Col3 decimal(9, 6) NULL,
>     c.Col4 decimal(9, 3) NULL,
>     c.Col5 decimal(9, 3) NULL,
>     c.Col6 tinyint NULL,
>     c.Col7 boolean NULL,
>     c.Col8 boolean NULL,
>     c.Col9 boolean NULL,
>     c.Col10 boolean NULL,
>     c.Col11 boolean NULL,
>     ....
>     c.Col300 boolean NULL,
>     constraint pk primary key (AssetID, time)
> ) immutable_rows=true, salt_buckets=4, compression='snappy';
>
> I've populated this table with a small amount of data ~ 3m records (1.1 GB
> on disk). Attempting to select the entire 'c' column family through Phoenix
> is extremely slow compared to HBase queries on the same table. I'm doing a
> simple rowkey lookup as below and selecting the entire 'c' column family.
> The query is consistently taking > 13 seconds to return the entire column
> family. Even with an empty table.
>
> If I limit the query to a small number of columns, it will complete in ~
> 250ms.
>
> 0: jdbc:phoenix:10.27.80.116:2181> select /*+ SMALL */ c.* from EventLog
> where AssetID = 1 and time = to_timestamp('2016-09-08 09:35:52.050');
> ...
> 1 row selected (14.029 seconds)
>
> Plan:
>
> 0: jdbc:phoenix:10.27.80.116:2181> explain select /*+ SMALL */ c.* from
> EventLog where AssetID = 1 and timestamp = to_timestamp('2016-09-08
> 09:35:52.050');
>
> +-----------------------------------------------------------
> --------------------------------+
> | PLAN
>                  |
> +-----------------------------------------------------------
> --------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SMALL POINT LOOKUP ON 1 KEY
> OVER EVENTLOG       |
> +-----------------------------------------------------------
> --------------------------------+
>
> Executing a similar query in the HBase shell doesn't have the same issue.
>
> hbase(main):008:0> get 'EVENTLOG', "\x00\x80\x00\x00\x01\x80\x00\
> x01W\x095\xD4\xE4\x00\x00\x00\x00"
> ...
> C:COL97 timestamp=1478740732629, value=\x00
> C:COL98 timestamp=1478740732629, value=\x00
> C:COL99 timestamp=1478740732629, value=\x00
>
> 326 row(s) in 0.1150 seconds
>
>
> I realise that Phoenix-4.7.0-HBase-1.2-client seems to be an Amazon fork
> of Phoenix, but any thoughts on where my problem could be?
>
>
> Cheers,
>
>
> Paul
>