You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Fawaz Enaya <m....@gmail.com> on 2016/10/30 11:42:21 UTC

Phoenix Slow Problem

Hi All in this great project,


I have an HBase cluster of four nodes, I use Phoenix to access HBase, but I
do not know why its too much slow to execute SELECT count(*) for table
contains 5 million records it takes 8 seconds.
Below is the explain for may select statement

CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN OVER
TABLE* |*

*| *    SERVER FILTER BY FIRST KEY ONLY
     * |*

*| *    SERVER AGGREGATE INTO SINGLE ROW
Anyone can help.

Many Thanks
--
Thanks & regards,

Re: Phoenix Slow Problem

Posted by Mich Talebzadeh <mi...@gmail.com>.

These are interesting points Jonathan.

You mentioned row size, disk throughput (something like svc_t or service
time is the average service time for IO requests in ms) plus number of
regions

Is there any empirical rule on this ? For example when to add another
region server etc

Thanks

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 1 November 2016 at 03:34, Jonathan Leech <jo...@gmail.com> wrote:

> Make sure number of regions is at least number of physical disks on
> cluster, if not split or salt. Do the math based on row size and target
> performance on disk throughput, number of regions etc. If necessary, add
> servers or disks. Also look at hbase cache settings, JVM heap sizes, GC
> settings etc. Depending on the data, compression can improve performance.
> Snappy typically does less compression than gzip but at less CPU cost. Gzip
> can get pretty high ratios but writes are more costly than reads, so major
> compactions get get backed up. For typical data both will likely increase
> read throughout. Depending on how often rows are updated, removed, added,
> change default hbase major compaction interval, or force major compaction
> after large updates. Also, unless counting rows is your use case, don't
> worry about how long it takes to count them. Base expectations on expected
> use cases. With the overhead of Phoenix query parsing, threading in the
> client, etc etc you probably won't do much better than sub-second on
> aggregates on 1+ mil rows.
>
> On Oct 31, 2016, at 5:19 PM, Fawaz Enaya <m....@gmail.com> wrote:
>
> Thanks for your answer but why it gives 1 way parallel and can not be more?
>
> On Sunday, 30 October 2016, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> If you create a secondary index in Phoenix on the table on single or
>> selected columns, that index (which will be added to Hbase) will be used to
>> return data. For example in below MARKETDATAHBASE_IDX1 is an index on table
>> MARKETDATAHBASE and is used by the query
>>
>>
>>  0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from
>> MARKETDATAHBASE;
>> +--------------------------------------------------------------------+
>> |                                PLAN                                |
>> +--------------------------------------------------------------------+
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
>> |    * SERVER FILTER BY FIRST KEY ONLY*                                |
>> |     SERVER AGGREGATE INTO SINGLE ROW                               |
>> +--------------------------------------------------------------------+
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 30 October 2016 at 11:42, Fawaz Enaya <m....@gmail.com> wrote:
>>
>>> Hi All in this great project,
>>>
>>>
>>> I have an HBase cluster of four nodes, I use Phoenix to access HBase,
>>> but I do not know why its too much slow to execute SELECT count(*) for
>>> table contains 5 million records it takes 8 seconds.
>>> Below is the explain for may select statement
>>>
>>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN
>>> OVER TABLE* |*
>>>
>>> *| *    SERVER FILTER BY FIRST KEY ONLY
>>>          * |*
>>>
>>> *| *    SERVER AGGREGATE INTO SINGLE ROW
>>> Anyone can help.
>>>
>>> Many Thanks
>>> --
>>> Thanks & regards,
>>>
>>>
>>
>
> --
> --
> Thanks & regards,
>
>
>

Re: Phoenix Slow Problem

Posted by Jonathan Leech <jo...@gmail.com>.

Make sure number of regions is at least number of physical disks on cluster, if not split or salt. Do the math based on row size and target performance on disk throughput, number of regions etc. If necessary, add servers or disks. Also look at hbase cache settings, JVM heap sizes, GC settings etc. Depending on the data, compression can improve performance. Snappy typically does less compression than gzip but at less CPU cost. Gzip can get pretty high ratios but writes are more costly than reads, so major compactions get get backed up. For typical data both will likely increase read throughout. Depending on how often rows are updated, removed, added, change default hbase major compaction interval, or force major compaction after large updates. Also, unless counting rows is your use case, don't worry about how long it takes to count them. Base expectations on expected use cases. With the overhead of Phoenix query parsing, threading in the client, etc etc you probably won't do much better than sub-second on aggregates on 1+ mil rows.

> On Oct 31, 2016, at 5:19 PM, Fawaz Enaya <m....@gmail.com> wrote:
> 
> Thanks for your answer but why it gives 1 way parallel and can not be more?
> 
>> On Sunday, 30 October 2016, Mich Talebzadeh <mi...@gmail.com> wrote:
>> If you create a secondary index in Phoenix on the table on single or selected columns, that index (which will be added to Hbase) will be used to return data. For example in below MARKETDATAHBASE_IDX1 is an index on table MARKETDATAHBASE and is used by the query
>> 
>> 
>>  0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from MARKETDATAHBASE;
>> +--------------------------------------------------------------------+
>> |                                PLAN                                |
>> +--------------------------------------------------------------------+
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
>> |     SERVER FILTER BY FIRST KEY ONLY                                |
>> |     SERVER AGGREGATE INTO SINGLE ROW                               |
>> +--------------------------------------------------------------------+
>> 
>> HTH
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>> 
>> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>  
>> 
>>> On 30 October 2016 at 11:42, Fawaz Enaya <m....@gmail.com> wrote:
>>> Hi All in this great project,
>>> 
>>> 
>>> I have an HBase cluster of four nodes, I use Phoenix to access HBase, but I do not know why its too much slow to execute SELECT count(*) for table contains 5 million records it takes 8 seconds.
>>> Below is the explain for may select statement
>>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE |
>>> 
>>> |     SERVER FILTER BY FIRST KEY ONLY                                          |
>>> 
>>> |     SERVER AGGREGATE INTO SINGLE ROW 
>>> 
>>> Anyone can help.
>>> 
>>> Many Thanks
>>> --
>>> Thanks & regards,
>>> 
>> 
> 
> 
> -- 
> --
> Thanks & regards,
> 
>

Re: Phoenix Slow Problem

Posted by Sergey Soldatov <se...@gmail.com>.

Hi Fawaz,
Actually explain plan says that there will be 6 parallel full scans. I
believe that's the number of regions you have. If you want to increase the
number of parallel scans you may think about setting
phoenix.stats.guidepost.width to something smaller than default value and
scans will be executed for smaller chunks (and it will be faster) or split
table to increase number of regions.

Thanks,
Sergey

On Mon, Oct 31, 2016 at 4:19 PM, Fawaz Enaya <m....@gmail.com>
wrote:

> Thanks for your answer but why it gives 1 way parallel and can not be more?
>
>
> On Sunday, 30 October 2016, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> If you create a secondary index in Phoenix on the table on single or
>> selected columns, that index (which will be added to Hbase) will be used to
>> return data. For example in below MARKETDATAHBASE_IDX1 is an index on table
>> MARKETDATAHBASE and is used by the query
>>
>>
>>  0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from
>> MARKETDATAHBASE;
>> +--------------------------------------------------------------------+
>> |                                PLAN                                |
>> +--------------------------------------------------------------------+
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
>> |    * SERVER FILTER BY FIRST KEY ONLY*                                |
>> |     SERVER AGGREGATE INTO SINGLE ROW                               |
>> +--------------------------------------------------------------------+
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 30 October 2016 at 11:42, Fawaz Enaya <m....@gmail.com> wrote:
>>
>>> Hi All in this great project,
>>>
>>>
>>> I have an HBase cluster of four nodes, I use Phoenix to access HBase,
>>> but I do not know why its too much slow to execute SELECT count(*) for
>>> table contains 5 million records it takes 8 seconds.
>>> Below is the explain for may select statement
>>>
>>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN
>>> OVER TABLE* |*
>>>
>>> *| *    SERVER FILTER BY FIRST KEY ONLY
>>>          * |*
>>>
>>> *| *    SERVER AGGREGATE INTO SINGLE ROW
>>> Anyone can help.
>>>
>>> Many Thanks
>>> --
>>> Thanks & regards,
>>>
>>>
>>
>
> --
> --
> Thanks & regards,
>
>
>

Re: Phoenix Slow Problem

Posted by Fawaz Enaya <m....@gmail.com>.

Thanks for your answer but why it gives 1 way parallel and can not be more?

On Sunday, 30 October 2016, Mich Talebzadeh <mi...@gmail.com>
wrote:

> If you create a secondary index in Phoenix on the table on single or
> selected columns, that index (which will be added to Hbase) will be used to
> return data. For example in below MARKETDATAHBASE_IDX1 is an index on table
> MARKETDATAHBASE and is used by the query
>
>
>  0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from
> MARKETDATAHBASE;
> +--------------------------------------------------------------------+
> |                                PLAN                                |
> +--------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
> |    * SERVER FILTER BY FIRST KEY ONLY*                                |
> |     SERVER AGGREGATE INTO SINGLE ROW                               |
> +--------------------------------------------------------------------+
>
> HTH
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 30 October 2016 at 11:42, Fawaz Enaya <m.fawaz.enaya@gmail.com
> <javascript:_e(%7B%7D,'cvml','m.fawaz.enaya@gmail.com');>> wrote:
>
>> Hi All in this great project,
>>
>>
>> I have an HBase cluster of four nodes, I use Phoenix to access HBase, but
>> I do not know why its too much slow to execute SELECT count(*) for table
>> contains 5 million records it takes 8 seconds.
>> Below is the explain for may select statement
>>
>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN OVER
>> TABLE* |*
>>
>> *| *    SERVER FILTER BY FIRST KEY ONLY
>>        * |*
>>
>> *| *    SERVER AGGREGATE INTO SINGLE ROW
>> Anyone can help.
>>
>> Many Thanks
>> --
>> Thanks & regards,
>>
>>
>

-- 
--
Thanks & regards,

Re: Phoenix Slow Problem

Posted by Mich Talebzadeh <mi...@gmail.com>.

If you create a secondary index in Phoenix on the table on single or
selected columns, that index (which will be added to Hbase) will be used to
return data. For example in below MARKETDATAHBASE_IDX1 is an index on table
MARKETDATAHBASE and is used by the query

 0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from MARKETDATAHBASE;
+--------------------------------------------------------------------+
|                                PLAN                                |
+--------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
|    * SERVER FILTER BY FIRST KEY ONLY*                                |
|     SERVER AGGREGATE INTO SINGLE ROW                               |
+--------------------------------------------------------------------+

HTH

Dr Mich Talebzadeh

LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 30 October 2016 at 11:42, Fawaz Enaya <m....@gmail.com> wrote:

> Hi All in this great project,
>
>
> I have an HBase cluster of four nodes, I use Phoenix to access HBase, but
> I do not know why its too much slow to execute SELECT count(*) for table
> contains 5 million records it takes 8 seconds.
> Below is the explain for may select statement
>
> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN OVER
> TABLE* |*
>
> *| *    SERVER FILTER BY FIRST KEY ONLY
>        * |*
>
> *| *    SERVER AGGREGATE INTO SINGLE ROW
> Anyone can help.
>
> Many Thanks
> --
> Thanks & regards,
>
>