
Posted to user@phoenix.apache.org by Mujtaba Chohan <mu...@apache.org> on 2016/08/31 17:40:07 UTC

Re: Phoenix has slow response times compared to HBase

Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries
using the same JVM? Was the JVM restarted after changing the number of
concurrent users?
* Is the response time plotted for the first execution of the query, the
second, or the average of both?
* Is the UUID that you filter on randomly distributed? Does the UUID match
a single row?
* It seems that even a non-concurrent Phoenix query that filters on UUID
takes 500ms in your environment. Can you try the same query in Sqlline a
few times and see how much time it takes for each run?
* What is the explain plan for your Phoenix query?
* If it's slow in Sqlline as well, then try truncating your SYSTEM.STATS
table, reconnect Sqlline, and execute the query again.
* Can you share your table schema, how you ran the Phoenix queries, and your
HBase equivalent code?
* Any Phoenix tuning defaults that you changed?
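
The request to run the same query several times and look at each run separately can also be done from Java. The following is a minimal sketch, not code from this thread: the class name `RepeatTimer` and the sleeping stand-in are hypothetical, and a real test would put the point query (including reading the single row) in place of the stand-in.

```java
import java.util.ArrayList;
import java.util.List;

/** Times repeated runs of one operation; the operation is a stand-in for a query. */
class RepeatTimer {

    /** Runs op the given number of times and returns each run's wall-clock time in ms. */
    static List<Double> timeRuns(Runnable op, int runs) {
        List<Double> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace with code that executes the point
        // query and reads the matching row.
        Runnable standIn = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Double> t = timeRuns(standIn, 11);
        for (int i = 0; i < t.size(); i++) {
            System.out.printf("run %d: %.1f ms%n", i + 1, t.get(i));
        }
    }
}
```

Printing every run, rather than only an average, makes a slow first run easy to tell apart from the runs that follow it.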

On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
e.narros@elsevier.com> wrote:

> Hi,
>
>
> We are exploring Phoenix and have run some load tests to see whether it
> would scale. We have noted that, compared to HBase, Phoenix's average
> response times are much slower as the number of concurrent users
> increases. We are trying to understand whether this is expected or
> whether we are missing something.
>
>
> This is the test we have performed:
>
>
>    - Create table (20 columns) and load it with 400 million records
>    indexed via a column called 'uuid'.
>    - Perform the following queries using 10, 20, 100, 200, 400 and 600
>    users per second; each user performs each query twice:
>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where
>       uuid = ?
>       - Hbase equivalent to: select * from schema.DOCUMENTS where uuid = ?
>    - The results are attached and they show that Phoenix response times
>    are at least an order of magnitude above those of HBase
>
> The tests were run from the Master node of a CDH5.7.2 cluster with Phoenix
> 4.7.0.
>
> Are these test results expected?
>
> Kind Regards,
>
> Edu
>
>
> ------------------------------
>
> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
> Registered in England and Wales.
>

Re: Phoenix has slow response times compared to HBase

Posted by Andrew Purtell <an...@gmail.com>.

Many 0.98 releases include a client change. Possibly of no consequence, but since Phoenix at runtime is an amalgam of HBase and Phoenix code, if the objective is to test your latest, I'd think it prudent to do the same with HBase. There are several hundred changes from 0.98.17 to 0.98.22 overall.

> On Sep 3, 2016, at 10:09 AM, James Taylor <ja...@apache.org> wrote:
> 
> > Why not 0.98.21?
> I think it's just because Mujtaba already had 0.98.17 on that particular box. I don't expect for this particular benchmark it would change anything, though.
> 
>> On Sat, Sep 3, 2016 at 8:37 AM, Andrew Purtell <ap...@apache.org> wrote:
>> > HBase 0.98.17
>> 
>> Why not 0.98.21, our latest release at the time? And there's now 0.98.22.
>> 
>>> On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org> wrote:
>> 
>>> Here is the graph that I get simulating 1, 50 and 500 concurrent users from a single client. Query time for Phoenix is highly comparable with direct HBase gets.
>>> 
>>> See the chart below with query time (ms) for random point gets over a large table that does not fit in the HBase block cache. Each user executed 1000 queries/gets.
>>> 
>>> <image.png>
>>> Source code to execute gets/phoenix query simulating multiple users is at:
>>> 
>>>  directhbasemt.java
>>> 
>>>  directphoenixmt.java
>>> 
>>> Table DDL
>>> create table testuuid (k varchar not null primary key, a varchar, b varchar, c varchar, d varchar, e varchar, f varchar);
>>> 
>>> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this restricts how often the server will be checked for metadata updates, to improve performance
>>> 
>>> Table was filled with 68M rows.
>>> Phoenix 4.8/HBase 0.98.17 running on single machine.
>>> 
>>> //mujtaba
>>> 
>>> 
>>>> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:
>>>> Hi Mujtaba,
>>>> 
>>>> 
>>>> See the answers inline below:
>>>> 
>>>> 
>>>> * How are you running Phoenix queries? We are using apache-jmeter and the jdbc sampler.
>>>> * Were the concurrent Phoenix queries using the same JVM? Yes.
>>>> * Was the JVM restarted after changing number of concurrent users? Yes.
>>>> * Is the response time plotted when query is executed for the first time or second or average of both? Average. We see response times vary significantly even via sqlline: the same query run 11 times sequentially takes anything between 17ms and around 489ms with no other load on the server.
>>>> * Is the UUID filtered on randomly distributed? Yes. 
>>>> * Does UUID match a single row? Yes.
>>>> * It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run? We run the same query 11 times via sqlline and these were the response times:
>>>> 1 row selected (0.489 seconds)
>>>> 1 row selected (0.279 seconds)
>>>> 1 row selected (0.227 seconds)
>>>> 1 row selected (0.22 seconds)
>>>> 1 row selected (0.17 seconds)
>>>> 1 row selected (0.152 seconds)
>>>> 1 row selected (0.129 seconds)
>>>> 1 row selected (0.17 seconds)
>>>> 1 row selected (0.153 seconds)
>>>> 1 row selected (0.259 seconds)
>>>> 1 row selected (0.102 seconds)
>>>> 
>>>> * What is the explain plan for your Phoenix query? CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS
>>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS table and reconnect Sqlline and execute the query again. I think the issue is that the response times vary a lot, with 600 concurrent users the same query can take anything between 2ms to 10s.
>>>> * Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code? It is a simple table with 15 columns, the primary key is the uuid which is of type VARCHAR(36). The hbase equivalent code is:
>>>>  HTableInterface hTable = pool.getTable("schema.DOCUMENTS");
>>>> 
>>>> Get get = new Get(toBytes(saltPrefix + uuid));
>>>> 
>>>> Result result = hTable.get(get);
>>>> 
>>>> * Any phoenix tuning defaults that you changed? No.
>>>> 
>>>> Kind Regards,
>>>> 
>>>> 
>>>> Edu
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>> 
>>    - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>

Re: Phoenix has slow response times compared to HBase

Posted by James Taylor <ja...@apache.org>.

> Why not 0.98.21?
I think it's just because Mujtaba already had 0.98.17 on that particular
box. I don't expect for this particular benchmark it would change anything,
though.

On Sat, Sep 3, 2016 at 8:37 AM, Andrew Purtell <ap...@apache.org> wrote:

> > HBase 0.98.17
>
> Why not 0.98.21, our latest release at the time? And there's now 0.98.22.
>
> On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org> wrote:
>
>> Here is the graph that I get simulating 1, 50 and 500 concurrent users
>> from a single client. Query time for Phoenix is highly comparable with
>> direct HBase gets.
>>
>> See the chart below with query time (ms) for random point gets over a
>> large table that does not fit in the HBase block cache. Each user
>> executed 1000 queries/gets.
>>
>> [image: Inline image 1]
>> Source code to execute gets/phoenix query simulating multiple users is at:
>> 
>>  directhbasemt.java
>> <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>
>> 
>>  directphoenixmt.java
>> <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>
>> 
>> Table DDL
>> create table testuuid (k varchar not null primary key, a varchar, b
>> varchar, c varchar, d varchar, e varchar, f varchar);
>>
>> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this
>> restricts how often the server will be checked for metadata updates, to
>> improve performance
>>
>> Table was filled with 68M rows.
>> Phoenix 4.8/HBase 0.98.17 running on single machine.
>>
>> //mujtaba
>>
>>
>> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
>> e.narros@elsevier.com> wrote:
>>
>>> Hi Mujtaba,
>>>
>>>
>>> See the answers inline below:
>>>
>>>
>>> * How are you running Phoenix queries? *We are using apache-jmeter and
>>> the jdbc sampler.*
>>> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
>>> * Was the JVM restarted after changing number of concurrent users?
>>> *Yes.*
>>> * Is the response time plotted when query is executed for the first time
>>> or second or average of both? *Average. We see response times ranging
>>> significantly even via sqlline. i.e. the same query run 11 times
>>> sequentially takes anything between 17ms to around 489ms with no other load
>>> on the server.*
>>> * Is the UUID filtered on randomly distributed? *Yes.*
>>> * Does UUID match a single row? *Yes.*
>>> * It seems that even non-concurrent Phoenix query which filters on UUID
>>> takes 500ms in your environment. Can you try the same query in Sqlline a
>>> few times and see how much time it takes for each run?
>>> *We run the same query 11 times via sqlline and these were the response
>>> times: *
>>> *1 row selected (0.489 seconds)*
>>> *1 row selected (0.279 seconds)*
>>> *1 row selected (0.227 seconds)*
>>> *1 row selected (0.22 seconds)*
>>> *1 row selected (0.17 seconds)*
>>> *1 row selected (0.152 seconds)*
>>> *1 row selected (0.129 seconds)*
>>> *1 row selected (0.17 seconds)*
>>> *1 row selected (0.153 seconds)*
>>> *1 row selected (0.259 seconds)*
>>> *1 row selected (0.102 seconds)*
>>>
>>> * What is the explain <https://phoenix.apache.org/language/#explain>
>>> plan for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN
>>> POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>> table and reconnect Sqlline and execute the query again. *I think the
>>> issue is that the response times vary a lot, with 600 concurrent users the
>>> same query can take anything between 2ms to 10s.*
>>> * Can you share your table schema and how you ran Phoenix queries and
>>> your HBase equivalent code? *It is a simple table with 15 columns, the
>>> primary key is the uuid which is of type VARCHAR(36). The hbase equivalent
>>> code is:*
>>>
>>>
>>>
>>>
>>>
>>> HTableInterface hTable = pool.getTable("schema.DOCUMENTS");
>>> Get get = new Get(toBytes(saltPrefix + uuid));
>>> Result result = hTable.get(get);
>>>
>>> * Any phoenix tuning defaults that you changed? *No.*
>>>
>>> Kind Regards,
>>>
>>>
>>> Edu
>>>
>>>
>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: Phoenix has slow response times compared to HBase

Posted by Andrew Purtell <ap...@apache.org>.

> HBase 0.98.17

Why not 0.98.21, our latest release at the time? And there's now 0.98.22.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Phoenix has slow response times compared to HBase

Posted by Jonathan Leech <jo...@gmail.com>.

The direct HBase test probably created 500 direct clients, whereas Phoenix may have made fewer simultaneous calls, with a little waiting, and so hit a sweeter spot for load on your configuration.
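
The idea that a bounded number of simultaneous calls changes the load on the server can be illustrated without HBase or Phoenix. This is a sketch of the hypothesis only, not of Phoenix internals: `InFlightDemo`, the 2 ms sleep standing in for a round trip, and the pool size of 32 are all assumptions chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Measures how many simulated calls are in flight at once for a given pool size. */
class InFlightDemo {

    /** Submits short simulated calls to a fixed pool and returns the peak overlap seen. */
    static int peakInFlight(int tasks, int poolSize) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(2); // stand-in for one round trip to the server
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("500 callers, 500 threads: peak " + peakInFlight(500, 500));
        System.out.println("500 callers, 32 threads:  peak " + peakInFlight(500, 32));
    }
}
```

With the smaller pool the peak can never exceed the pool size, so the remaining callers wait in the queue instead of hitting the server at the same instant.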

> On Sep 2, 2016, at 7:06 PM, Mujtaba Chohan <mc...@salesforce.com> wrote:
> 
> Single user average: Phoenix 8ms, HBase 5ms
> 50 users average: Phoenix 35ms, HBase 40ms
> 500 users average: Phoenix 300-400ms, HBase 350-450ms
> 
> A few notes:
> 
> * We have yet to identify why Phoenix showed a slight advantage with a high number of concurrent users from a single client.
> 
> * For the case with 500 concurrent users from a single client, the region server handler count and the Phoenix thread pool size were bumped to 500 to accommodate this level of concurrency.
> 
>> On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:
>> Thanks, Mujtaba. What's the average query time for HBase and Phoenix for the 1/50/500 simultaneous user scenarios?
>> 
>> Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table (as Mujtaba showed in his ALTER TABLE statement - you can do this in the CREATE TABLE statement as well).
>> 
>> Thanks,
>> James
>> 

Re: Phoenix has slow response times compared to HBase

Posted by "Narros, Eduardo (ELS-LON)" <e....@elsevier.com>.

Noted. Thanks for your help.


Kind Regards,


Edu

________________________________
From: Mujtaba Chohan <mu...@apache.org>
Sent: 27 September 2016 19:42:46
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Hi Edu,

Since the Phoenix connection is not initialized in your test, *all* concurrent queries that start at the same time incur this first-time connection cost, which skews the perf numbers.
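
The effect described here can be reproduced with a toy model. The sketch below does not use Phoenix: `LazyClient` is a hypothetical stand-in whose first use performs a slow one-time setup, and the 100 ms setup time is invented. It counts how many worker queries begin before that setup has finished, with and without a warm-up call from the main thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts how many concurrent worker queries run into a one-time setup cost. */
class WarmUpDemo {

    /** Hypothetical stand-in for a client whose first use triggers slow setup. */
    static final class LazyClient {
        private volatile boolean ready = false;

        /** Returns true if this call started before the one-time setup had finished. */
        boolean query() {
            boolean sawCold = !ready; // observed before any waiting
            if (sawCold) {
                synchronized (this) {
                    if (!ready) {
                        pause(100); // simulated one-time setup
                        ready = true;
                    }
                }
            }
            return sawCold;
        }

        private static void pause(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs 2 queries per thread and returns how many of them started cold. */
    static int coldQueries(int threads, boolean warmUpFirst) {
        LazyClient client = new LazyClient();
        if (warmUpFirst) {
            client.query(); // pay the setup cost once, outside the measured section
        }
        AtomicInteger cold = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int q = 0; q < 2; q++) {
                    if (client.query()) {
                        cold.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cold.get();
    }

    public static void main(String[] args) {
        System.out.println("cold queries without warm-up: " + coldQueries(16, false));
        System.out.println("cold queries with warm-up:    " + coldQueries(16, true));
    }
}
```

Without the warm-up, every worker that starts before setup completes waits for it, so the one-time cost shows up in many first-query timings rather than in just one.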

Thanks,
Mujtaba

On Mon, Sep 26, 2016 at 9:53 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi Mujtaba,


Thanks for the response. I am intrigued as to why it makes a difference whether the initialization code is run by the main thread (as in your test) or as soon as the first thread opens a JDBC connection (as in our test).


Is there any state that Phoenix stores somewhere that would make these two tests so different?


Regards,


Edu

________________________________
From: Mujtaba Chohan <mu...@apache.org>
Sent: 22 September 2016 18:32:38
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Hi Edu,

See the attached updated test, which initializes Phoenix before running the concurrent test to mimic real-world usage, where the JVM should not be restarted before each run. Apart from this, it runs 2 queries per thread without ignoring the time for the first query, and the average time still remains within a few ms.

-mujtaba


On Wed, Sep 7, 2016 at 2:00 PM, James Taylor <ja...@apache.org> wrote:
Thanks for the update, Edu. Mujtaba is out of the office for the next couple of weeks, so unless someone else has time to pick this up, it likely won't be picked up until he returns.

     James

On Mon, Sep 5, 2016 at 8:36 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi Mujtaba,


Thanks a lot for helping us to get to the bottom of this. Looking at your test code we have observed that you are ignoring the execution time of the first query. If we do the same, our results are similar to yours.


Unfortunately, our tests cannot ignore the time it takes to fetch a record by a thread for the first time. We are interested in the response times for the first and second queries per thread. (i.e. each thread just runs 2 queries)


When we rerun your code without ignoring the first result and with the number of runs per thread equal to 2, we see results similar to our original findings. We have attached our modified test harness to this email.
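
A rough way to see why counting the first query matters so much when each thread runs only two queries: if one query carries a one-off cost and the rest run at steady state, the per-thread average is (first + (n - 1) * steady) / n. The sketch below assumes a hypothetical 500 ms first query and an 8 ms steady-state query. These figures are illustrative assumptions, not measurements from either test harness.

```java
// Illustrative arithmetic only; the 500 ms and 8 ms figures are assumptions.
class FirstQuerySkew {
    /** Average latency when the first run costs firstMs and every later run costs steadyMs. */
    static double averageMs(double firstMs, double steadyMs, int runs) {
        return (firstMs + (runs - 1) * steadyMs) / runs;
    }

    public static void main(String[] args) {
        System.out.println("2 runs per thread:    " + averageMs(500, 8, 2) + " ms");
        System.out.println("1000 runs per thread: " + averageMs(500, 8, 1000) + " ms");
    }
}
```

Under these assumptions the average works out to 254 ms with 2 runs per thread and roughly 8.5 ms with 1000 runs per thread, so a harness that runs each thread only twice is dominated by whatever the first query pays for.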


Kind Regards,


Edu


________________________________
From: Mujtaba Chohan <mc...@salesforce.com>
Sent: 03 September 2016 02:06:30
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Single user average: Phoenix 8ms, HBase 5ms
50 users average: Phoenix 35ms, HBase 40ms
500 users average: Phoenix 300-400ms, HBase 350-450ms

A few notes:

* We have yet to identify why Phoenix showed a slight advantage with a high number of concurrent users from a single client.

* For the case with 500 concurrent users from a single client, the region server handler count and the Phoenix thread pool size were bumped to 500 to accommodate this level of concurrency.

On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:
Thanks, Mujtaba. What's the average query time for HBase and Phoenix for the 1/50/500 simultaneous user scenarios?

Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table (as Mujtaba showed in his ALTER TABLE statement - you can do this in the CREATE TABLE statement as well).
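
As a minimal sketch of the CREATE TABLE variant mentioned here, the property can be appended as a table option after the column list. The table name, columns and the 150000 value are copied from the testuuid example in this thread; the class and method names are hypothetical, and the caller is assumed to supply an already-open Phoenix JDBC connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; only the DDL text reflects the example from this thread.
class CreateWithCacheFrequency {
    // Same columns and value as the thread's testuuid example, with the
    // property set at creation time instead of through a later ALTER TABLE.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS testuuid ("
      + "k VARCHAR NOT NULL PRIMARY KEY, a VARCHAR, b VARCHAR, c VARCHAR, "
      + "d VARCHAR, e VARCHAR, f VARCHAR) "
      + "UPDATE_CACHE_FREQUENCY=150000";

    /** Runs the DDL on a connection the caller has already opened. */
    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(DDL);
        }
    }
}
```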

Thanks,
James

On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org> wrote:
Here is the graph that I get simulating 1, 50 and 500 concurrent users from a single client. Query time for Phoenix is highly comparable with direct HBase gets.

See the chart below with query time (ms) for random point gets over a large table that will not fit in the HBase block cache. Queries/gets were executed 1000 times for each user.

[Inline image 1]
Source code to execute gets/Phoenix queries simulating multiple users is at:

directhbasemt.java <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>

directphoenixmt.java <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>

Table DDL
create table testuuid (k varchar not null primary key, a varchar, b varchar, c varchar, d varchar, e varchar, f varchar);

alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this restricts how often server will check for metadata updates to improve performance

Table was filled with 68M rows.
Phoenix 4.8/HBase 0.98.17 running on single machine.

//mujtaba


On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi Mujtaba,


See the answers inline below:

* How are you running Phoenix queries? We are using apache-jmeter and the jdbc sampler.
* Were the concurrent Phoenix queries using the same JVM? Yes.
* Was the JVM restarted after changing number of concurrent users? Yes.
* Is the response time plotted when query is executed for the first time or second or average of both? Average. We see response times varying significantly even via sqlline, i.e. the same query run 11 times sequentially takes anything between 17ms and around 489ms with no other load on the server.
* Is the UUID filtered on randomly distributed? Yes.
* Does UUID match a single row? Yes.
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run? We ran the same query 11 times via sqlline and these were the response times:
1 row selected (0.489 seconds)
1 row selected (0.279 seconds)
1 row selected (0.227 seconds)
1 row selected (0.22 seconds)
1 row selected (0.17 seconds)
1 row selected (0.152 seconds)
1 row selected (0.129 seconds)
1 row selected (0.17 seconds)
1 row selected (0.153 seconds)
1 row selected (0.259 seconds)
1 row selected (0.102 seconds)

* What is the explain<https://phoenix.apache.org/language/#explain> plan for your Phoenix query? CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS table and reconnect Sqlline and execute the query again. I think the issue is that the response times vary a lot: with 600 concurrent users the same query can take anything between 2ms and 10s.
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code? It is a simple table with 15 columns, the primary key is the uuid which is of type VARCHAR(36). The hbase equivalent code is:

HTableInterface hTable = pool.getTable("schema.DOCUMENTS");

Get get = new Get(toBytes(saltPrefix + uuid));

Result result = hTable.get(get);

* Any phoenix tuning defaults that you changed? No.
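
To put the 11 sqlline timings quoted above into summary form, the following sketch computes their mean and median, and the mean with the first run left out. The values are the quoted timings converted to milliseconds; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SqllineTimings {
    // The 11 sqlline timings quoted above, converted from seconds to milliseconds.
    static final double[] MS = {489, 279, 227, 220, 170, 152, 129, 170, 153, 259, 102};

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] afterFirst = Arrays.copyOfRange(MS, 1, MS.length);
        System.out.println("mean of all runs:     " + mean(MS));
        System.out.println("median of all runs:   " + median(MS));
        System.out.println("mean after first run: " + mean(afterFirst));
    }
}
```

For these numbers the mean is about 214 ms over all runs and about 186 ms once the first run is dropped, with a median of 170 ms.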


Kind Regards,


Edu


On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org> wrote:
Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries using the same JVM? Was the JVM restarted after changing number of concurrent users?
* Is the response time plotted when query is executed for the first time or second or average of both?
* Is the UUID filtered on randomly distributed? Does UUID match a single row?
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run?
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code?




On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi,


We are exploring starting to use Phoenix and have done some load tests to see whether Phoenix would scale. We have noted that compared to HBase, Phoenix response times have a much slower average as the number of concurrent users increases. We are trying to understand whether this is expected or there is something we are missing out.


This is the test we have performed:

  *   Create table (20 columns) and load it with 400 million records indexed via a column called 'uuid'.
  *   Perform the following queries using 10,20,100,200,400 and 600 users per second, each user will perform each query twice:
     *   Phoenix: select * from schema.DOCUMENTS where uuid = ?
     *   Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where uuid = ?
     *   Hbase equivalent to: select * from schema.DOCUMENTS where uuid = ?
  *   The results are attached and they show that Phoenix response times are at least an order of magnitude above those of HBase

The tests were run from the Master node of a CDH5.7.2 cluster with Phoenix 4.7.0.

Are these test results expected?

Kind Regards,

Edu

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.




Re: Phoenix has slow response times compared to HBase

Posted by "Heather, James (ELS-LON)" <ja...@elsevier.com>.

That's great. Thank you.

I'm very glad we've got this resolved!

James


Re: Phoenix has slow response times compared to HBase

Posted by Mujtaba Chohan <mu...@apache.org>.

Hi Edu,

Since the Phoenix connection is not initialized in your test, *all* concurrent
queries that start at the same time incur this first-time connection cost,
which skews the performance numbers.

Thanks,
Mujtaba
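
The idea of paying the one-time initialization cost before the timed, concurrent part of a test can be sketched as below. This is not the attached test code. It is a minimal harness under stated assumptions: `WarmupHarness`, `Query` and `timeConcurrent` are hypothetical names, and `Query` is a stand-in for whatever runs a single lookup (for example, a JDBC call).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WarmupHarness {
    /** Hypothetical stand-in for one query; a real test would wrap a JDBC call here. */
    interface Query {
        void run() throws Exception;
    }

    /** Runs warmup once on the calling thread, then times query on all threads at once. */
    static List<Long> timeConcurrent(Query warmup, Query query, int threads, int runsPerThread)
            throws Exception {
        warmup.run(); // one-time initialization happens here, outside the timed section
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CountDownLatch start = new CountDownLatch(1);
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    start.await(); // hold every worker until all tasks are submitted
                    List<Long> nanos = new ArrayList<>();
                    for (int i = 0; i < runsPerThread; i++) {
                        long t0 = System.nanoTime();
                        query.run();
                        nanos.add(System.nanoTime() - t0); // first run per thread is kept
                    }
                    return nanos;
                }));
            }
            start.countDown(); // release all workers together
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real run, the warm-up would presumably open a Phoenix JDBC connection and execute one query so that driver and connection setup are done before any worker thread starts. Skipping the warm-up would reproduce the situation described above, where every thread that starts at the same time waits on that first initialization.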


Re: Phoenix has slow response times compared to HBase

Posted by "Narros, Eduardo (ELS-LON)" <e....@elsevier.com>.

Hi Mujtaba,


Thanks for the response. I am intrigued as to why it makes a difference whether the initialization code is run by the main thread (as in your test) or as soon as the first thread opens a JDBC connection (as in our test).


Is there any state that Phoenix stores somewhere that would make these two tests so different?


Regards,


Edu

________________________________
From: Mujtaba Chohan <mu...@apache.org>
Sent: 22 September 2016 18:32:38
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Hi Edu,

See the attached updated test, which initializes Phoenix before running the concurrent test to mimic real-world usage, where the JVM should not be restarted before each run. Apart from this, it runs 2 queries per thread without ignoring the time for the first query, and the average time still remains within a few ms.

-mujtaba


On Wed, Sep 7, 2016 at 2:00 PM, James Taylor <ja...@apache.org> wrote:
Thanks for the update, Edu. Mujtaba is out of the office for the next couple of weeks, so unless someone else has time to pick this up, it likely won't be picked up until he returns.

     James

On Mon, Sep 5, 2016 at 8:36 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi Mujtaba,


Thanks a lot for helping us to get to the bottom of this. Looking at your test code we have observed that you are ignoring the execution time of the first query. If we do the same, our results are similar to yours.


Unfortunately, our tests cannot ignore the time it takes to fetch a record by a thread for the first time. We are interested in the response times for the first and second queries per thread. (i.e. each thread just runs 2 queries)


When we rerun your code without ignoring the first result and with the number of runs per thread equal to 2, we see results similar to our original findings. We have attached our modified test harness to this email.


Kind Regards,


Edu


________________________________
From: Mujtaba Chohan <mc...@salesforce.com>
Sent: 03 September 2016 02:06:30
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Single user average: Phoenix 8ms, HBase 5ms
50 users average: Phoenix 35ms, HBase 40ms
500 users average: Phoenix 300-400ms, HBase 350-450ms

A few notes:

* We have yet to identify why Phoenix was showing a slight advantage with a high number of concurrent users from a single client.

* For the case with 500 concurrent users from a single client, the region server handler count and the Phoenix thread pool size were bumped to 500 to accommodate this level of concurrency.

On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:
Thanks, Mujtaba. What's the average query time for HBase and Phoenix for the 1/50/500 simultaneous user scenarios?

Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table (as Mujtaba showed in his ALTER TABLE statement - you can do this in the CREATE TABLE statement as well).
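
As a rough illustration of setting the property at creation time, the sketch below appends UPDATE_CACHE_FREQUENCY as a table option to the CREATE TABLE statement for the testuuid table from this thread. The class name, the helper method and the use of IF NOT EXISTS are illustrative assumptions, and the JDBC URL is expected as the first argument (for example jdbc:phoenix:<zookeeper quorum>); without an argument the program only prints the statement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CreateTableWithCacheFrequency {

    /** Builds the thread's DDL with the table option appended to CREATE TABLE (hypothetical helper). */
    static String createTableDdl(long updateCacheFrequencyMs) {
        return "CREATE TABLE IF NOT EXISTS testuuid ("
             + "k VARCHAR NOT NULL PRIMARY KEY, "
             + "a VARCHAR, b VARCHAR, c VARCHAR, d VARCHAR, e VARCHAR, f VARCHAR) "
             + "UPDATE_CACHE_FREQUENCY=" + updateCacheFrequencyMs;
    }

    public static void main(String[] args) throws SQLException {
        String ddl = createTableDdl(150000); // same value as the ALTER TABLE in this thread
        System.out.println(ddl);
        if (args.length == 0) {
            return; // no JDBC URL supplied, so only show the statement
        }
        // Assumes the Phoenix client jar is on the classpath so the driver can be found.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }
}
```

Setting the option in the CREATE TABLE statement avoids the separate ALTER TABLE step, which may matter when test tables are recreated for each run.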

Thanks,
James

On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org> wrote:
Here is the graph that I get when simulating 1, 50 and 500 concurrent users from a single client. Query time for Phoenix is highly comparable with direct HBase gets.

See the chart below with query time (ms) for random point gets over a large table that will not fit in the HBase block cache. Queries/gets were executed 1000 times for each user.

[Inline image 1]
Source code to execute gets/phoenix query simulating multiple users is at:

directhbasemt.java <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>

directphoenixmt.java <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>

Table DDL
create table testuuid (k varchar not null primary key, a varchar, b varchar, c varchar, d varchar, e varchar, f varchar);

alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this restricts how often the server is checked for metadata updates (value in ms), which improves performance

Table was filled with 68M rows.
Phoenix 4.8/HBase 0.98.17 running on single machine.

//mujtaba


On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi Mujtaba,


See the answers inline below:

* How are you running Phoenix queries? We are using apache-jmeter and the jdbc sampler.
* Were the concurrent Phoenix queries using the same JVM? Yes.
* Was the JVM restarted after changing number of concurrent users? Yes.
* Is the response time plotted when query is executed for the first time or second or average of both? Average. We see response times ranging significantly even via sqlline. i.e. the same query run 11 times sequentially takes anything between 17ms to around 489ms with no other load on the server.
* Is the UUID filtered on randomly distributed? Yes.
* Does UUID match a single row? Yes.
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run? We run the same query 11 times via sqlline and these were the response times:
1 row selected (0.489 seconds)
1 row selected (0.279 seconds)
1 row selected (0.227 seconds)
1 row selected (0.22 seconds)
1 row selected (0.17 seconds)
1 row selected (0.152 seconds)
1 row selected (0.129 seconds)
1 row selected (0.17 seconds)
1 row selected (0.153 seconds)
1 row selected (0.259 seconds)
1 row selected (0.102 seconds)

* What is the explain<https://phoenix.apache.org/language/#explain> plan for your Phoenix query? CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS table and reconnect Sqlline and execute the query again. I think the issue is that the response times vary a lot, with 600 concurrent users the same query can take anything between 2ms to 10s.
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code? It is a simple table with 15 columns, the primary key is the uuid which is of type VARCHAR(36). The hbase equivalent code is:

HTableInterface hTable = pool.getTable("schema.DOCUMENTS");

Get get = new Get(toBytes(saltPrefix + uuid));

Result result = hTable.get(get);

* Any phoenix tuning defaults that you changed? No.
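
To make the spread in the 11 sqlline runs above easier to see, here is a small sketch that summarises them. The values are the ones quoted in this message, converted to milliseconds; the class and method names are only illustrative.

```java
import java.util.Arrays;

public class SqllineTimings {

    // The 11 sqlline timings quoted above, in run order, converted from seconds to ms.
    static final long[] TIMINGS_MS = {489, 279, 227, 220, 170, 152, 129, 170, 153, 259, 102};

    static long min(long[] values) {
        return Arrays.stream(values).min().getAsLong();
    }

    static long max(long[] values) {
        return Arrays.stream(values).max().getAsLong();
    }

    static double mean(long[] values) {
        return Arrays.stream(values).average().getAsDouble();
    }

    static double median(long[] values) {
        long[] sorted = values.clone(); // keep the original run order intact
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.printf("min=%d ms, median=%.0f ms, mean=%.1f ms, max=%d ms%n",
                min(TIMINGS_MS), median(TIMINGS_MS), mean(TIMINGS_MS), max(TIMINGS_MS));
    }
}
```

For these values the minimum is 102 ms, the median 170 ms, the mean roughly 213.6 ms and the maximum 489 ms. The slowest run is the first one, and the mean sits well above the median because of it.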


Kind Regards,


Edu


On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org> wrote:
Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries using the same JVM? Was the JVM restarted after changing number of concurrent users?
* Is the response time plotted when query is executed for the first time or second or average of both?
* Is the UUID filtered on randomly distributed? Does UUID match a single row?
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run?
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code?




On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi,


We are exploring starting to use Phoenix and have done some load tests to see whether Phoenix would scale. We have noted that compared to HBase, Phoenix response times have a much slower average as the number of concurrent users increases. We are trying to understand whether this is expected or there is something we are missing out.


This is the test we have performed:

  *   Create table (20 columns) and load it with 400 million records indexed via a column called 'uuid'.
  *   Perform the following queries using 10,20,100,200,400 and 600 users per second, each user will perform each query twice:
     *   Phoenix: select * from schema.DOCUMENTS where uuid = ?
     *   Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where uuid = ?
     *   Hbase equivalent to: select * from schema.DOCUMENTS where uuid = ?
  *   The results are attached and they show that Phoenix response times are at least an order of magnitude above those of HBase

The tests were run from the Master node of a CDH5.7.2 cluster with Phoenix 4.7.0.

Are these test results expected?

Kind Regards,

Edu

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.




Re: Phoenix has slow response times compared to HBase

Posted by Mujtaba Chohan <mu...@apache.org>.

Hi Edu,

See the attached updated test, which initializes Phoenix before running the
concurrent test to mimic real-world usage, where the JVM should not be
restarted before each run. Apart from this, it runs 2 queries per thread
without ignoring the time for the first query, and the average time still
remains within a few ms.

-mujtaba
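
The effect of initializing before the timed run can be sketched with a toy model. This is not the attached test and it does not call Phoenix at all: LazyClient is a hypothetical stand-in for any client-side state that is built once per JVM on first use, and the 500 ms setup cost and 5 ms query cost are made-up numbers chosen only to show the arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupSketch {

    /** Hypothetical stand-in for client state that is built once per JVM on first use. */
    static final class LazyClient {
        private final long initCostMs;   // assumed one-time setup cost
        private final long queryCostMs;  // assumed steady-state cost of a point lookup
        private boolean initialized = false;

        LazyClient(long initCostMs, long queryCostMs) {
            this.initCostMs = initCostMs;
            this.queryCostMs = queryCostMs;
        }

        /** Returns the simulated latency of one query; the first call also pays for setup. */
        synchronized long query() {
            long latency = queryCostMs;
            if (!initialized) {
                initialized = true;
                latency += initCostMs;
            }
            return latency;
        }
    }

    /** Runs the queries and returns only the latencies that the test would record. */
    static List<Long> run(LazyClient client, boolean warmUpFirst, int timedQueries) {
        if (warmUpFirst) {
            client.query(); // untimed warm-up, as if done once at application start
        }
        List<Long> recorded = new ArrayList<>();
        for (int i = 0; i < timedQueries; i++) {
            recorded.add(client.query());
        }
        return recorded;
    }

    static double average(List<Long> latencies) {
        long sum = 0;
        for (long value : latencies) {
            sum += value;
        }
        return (double) sum / latencies.size();
    }

    public static void main(String[] args) {
        System.out.println("cold: " + average(run(new LazyClient(500, 5), false, 2)) + " ms");
        System.out.println("warm: " + average(run(new LazyClient(500, 5), true, 2)) + " ms");
    }
}
```

With only 2 timed queries, the cold run averages (505 + 5) / 2 = 255 ms while the warm run averages 5 ms, so a single one-time cost dominates the average when it lands inside the measured window. Whether this is the whole explanation for the numbers in this thread is an open question; the model only shows why the placement of the first call matters.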


On Wed, Sep 7, 2016 at 2:00 PM, James Taylor <ja...@apache.org> wrote:

> Thanks for the update, Edu. Mujtaba is out of the office for the next
> couple of weeks, so unless someone else has time to pick this up, it likely
> won't be picked up until he returns.
>
>      James
>
> On Mon, Sep 5, 2016 at 8:36 AM, Narros, Eduardo (ELS-LON) <
> e.narros@elsevier.com> wrote:
>
>> Hi Mujtaba,
>>
>>
>> Thanks a lot for helping us to get to the bottom of this. Looking at your
>> test code we have observed that you are ignoring the execution time of the
>> first query. If we do the same, our results are similar to yours.
>>
>>
>> Unfortunately, our tests cannot ignore the time it takes to fetch a
>> record by a thread for the first time. We are interested in the response
>> times for the first and second queries per thread. (i.e. each thread just
>> runs 2 queries)
>>
>>
>> When we rerun your code without ignoring the first result and with the
>> number of runs per thread equal 2, we see similar results to our original
>> findings. We have attached our modified test harness to this email.
>>
>>
>> Kind Regards,
>>
>>
>> Edu
>>
>>
>> ------------------------------
>> *From:* Mujtaba Chohan <mc...@salesforce.com>
>> *Sent:* 03 September 2016 02:06:30
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Phoenix has slow response times compared to HBase
>>
>> Single user average: Phoenix 8ms, HBase 5ms
>> 50 users average: Phoenix 35ms, HBase 40ms
>> 500 users average: Phoenix 300-400ms, HBase 350-450ms
>>
>> Few notes:
>>
>> * We have yet to identify why Phoenix was showing slight advantage with
>> high number of concurrent users from single client.
>>
>> * For the case with 500 concurrent users from single client, region
>> server handler count and Phoenix thread pool size was bumped to 500 to
>> accommodate this level of concurrency.
>>
>> On Friday, September 2, 2016, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> Thanks, Mujtaba. What's the average query time for HBase and Phoenix for
>>> the 1/50/500 simultaneous user scenarios?
>>>
>>> Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table
>>> (as Mujtaba showed in his ALTER TABLE statement - you can do this in the
>>> CREATE TABLE statement as well).
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org>
>>> wrote:
>>>
>>>> Here is the graph that I get simulating 1, 50 and 500 concurrent users
>>>> from single client. Query time for Phoenix is highly comparable with direct
>>>> HBase gets.
>>>>
>>>> See the chart below with query time (ms) for random point gets over
>>>> large table that will not fit HBase block cache. Query/gets were executed
>>>> for 1000 time for each user.
>>>>
>>>> [image: Inline image 1]
>>>> Source code to execute gets/phoenix query simulating multiple users is
>>>> at:
>>>> 
>>>>  directhbasemt.java
>>>> <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>
>>>> 
>>>>  directphoenixmt.java
>>>> <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>
>>>> 
>>>> Table DDL
>>>> create table testuuid (k varchar not null primary key, a varchar, b
>>>> varchar, c varchar, d varchar, e varchar, f varchar);
>>>>
>>>> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this
>>>> restricts how often server will check for metadata updates to improve
>>>> performance
>>>>
>>>> Table was filled with 68M rows.
>>>> Phoenix 4.8/HBase 0.98.17 running on single machine.
>>>>
>>>> //mujtaba
>>>>
>>>>
>>>> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
>>>> e.narros@elsevier.com> wrote:
>>>>
>>>>> Hi Mujtaba,
>>>>>
>>>>>
>>>>> See the answers inline below:
>>>>>
>>>>>
>>>>> * How are you running Phoenix queries? *We are using apache-jmeter
>>>>> and the jdbc sampler.*
>>>>> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
>>>>> * Was the JVM restarted after changing number of concurrent users?
>>>>> *Yes.*
>>>>> * Is the response time plotted when query is executed for the first
>>>>> time or second or average of both? *Average. We see response times
>>>>> ranging significantly even via sqlline. i.e. the same query run 11 times
>>>>> sequentially takes anything between 17ms to around 489ms with no other load
>>>>> on the server.*
>>>>> * Is the UUID filtered on randomly distributed? *Yes.*
>>>>> * Does UUID match a single row? *Yes.*
>>>>> * It seems that even non-concurrent Phoenix query which filters on
>>>>> UUID takes 500ms in your environment. Can you try the same query in Sqlline
>>>>> a few times and see how much time it takes for each run?
>>>>> *We run the same query 11 times via sqlline and these were the
>>>>> response times: *
>>>>> *1 row selected (0.489 seconds)*
>>>>> *1 row selected (0.279 seconds)*
>>>>> *1 row selected (0.227 seconds)*
>>>>> *1 row selected (0.22 seconds)*
>>>>> *1 row selected (0.17 seconds)*
>>>>> *1 row selected (0.152 seconds)*
>>>>> *1 row selected (0.129 seconds)*
>>>>> *1 row selected (0.17 seconds)*
>>>>> *1 row selected (0.153 seconds)*
>>>>> *1 row selected (0.259 seconds)*
>>>>> *1 row selected (0.102 seconds)*
>>>>>
>>>>> * What is the explain <https://phoenix.apache.org/language/#explain>
>>>>> plan for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND
>>>>> ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
>>>>> * If it's slow in Sqlline as well then try truncating your
>>>>> SYSTEM.STATS table and reconnect Sqlline and execute the query again. *I
>>>>> think the issue is that the response times vary a lot, with 600 concurrent
>>>>> users the same query can take anything between 2ms to 10s.*
>>>>> * Can you share your table schema and how you ran Phoenix queries and
>>>>> your HBase equivalent code? *It is a simple table with 15 columns,
>>>>> the primary key is the uuid which is of type VARCHAR(36). The hbase
>>>>> equivalent code is:*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> HTableInterface hTable = pool.getTable("schema.DOCUMENTS");
>>>>> Get get = new Get(toBytes(saltPrefix + uuid));
>>>>> Result result = hTable.get(get);
>>>>>
>>>>> * Any phoenix tuning defaults that you changed? *No.*
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>>
>>>>> Edu
>>>>>
>>>>>
>>>>> On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Something seems inherently wrong in these test results.
>>>>>>
>>>>>> * How are you running Phoenix queries? Were the concurrent Phoenix
>>>>>> queries using the same JVM? Was the JVM restarted after changing number of
>>>>>> concurrent users?
>>>>>> * Is the response time plotted when query is executed for the first
>>>>>> time or second or average of both?
>>>>>> * Is the UUID filtered on randomly distributed? Does UUID match a
>>>>>> single row?
>>>>>> * It seems that even non-concurrent Phoenix query which filters on
>>>>>> UUID takes 500ms in your environment. Can you try the same query in Sqlline
>>>>>> a few times and see how much time it takes for each run?
>>>>>> * If it's slow in Sqlline as well then try truncating your
>>>>>> SYSTEM.STATS
>>>>>> * Can you share your table schema and how you ran Phoenix queries and
>>>>>> your HBase equivalent code?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
>>>>>> e.narros@elsevier.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> We are exploring starting to use Phoenix and have done some load
>>>>>>> tests to see whether Phoenix would scale. We have noted that compared to
>>>>>>> HBase, Phoenix response times have a much slower average as the number of
>>>>>>> concurrent users increases. We are trying to understand whether this is
>>>>>>> expected or there is something we are missing out.
>>>>>>>
>>>>>>>
>>>>>>> This is the test we have performed:
>>>>>>>
>>>>>>>
>>>>>>>    - Create table (20 columns) and load it with 400 million records
>>>>>>>    indexed via a column called 'uuid'.
>>>>>>>    - Perform the following queries using 10,20,100,200,400 and 600
>>>>>>>    users per second, each user will perform each query twice:
>>>>>>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>>>>>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS
>>>>>>>       where uuid = ?
>>>>>>>       - Hbase equivalent to: select * from schema.DOCUMENTS where
>>>>>>>       uuid = ?
>>>>>>>    - The results are attached and they show that Phoenix response
>>>>>>>    times are at least an order of magnitude above those of HBase
>>>>>>>
>>>>>>> The tests were run from the Master node of a CDH5.7.2 cluster with
>>>>>>> Phoenix 4.7.0.
>>>>>>>
>>>>>>> Are these test results expected?
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Edu
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>>>>> Registered in England and Wales.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>>> Registered in England and Wales.
>>>>>
>>>>
>>>>
>>>
>> ------------------------------
>>
>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>> Registered in England and Wales.
>>
>
>

Re: Phoenix has slow response times compared to HBase

Posted by James Taylor <ja...@apache.org>.

Thanks for the update, Edu. Mujtaba is out of the office for the next
couple of weeks, so unless someone else has time to pick this up, it likely
won't be picked up until he returns.

     James

On Mon, Sep 5, 2016 at 8:36 AM, Narros, Eduardo (ELS-LON) <
e.narros@elsevier.com> wrote:

> Hi Mujtaba,
>
>
> Thanks a lot for helping us to get to the bottom of this. Looking at your
> test code we have observed that you are ignoring the execution time of the
> first query. If we do the same, our results are similar to yours.
>
>
> Unfortunately, our tests cannot ignore the time it takes to fetch a
> record by a thread for the first time. We are interested in the response
> times for the first and second queries per thread. (i.e. each thread just
> runs 2 queries)
>
>
> When we rerun your code without ignoring the first result and with the
> number of runs per thread equal 2, we see similar results to our original
> findings. We have attached our modified test harness to this email.
>
>
> Kind Regards,
>
>
> Edu
>
>
> ------------------------------
> *From:* Mujtaba Chohan <mc...@salesforce.com>
> *Sent:* 03 September 2016 02:06:30
> *To:* user@phoenix.apache.org
> *Subject:* Re: Phoenix has slow response times compared to HBase
>
> Single user average: Phoenix 8ms, HBase 5ms
> 50 users average: Phoenix 35ms, HBase 40ms
> 500 users average: Phoenix 300-400ms, HBase 350-450ms
>
> Few notes:
>
> * We have yet to identify why Phoenix was showing slight advantage with
> high number of concurrent users from single client.
>
> * For the case with 500 concurrent users from single client, region server
> handler count and Phoenix thread pool size was bumped to 500 to accommodate
> this level of concurrency.
>
> On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:
>
>> Thanks, Mujtaba. What's the average query time for HBase and Phoenix for
>> the 1/50/500 simultaneous user scenarios?
>>
>> Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table
>> (as Mujtaba showed in his ALTER TABLE statement - you can do this in the
>> CREATE TABLE statement as well).
>>
>> Thanks,
>> James
>>
>> On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org>
>> wrote:
>>
>>> Here is the graph that I get simulating 1, 50 and 500 concurrent users
>>> from single client. Query time for Phoenix is highly comparable with direct
>>> HBase gets.
>>>
>>> See the chart below with query time (ms) for random point gets over
>>> large table that will not fit HBase block cache. Query/gets were executed
>>> for 1000 time for each user.
>>>
>>> [image: Inline image 1]
>>> Source code to execute gets/phoenix query simulating multiple users is
>>> at:
>>> 
>>>  directhbasemt.java
>>> <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>
>>> 
>>>  directphoenixmt.java
>>> <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>
>>> 
>>> Table DDL
>>> create table testuuid (k varchar not null primary key, a varchar, b
>>> varchar, c varchar, d varchar, e varchar, f varchar);
>>>
>>> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this
>>> restricts how often server will check for metadata updates to improve
>>> performance
>>>
>>> Table was filled with 68M rows.
>>> Phoenix 4.8/HBase 0.98.17 running on single machine.
>>>
>>> //mujtaba
>>>
>>>
>>> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
>>> e.narros@elsevier.com> wrote:
>>>
>>>> Hi Mujtaba,
>>>>
>>>>
>>>> See the answers inline below:
>>>>
>>>>
>>>> * How are you running Phoenix queries? *We are using apache-jmeter and
>>>> the jdbc sampler.*
>>>> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
>>>> * Was the JVM restarted after changing number of concurrent users?
>>>> *Yes.*
>>>> * Is the response time plotted when query is executed for the first
>>>> time or second or average of both? *Average. We see response times
>>>> ranging significantly even via sqlline. i.e. the same query run 11 times
>>>> sequentially takes anything between 17ms to around 489ms with no other load
>>>> on the server.*
>>>> * Is the UUID filtered on randomly distributed? *Yes.*
>>>> * Does UUID match a single row? *Yes.*
>>>> * It seems that even non-concurrent Phoenix query which filters on UUID
>>>> takes 500ms in your environment. Can you try the same query in Sqlline a
>>>> few times and see how much time it takes for each run?
>>>> *We run the same query 11 times via sqlline and these were the response
>>>> times: *
>>>> *1 row selected (0.489 seconds)*
>>>> *1 row selected (0.279 seconds)*
>>>> *1 row selected (0.227 seconds)*
>>>> *1 row selected (0.22 seconds)*
>>>> *1 row selected (0.17 seconds)*
>>>> *1 row selected (0.152 seconds)*
>>>> *1 row selected (0.129 seconds)*
>>>> *1 row selected (0.17 seconds)*
>>>> *1 row selected (0.153 seconds)*
>>>> *1 row selected (0.259 seconds)*
>>>> *1 row selected (0.102 seconds)*
>>>>
>>>> * What is the explain <https://phoenix.apache.org/language/#explain>
>>>> plan for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND
>>>> ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
>>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>>> table and reconnect Sqlline and execute the query again. *I think the
>>>> issue is that the response times vary a lot, with 600 concurrent users the
>>>> same query can take anything between 2ms to 10s.*
>>>> * Can you share your table schema and how you ran Phoenix queries and
>>>> your HBase equivalent code? *It is a simple table with 15 columns, the
>>>> primary key is the uuid which is of type VARCHAR(36). The hbase equivalent
>>>> code is:*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> HTableInterface hTable = pool.getTable("schema.DOCUMENTS");
>>>> Get get = new Get(toBytes(saltPrefix + uuid));
>>>> Result result = hTable.get(get);
>>>>
>>>> * Any phoenix tuning defaults that you changed? *No.*
>>>>
>>>> Kind Regards,
>>>>
>>>>
>>>> Edu
>>>>
>>>>
>>>> On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org>
>>>> wrote:
>>>>
>>>>> Something seems inherently wrong in these test results.
>>>>>
>>>>> * How are you running Phoenix queries? Were the concurrent Phoenix
>>>>> queries using the same JVM? Was the JVM restarted after changing number of
>>>>> concurrent users?
>>>>> * Is the response time plotted when query is executed for the first
>>>>> time or second or average of both?
>>>>> * Is the UUID filtered on randomly distributed? Does UUID match a
>>>>> single row?
>>>>> * It seems that even non-concurrent Phoenix query which filters on
>>>>> UUID takes 500ms in your environment. Can you try the same query in Sqlline
>>>>> a few times and see how much time it takes for each run?
>>>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>>>> * Can you share your table schema and how you ran Phoenix queries and
>>>>> your HBase equivalent code?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
>>>>> e.narros@elsevier.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> We are exploring starting to use Phoenix and have done some load
>>>>>> tests to see whether Phoenix would scale. We have noted that compared to
>>>>>> HBase, Phoenix response times have a much slower average as the number of
>>>>>> concurrent users increases. We are trying to understand whether this is
>>>>>> expected or there is something we are missing out.
>>>>>>
>>>>>>
>>>>>> This is the test we have performed:
>>>>>>
>>>>>>
>>>>>>    - Create table (20 columns) and load it with 400 million records
>>>>>>    indexed via a column called 'uuid'.
>>>>>>    - Perform the following queries using 10,20,100,200,400 and 600
>>>>>>    users per second, each user will perform each query twice:
>>>>>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>>>>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS
>>>>>>       where uuid = ?
>>>>>>       - Hbase equivalent to: select * from schema.DOCUMENTS where
>>>>>>       uuid = ?
>>>>>>    - The results are attached and they show that Phoenix response
>>>>>>    times are at least an order of magnitude above those of HBase
>>>>>>
>>>>>> The tests were run from the Master node of a CDH5.7.2 cluster with
>>>>>> Phoenix 4.7.0.
>>>>>>
>>>>>> Are these test results expected?
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Edu
>>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>>>> Registered in England and Wales.
>>>>>>
>>>>>
>>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>> Registered in England and Wales.
>>>>
>>>
>>>
>>
> ------------------------------
>
> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
> Registered in England and Wales.
>

Re: Phoenix has slow response times compared to HBase

Posted by "Narros, Eduardo (ELS-LON)" <e....@elsevier.com>.

Hi Mujtaba,


Thanks a lot for helping us to get to the bottom of this. Looking at your test code we have observed that you are ignoring the execution time of the first query. If we do the same, our results are similar to yours.


Unfortunately, our tests cannot ignore the time it takes to fetch a record by a thread for the first time. We are interested in the response times for the first and second queries per thread. (i.e. each thread just runs 2 queries)


When we rerun your code without ignoring the first result and with the number of runs per thread equal to 2, we see results similar to our original findings. We have attached our modified test harness to this email.


Kind Regards,


Edu


________________________________
From: Mujtaba Chohan <mc...@salesforce.com>
Sent: 03 September 2016 02:06:30
To: user@phoenix.apache.org
Subject: Re: Phoenix has slow response times compared to HBase

Single user average: Phoenix 8ms, HBase 5ms
50 users average: Phoenix 35ms, HBase 40ms
500 users average: Phoenix 300-400ms, HBase 350-450ms

A few notes:

* We have yet to identify why Phoenix was showing a slight advantage with a high number of concurrent users from a single client.

* For the case with 500 concurrent users from a single client, the region server handler count and the Phoenix thread pool size were bumped to 500 to accommodate this level of concurrency.

On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:
Thanks, Mujtaba. What's the average query time for HBase and Phoenix for the 1/50/500 simultaneous user scenarios?

Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table (as Mujtaba showed in his ALTER TABLE statement - you can do this in the CREATE TABLE statement as well).

Thanks,
James

On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mujtaba@apache.org> wrote:
Here is the graph that I get when simulating 1, 50 and 500 concurrent users from a single client. Query time for Phoenix is highly comparable with direct HBase gets.

See the chart below with query time (ms) for random point gets over a large table that will not fit in the HBase block cache. Queries/gets were executed 1000 times for each user.

[Inline image 1]
Source code to execute gets/phoenix query simulating multiple users is at:

directhbasemt.java <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>

directphoenixmt.java <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>

Table DDL
create table testuuid (k varchar not null primary key, a varchar, b varchar, c varchar, d varchar, e varchar, f varchar);

alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this restricts how often the server is checked for metadata updates (value in ms), which improves performance

Table was filled with 68M rows.
Phoenix 4.8/HBase 0.98.17 running on single machine.

//mujtaba


On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <e.narros@elsevier.com> wrote:

Hi Mujtaba,


See the answers inline below:

* How are you running Phoenix queries? We are using apache-jmeter and the jdbc sampler.
* Were the concurrent Phoenix queries using the same JVM? Yes.
* Was the JVM restarted after changing number of concurrent users? Yes.
* Is the response time plotted when query is executed for the first time or second or average of both? Average. We see response times ranging significantly even via sqlline. i.e. the same query run 11 times sequentially takes anything between 17ms to around 489ms with no other load on the server.
* Is the UUID filtered on randomly distributed? Yes.
* Does UUID match a single row? Yes.
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run? We run the same query 11 times via sqlline and these were the response times:
1 row selected (0.489 seconds)
1 row selected (0.279 seconds)
1 row selected (0.227 seconds)
1 row selected (0.22 seconds)
1 row selected (0.17 seconds)
1 row selected (0.152 seconds)
1 row selected (0.129 seconds)
1 row selected (0.17 seconds)
1 row selected (0.153 seconds)
1 row selected (0.259 seconds)
1 row selected (0.102 seconds)

* What is the explain<https://phoenix.apache.org/language/#explain> plan for your Phoenix query? CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS table and reconnect Sqlline and execute the query again. I think the issue is that the response times vary a lot, with 600 concurrent users the same query can take anything between 2ms to 10s.
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code? It is a simple table with 15 columns, the primary key is the uuid which is of type VARCHAR(36). The hbase equivalent code is:

HTableInterface hTable = pool.getTable("schema.DOCUMENTS");

Get get = new Get(toBytes(saltPrefix + uuid));

Result result = hTable.get(get);

* Any phoenix tuning defaults that you changed? No.


Kind Regards,


Edu


On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mujtaba@apache.org> wrote:
Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries using the same JVM? Was the JVM restarted after changing number of concurrent users?
* Is the response time plotted when query is executed for the first time or second or average of both?
* Is the UUID filtered on randomly distributed? Does UUID match a single row?
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run?
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code?




On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <e.narros@elsevier.com> wrote:

Hi,


We are exploring starting to use Phoenix and have done some load tests to see whether Phoenix would scale. We have noted that compared to HBase, Phoenix response times have a much slower average as the number of concurrent users increases. We are trying to understand whether this is expected or there is something we are missing out.


This is the test we have performed:

  *   Create table (20 columns) and load it with 400 million records indexed via a column called 'uuid'.
  *   Perform the following queries using 10,20,100,200,400 and 600 users per second, each user will perform each query twice:
     *   Phoenix: select * from schema.DOCUMENTS where uuid = ?
     *   Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where uuid = ?
     *   Hbase equivalent to: select * from schema.DOCUMENTS where uuid = ?
  *   The results are attached and they show that Phoenix response times are at least an order of magnitude above those of HBase

The tests were run from the Master node of a CDH5.7.2 cluster with Phoenix 4.7.0.

Are these test results expected?

Kind Regards,

Edu

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.




Re: Phoenix has slow response times compared to HBase

Posted by Mujtaba Chohan <mc...@salesforce.com>.

Single user average: Phoenix 8ms, HBase 5ms
50 users average: Phoenix 35ms, HBase 40ms
500 users average: Phoenix 300-400ms, HBase 350-450ms

A few notes:

* We have yet to identify why Phoenix showed a slight advantage with a
high number of concurrent users from a single client.

* For the case with 500 concurrent users from a single client, the region
server handler count and the Phoenix thread pool size were bumped to 500 to
accommodate this level of concurrency.
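
The two settings mentioned above could be written down as in the sketch below. It assumes that "region server handler count" refers to the HBase property hbase.regionserver.handler.count (set in hbase-site.xml on the region servers) and that "Phoenix thread pool size" refers to phoenix.query.threadPoolSize (set in the client-side hbase-site.xml); the message does not name the properties, so treat that mapping as an assumption. The class and helper names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders the assumed concurrency settings as hbase-site.xml property elements. */
final class ConcurrencySettingsSketch {

    /** Settings sized for `concurrentUsers` simultaneous queries from one client. */
    static Map<String, String> settingsFor(int concurrentUsers) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Assumed HBase server-side property: RPC handler threads per region server.
        settings.put("hbase.regionserver.handler.count", Integer.toString(concurrentUsers));
        // Assumed Phoenix client-side property: threads available to run queries.
        settings.put("phoenix.query.threadPoolSize", Integer.toString(concurrentUsers));
        return settings;
    }

    /** Formats one setting as a single-line hbase-site.xml <property> element. */
    static String toPropertyXml(String name, String value) {
        return "<property><name>" + name + "</name><value>" + value + "</value></property>";
    }

    public static void main(String[] args) {
        settingsFor(500).forEach((name, value) -> System.out.println(toPropertyXml(name, value)));
    }
}
```

Running main prints one property element per setting, which can be pasted into the respective hbase-site.xml. Whether 500 is a sensible value outside a benchmark depends on the cluster and is not something this thread establishes.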

On Friday, September 2, 2016, James Taylor <ja...@apache.org> wrote:

> Thanks, Mujtaba. What's the average query time for HBase and Phoenix for
> the 1/50/500 simultaneous user scenarios?
>
> Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table
> (as Mujtaba showed in his ALTER TABLE statement - you can do this in the
> CREATE TABLE statement as well).
>
> Thanks,
> James
>
> On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mujtaba@apache.org> wrote:
>
>> Here is the graph that I get simulating 1, 50 and 500 concurrent users
>> from single client. Query time for Phoenix is highly comparable with direct
>> HBase gets.
>>
>> See the chart below with query time (ms) for random point gets over large
>> table that will not fit HBase block cache. Query/gets were executed for
>> 1000 time for each user.
>>
>> [image: Inline image 1]
>> Source code to execute gets/phoenix query simulating multiple users is at:
>> 
>>  directhbasemt.java
>> <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>
>> 
>>  directphoenixmt.java
>> <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>
>> 
>> Table DDL
>> create table testuuid (k varchar not null primary key, a varchar, b
>> varchar, c varchar, d varchar, e varchar, f varchar);
>>
>> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this
>> restricts how often server will check for metadata updates to improve
>> performance
>>
>> Table was filled with 68M rows.
>> Phoenix 4.8/HBase 0.98.17 running on single machine.
>>
>> //mujtaba
>>
>>
>> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
>> e.narros@elsevier.com> wrote:
>>
>>> Hi Mujtaba,
>>>
>>>
>>> See the answers inline below:
>>>
>>>
>>> * How are you running Phoenix queries? *We are using apache-jmeter and
>>> the jdbc sampler.*
>>> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
>>> * Was the JVM restarted after changing number of concurrent users?
>>> *Yes.*
>>> * Is the response time plotted when query is executed for the first time
>>> or second or average of both? *Average. We see response times ranging
>>> significantly even via sqlline. i.e. the same query run 11 times
>>> sequentially takes anything between 17ms to around 489ms with no other load
>>> on the server.*
>>> * Is the UUID filtered on randomly distributed? *Yes.*
>>> * Does UUID match a single row? *Yes.*
>>> * It seems that even non-concurrent Phoenix query which filters on UUID
>>> takes 500ms in your environment. Can you try the same query in Sqlline a
>>> few times and see how much time it takes for each run?
>>> *We run the same query 11 times via sqlline and these were the response
>>> times: *
>>> *1 row selected (0.489 seconds)*
>>> *1 row selected (0.279 seconds)*
>>> *1 row selected (0.227 seconds)*
>>> *1 row selected (0.22 seconds)*
>>> *1 row selected (0.17 seconds)*
>>> *1 row selected (0.152 seconds)*
>>> *1 row selected (0.129 seconds)*
>>> *1 row selected (0.17 seconds)*
>>> *1 row selected (0.153 seconds)*
>>> *1 row selected (0.259 seconds)*
>>> *1 row selected (0.102 seconds)*
>>>
>>> * What is the explain <https://phoenix.apache.org/language/#explain>
>>> plan for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN
>>> POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>> table and reconnect Sqlline and execute the query again. *I think the
>>> issue is that the response times vary a lot, with 600 concurrent users the
>>> same query can take anything between 2ms to 10s.*
>>> * Can you share your table schema and how you ran Phoenix queries and
>>> your HBase equivalent code? *It is a simple table with 15 columns, the
>>> primary key is the uuid which is of type VARCHAR(36). The hbase equivalent
>>> code is:*
>>>
>>> *HTableInterface hTable = pool.getTable("schema.DOCUMENTS");*
>>> *Get get = new Get(toBytes(saltPrefix + uuid));*
>>> *Result result = hTable.get(get);*
>>>
>>> * Any phoenix tuning defaults that you changed? *No.*
>>>
>>> Kind Regards,
>>>
>>>
>>> Edu
>>>
>>>
>>> On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mujtaba@apache.org> wrote:
>>>
>>>> Something seems inherently wrong in these test results.
>>>>
>>>> * How are you running Phoenix queries? Were the concurrent Phoenix
>>>> queries using the same JVM? Was the JVM restarted after changing number of
>>>> concurrent users?
>>>> * Is the response time plotted when query is executed for the first
>>>> time or second or average of both?
>>>> * Is the UUID filtered on randomly distributed? Does UUID match a
>>>> single row?
>>>> * It seems that even non-concurrent Phoenix query which filters on UUID
>>>> takes 500ms in your environment. Can you try the same query in Sqlline a
>>>> few times and see how much time it takes for each run?
>>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>>> * Can you share your table schema and how you ran Phoenix queries and
>>>> your HBase equivalent code?
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
>>>> e.narros@elsevier.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> We are exploring starting to use Phoenix and have done some load tests
>>>>> to see whether Phoenix would scale. We have noted that compared to HBase,
>>>>> Phoenix response times have a much slower average as the number of
>>>>> concurrent users increases. We are trying to understand whether this is
>>>>> expected or there is something we are missing out.
>>>>>
>>>>>
>>>>> This is the test we have performed:
>>>>>
>>>>>
>>>>>    - Create table (20 columns) and load it with 400 million records
>>>>>    indexed via a column called 'uuid'.
>>>>>    - Perform the following queries using 10,20,100,200,400 and 600
>>>>>    users per second, each user will perform each query twice:
>>>>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>>>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS
>>>>>       where uuid = ?
>>>>>       - Hbase equivalent to: select * from schema.DOCUMENTS where
>>>>>       uuid = ?
>>>>>    - The results are attached and they show that Phoenix response
>>>>>    times are at least an order of magnitude above those of HBase
>>>>>
>>>>> The tests were run from the Master node of a CDH5.7.2 cluster with
>>>>> Phoenix 4.7.0.
>>>>>
>>>>> Are these test results expected?
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Edu
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>>> Registered in England and Wales.
>>>>>
>>>>
>>>>
>>>
>>> ------------------------------
>>>
>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>> Registered in England and Wales.
>>>
>>
>>
>

Re: Phoenix has slow response times compared to HBase

Posted by James Taylor <ja...@apache.org>.

Thanks, Mujtaba. What's the average query time for HBase and Phoenix for
the 1/50/500 simultaneous user scenarios?

Edu - make sure to set the UPDATE_CACHE_FREQUENCY property on the table (as
Mujtaba showed in his ALTER TABLE statement - you can do this in the CREATE
TABLE statement as well).
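
As an illustration of setting the property at creation time, the sketch below builds a CREATE TABLE statement for the testuuid table from Mujtaba's message with UPDATE_CACHE_FREQUENCY appended as a table option, and runs it over JDBC. The class and method names are hypothetical, IF NOT EXISTS is added only to make the sketch safe to rerun, and opening the Phoenix connection is left to the caller.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sets UPDATE_CACHE_FREQUENCY in CREATE TABLE instead of a later ALTER TABLE. */
final class CreateTableWithCacheFrequency {

    /** Builds the DDL; columns mirror the testuuid table quoted in this thread. */
    static String ddl(long updateCacheFrequencyMs) {
        return "CREATE TABLE IF NOT EXISTS testuuid ("
                + "k VARCHAR NOT NULL PRIMARY KEY, "
                + "a VARCHAR, b VARCHAR, c VARCHAR, d VARCHAR, e VARCHAR, f VARCHAR) "
                + "UPDATE_CACHE_FREQUENCY=" + updateCacheFrequencyMs;
    }

    /** Executes the DDL on an already-open Phoenix JDBC connection. */
    static void createTable(Connection phoenixConnection, long updateCacheFrequencyMs)
            throws SQLException {
        try (Statement stmt = phoenixConnection.createStatement()) {
            stmt.execute(ddl(updateCacheFrequencyMs));
        }
    }

    public static void main(String[] args) {
        // Prints the statement only; 150000 is the value used in the ALTER TABLE example.
        System.out.println(ddl(150000));
    }
}
```

The printed statement can also be pasted into sqlline directly.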

Thanks,
James

On Fri, Sep 2, 2016 at 5:40 PM, Mujtaba Chohan <mu...@apache.org> wrote:

> Here is the graph that I get simulating 1, 50 and 500 concurrent users
> from single client. Query time for Phoenix is highly comparable with direct
> HBase gets.
>
> See the chart below with query time (ms) for random point gets over large
> table that will not fit HBase block cache. Query/gets were executed for
> 1000 time for each user.
>
> [image: Inline image 1]
> Source code to execute gets/phoenix query simulating multiple users is at:
> 
>  directhbasemt.java
> <https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>
> 
>  directphoenixmt.java
> <https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>
> 
> Table DDL
> create table testuuid (k varchar not null primary key, a varchar, b
> varchar, c varchar, d varchar, e varchar, f varchar);
>
> alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this
> restricts how often server will check for metadata updates to improve
> performance
>
> Table was filled with 68M rows.
> Phoenix 4.8/HBase 0.98.17 running on single machine.
>
> //mujtaba
>
>
> On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
> e.narros@elsevier.com> wrote:
>
>> Hi Mujtaba,
>>
>>
>> See the answers inline below:
>>
>>
>> * How are you running Phoenix queries? *We are using apache-jmeter and
>> the jdbc sampler.*
>> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
>> * Was the JVM restarted after changing number of concurrent users? *Yes.*
>> * Is the response time plotted when query is executed for the first time
>> or second or average of both? *Average. We see response times ranging
>> significantly even via sqlline. i.e. the same query run 11 times
>> sequentially takes anything between 17ms to around 489ms with no other load
>> on the server.*
>> * Is the UUID filtered on randomly distributed? *Yes.*
>> * Does UUID match a single row? *Yes.*
>> * It seems that even non-concurrent Phoenix query which filters on UUID
>> takes 500ms in your environment. Can you try the same query in Sqlline a
>> few times and see how much time it takes for each run?
>> *We run the same query 11 times via sqlline and these were the response
>> times: *
>> *1 row selected (0.489 seconds)*
>> *1 row selected (0.279 seconds)*
>> *1 row selected (0.227 seconds)*
>> *1 row selected (0.22 seconds)*
>> *1 row selected (0.17 seconds)*
>> *1 row selected (0.152 seconds)*
>> *1 row selected (0.129 seconds)*
>> *1 row selected (0.17 seconds)*
>> *1 row selected (0.153 seconds)*
>> *1 row selected (0.259 seconds)*
>> *1 row selected (0.102 seconds)*
>>
>> * What is the explain <https://phoenix.apache.org/language/#explain>
>> plan for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN
>> POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>> table and reconnect Sqlline and execute the query again. *I think the
>> issue is that the response times vary a lot, with 600 concurrent users the
>> same query can take anything between 2ms to 10s.*
>> * Can you share your table schema and how you ran Phoenix queries and
>> your HBase equivalent code? *It is a simple table with 15 columns, the
>> primary key is the uuid which is of type VARCHAR(36). The hbase equivalent
>> code is:*
>>
>> *HTableInterface hTable = pool.getTable("schema.DOCUMENTS");*
>> *Get get = new Get(toBytes(saltPrefix + uuid));*
>> *Result result = hTable.get(get);*
>>
>> * Any phoenix tuning defaults that you changed? *No.*
>>
>> Kind Regards,
>>
>>
>> Edu
>>
>>
>> On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org>
>> wrote:
>>
>>> Something seems inherently wrong in these test results.
>>>
>>> * How are you running Phoenix queries? Were the concurrent Phoenix
>>> queries using the same JVM? Was the JVM restarted after changing number of
>>> concurrent users?
>>> * Is the response time plotted when query is executed for the first time
>>> or second or average of both?
>>> * Is the UUID filtered on randomly distributed? Does UUID match a single
>>> row?
>>> * It seems that even non-concurrent Phoenix query which filters on UUID
>>> takes 500ms in your environment. Can you try the same query in Sqlline a
>>> few times and see how much time it takes for each run?
>>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>>> * Can you share your table schema and how you ran Phoenix queries and
>>> your HBase equivalent code?
>>>
>>>
>>>
>>>
>>> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
>>> e.narros@elsevier.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> We are exploring starting to use Phoenix and have done some load tests
>>>> to see whether Phoenix would scale. We have noted that compared to HBase,
>>>> Phoenix response times have a much slower average as the number of
>>>> concurrent users increases. We are trying to understand whether this is
>>>> expected or there is something we are missing out.
>>>>
>>>>
>>>> This is the test we have performed:
>>>>
>>>>
>>>>    - Create table (20 columns) and load it with 400 million records
>>>>    indexed via a column called 'uuid'.
>>>>    - Perform the following queries using 10,20,100,200,400 and 600
>>>>    users per second, each user will perform each query twice:
>>>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS
>>>>       where uuid = ?
>>>>       - Hbase equivalent to: select * from schema.DOCUMENTS where uuid
>>>>       = ?
>>>>    - The results are attached and they show that Phoenix response
>>>>    times are at least an order of magnitude above those of HBase
>>>>
>>>> The tests were run from the Master node of a CDH5.7.2 cluster with
>>>> Phoenix 4.7.0.
>>>>
>>>> Are these test results expected?
>>>>
>>>> Kind Regards,
>>>>
>>>> Edu
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>>> Registered in England and Wales.
>>>>
>>>
>>>
>>
>> ------------------------------
>>
>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>> Registered in England and Wales.
>>
>
>

Re: Phoenix has slow response times compared to HBase

Posted by Mujtaba Chohan <mu...@apache.org>.

Here is the graph that I get simulating 1, 50 and 500 concurrent users from
a single client. Query time for Phoenix is highly comparable with direct
HBase gets.

See the chart below with query time (ms) for random point gets over a large
table that will not fit in the HBase block cache. Queries/gets were executed
1000 times for each user.

[image: Inline image 1]
Source code to execute gets/phoenix query simulating multiple users is at:

 directhbasemt.java
<https://drive.google.com/file/d/0B0AJDm160pciTVlhSzU3aWtrNW8/view?usp=drive_web>

 directphoenixmt.java
<https://drive.google.com/file/d/0B0AJDm160pciYkw4SzNVQy1meHM/view?usp=drive_web>

Table DDL
create table testuuid (k varchar not null primary key, a varchar, b
varchar, c varchar, d varchar, e varchar, f varchar);

alter table testuuid set "UPDATE_CACHE_FREQUENCY"=150000; // this restricts
how often the server will be checked for metadata updates, to improve
performance

Table was filled with 68M rows.
Phoenix 4.8/HBase 0.98.17 running on single machine.
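
The linked sources are not reproduced here. As a rough idea of how such a measurement can be structured, the sketch below runs a lookup a fixed number of times on each of N threads and reports the mean latency per call. It is not the code behind the chart: the class name, the method name and the stand-in workload are all hypothetical, and a real run would replace the stand-in with a Phoenix query or an HBase get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Simulates concurrent users issuing point lookups from a single client JVM. */
final class ConcurrentLatencySketch {

    /**
     * Runs `lookup` `iterations` times on each of `users` threads and returns
     * the mean latency per call in milliseconds.
     */
    static double averageLatencyMillis(int users, int iterations, Runnable lookup)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> totals = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                Callable<Long> user = () -> {
                    long elapsedNanos = 0;
                    for (int i = 0; i < iterations; i++) {
                        long start = System.nanoTime();
                        lookup.run();
                        elapsedNanos += System.nanoTime() - start;
                    }
                    return elapsedNanos;
                };
                totals.add(pool.submit(user));
            }
            long totalNanos = 0;
            for (Future<Long> total : totals) {
                totalNanos += total.get(); // blocks until that simulated user is done
            }
            return totalNanos / 1_000_000.0 / ((long) users * iterations);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload (an assumption): sleeps 1 ms instead of querying a cluster.
        Runnable fakeLookup = () -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int users : new int[] {1, 50, 500}) {
            System.out.printf("%d users: %.2f ms average%n",
                    users, averageLatencyMillis(users, 1000, fakeLookup));
        }
    }
}
```

Timing each call inside the worker thread keeps thread start-up and scheduling of the pool itself out of the per-call figure, which matters once the user count reaches the hundreds.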

//mujtaba


On Thu, Sep 1, 2016 at 3:34 AM, Narros, Eduardo (ELS-LON) <
e.narros@elsevier.com> wrote:

> Hi Mujtaba,
>
>
> See the answers inline below:
>
>
> * How are you running Phoenix queries? *We are using apache-jmeter and
> the jdbc sampler.*
> * Were the concurrent Phoenix queries using the same JVM? *Yes.*
> * Was the JVM restarted after changing number of concurrent users? *Yes.*
> * Is the response time plotted when query is executed for the first time
> or second or average of both? *Average. We see response times ranging
> significantly even via sqlline. i.e. the same query run 11 times
> sequentially takes anything between 17ms to around 489ms with no other load
> on the server.*
> * Is the UUID filtered on randomly distributed? *Yes.*
> * Does UUID match a single row? *Yes.*
> * It seems that even non-concurrent Phoenix query which filters on UUID
> takes 500ms in your environment. Can you try the same query in Sqlline a
> few times and see how much time it takes for each run?
> *We run the same query 11 times via sqlline and these were the response
> times: *
> *1 row selected (0.489 seconds)*
> *1 row selected (0.279 seconds)*
> *1 row selected (0.227 seconds)*
> *1 row selected (0.22 seconds)*
> *1 row selected (0.17 seconds)*
> *1 row selected (0.152 seconds)*
> *1 row selected (0.129 seconds)*
> *1 row selected (0.17 seconds)*
> *1 row selected (0.153 seconds)*
> *1 row selected (0.259 seconds)*
> *1 row selected (0.102 seconds)*
>
> * What is the explain <https://phoenix.apache.org/language/#explain> plan
> for your Phoenix query? *CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT
> LOOKUP ON 1 KEY OVER schema.DOCUMENTS*
> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
> table and reconnect Sqlline and execute the query again. *I think the
> issue is that the response times vary a lot, with 600 concurrent users the
> same query can take anything between 2ms to 10s.*
> * Can you share your table schema and how you ran Phoenix queries and your
> HBase equivalent code? *It is a simple table with 15 columns, the primary
> key is the uuid which is of type VARCHAR(36). The hbase equivalent code is:*
>
> *HTableInterface hTable = pool.getTable("schema.DOCUMENTS");*
> *Get get = new Get(toBytes(saltPrefix + uuid));*
> *Result result = hTable.get(get);*
>
> * Any phoenix tuning defaults that you changed? *No.*
>
> Kind Regards,
>
>
> Edu
>
>
> On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org>
> wrote:
>
>> Something seems inherently wrong in these test results.
>>
>> * How are you running Phoenix queries? Were the concurrent Phoenix
>> queries using the same JVM? Was the JVM restarted after changing number of
>> concurrent users?
>> * Is the response time plotted when query is executed for the first time
>> or second or average of both?
>> * Is the UUID filtered on randomly distributed? Does UUID match a single
>> row?
>> * It seems that even non-concurrent Phoenix query which filters on UUID
>> takes 500ms in your environment. Can you try the same query in Sqlline a
>> few times and see how much time it takes for each run?
>> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
>> * Can you share your table schema and how you ran Phoenix queries and
>> your HBase equivalent code?
>>
>>
>>
>>
>> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
>> e.narros@elsevier.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> We are exploring starting to use Phoenix and have done some load tests
>>> to see whether Phoenix would scale. We have noted that compared to HBase,
>>> Phoenix response times have a much slower average as the number of
>>> concurrent users increases. We are trying to understand whether this is
>>> expected or there is something we are missing out.
>>>
>>>
>>> This is the test we have performed:
>>>
>>>
>>>    - Create table (20 columns) and load it with 400 million records
>>>    indexed via a column called 'uuid'.
>>>    - Perform the following queries using 10,20,100,200,400 and 600
>>>    users per second, each user will perform each query twice:
>>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS
>>>       where uuid = ?
>>>       - Hbase equivalent to: select * from schema.DOCUMENTS where uuid
>>>       = ?
>>>    - The results are attached and they show that Phoenix response times
>>>    are at least an order of magnitude above those of HBase
>>>
>>> The tests were run from the Master node of a CDH5.7.2 cluster with
>>> Phoenix 4.7.0.
>>>
>>> Are these test results expected?
>>>
>>> Kind Regards,
>>>
>>> Edu
>>>
>>>
>>> ------------------------------
>>>
>>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>>> Registered in England and Wales.
>>>
>>
>>
>
> ------------------------------
>
> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
> Registered in England and Wales.
>

Re: Phoenix has slow response times compared to HBase

Posted by "Narros, Eduardo (ELS-LON)" <e....@elsevier.com>.

Hi Mujtaba,

See the answers inline below:

* How are you running Phoenix queries? We are using apache-jmeter and the jdbc sampler.
* Were the concurrent Phoenix queries using the same JVM? Yes.
* Was the JVM restarted after changing number of concurrent users? Yes.
* Is the response time plotted when query is executed for the first time or second or average of both? Average. We see response times varying significantly even via sqlline, i.e. the same query run 11 times sequentially takes anything between 17ms and around 489ms with no other load on the server.
* Is the UUID filtered on randomly distributed? Yes.
* Does UUID match a single row? Yes.
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run? We ran the same query 11 times via sqlline and these were the response times:
1 row selected (0.489 seconds)
1 row selected (0.279 seconds)
1 row selected (0.227 seconds)
1 row selected (0.22 seconds)
1 row selected (0.17 seconds)
1 row selected (0.152 seconds)
1 row selected (0.129 seconds)
1 row selected (0.17 seconds)
1 row selected (0.153 seconds)
1 row selected (0.259 seconds)
1 row selected (0.102 seconds)

* What is the explain<https://phoenix.apache.org/language/#explain> plan for your Phoenix query? CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER schema.DOCUMENTS
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS table and reconnect Sqlline and execute the query again. I think the issue is that the response times vary a lot: with 600 concurrent users the same query can take anything between 2ms and 10s.
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code? It is a simple table with 15 columns; the primary key is the uuid, which is of type VARCHAR(36). The HBase equivalent code is:

HTableInterface hTable = pool.getTable("schema.DOCUMENTS");

Get get = new Get(toBytes(saltPrefix + uuid));

Result result = hTable.get(get);

* Any phoenix tuning defaults that you changed? No.
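
For comparison with the HBase get above, the Phoenix side of the test (the parameterized query from the original post) might look like the following over plain JDBC. This is a sketch, not the JMeter JDBC sampler configuration that was actually used: the class and method names are hypothetical and the caller is assumed to supply an open Phoenix connection. No salt prefix appears here because Phoenix applies salting to the row key itself for salted tables.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Point lookup by uuid through Phoenix JDBC, mirroring the HBase Get. */
final class PhoenixPointLookupSketch {

    /** The parameterized query quoted in this thread. */
    static final String QUERY = "select * from schema.DOCUMENTS where uuid = ?";

    /** Returns true if a row with the given uuid exists. */
    static boolean lookup(Connection phoenixConnection, String uuid) throws SQLException {
        try (PreparedStatement ps = phoenixConnection.prepareStatement(QUERY)) {
            ps.setString(1, uuid); // binds the single '?' placeholder
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

Preparing the statement on every call, as here, keeps the sketch short; a load test that wants to measure only the lookup would more likely prepare it once per connection and reuse it.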

Kind Regards,

Edu

On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org> wrote:
Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries using the same JVM? Was the JVM restarted after changing number of concurrent users?
* Is the response time plotted when query is executed for the first time or second or average of both?
* Is the UUID filtered on randomly distributed? Does UUID match a single row?
* It seems that even non-concurrent Phoenix query which filters on UUID takes 500ms in your environment. Can you try the same query in Sqlline a few times and see how much time it takes for each run?
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
* Can you share your table schema and how you ran Phoenix queries and your HBase equivalent code?

On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <e....@elsevier.com> wrote:

Hi,

We are exploring starting to use Phoenix and have done some load tests to see whether Phoenix would scale. We have noted that compared to HBase, Phoenix response times have a much slower average as the number of concurrent users increases. We are trying to understand whether this is expected or there is something we are missing out.

This is the test we have performed:

* Create table (20 columns) and load it with 400 million records indexed via a column called 'uuid'.
* Perform the following queries using 10,20,100,200,400 and 600 users per second, each user will perform each query twice:
   * Phoenix: select * from schema.DOCUMENTS where uuid = ?
   * Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where uuid = ?
   * Hbase equivalent to: select * from schema.DOCUMENTS where uuid = ?
* The results are attached and they show that Phoenix response times are at least an order of magnitude above those of HBase

The tests were run from the Master node of a CDH5.7.2 cluster with Phoenix 4.7.0.

Are these test results expected?

Kind Regards,

Edu

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.


Re: Phoenix has slow response times compared to HBase

Posted by Mujtaba Chohan <mu...@apache.org>.

Something seems inherently wrong in these test results.

* How are you running Phoenix queries? Were the concurrent Phoenix queries
using the same JVM? Was the JVM restarted after changing number of
concurrent users?
* Is the response time plotted when query is executed for the first time or
second or average of both?
* Is the UUID filtered on randomly distributed? Does UUID match a single
row?
* It seems that even non-concurrent Phoenix query which filters on UUID
takes 500ms in your environment. Can you try the same query in Sqlline a
few times and see how much time it takes for each run?
* What is the explain <https://phoenix.apache.org/language/#explain> plan
for your Phoenix query?
* If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
table and reconnect Sqlline and execute the query again
* Can you share your table schema and how you ran Phoenix queries and your
HBase equivalent code?
* Any phoenix tuning defaults that you changed?

Thanks,
Mujtaba

(previous response wasn't complete before I hit send)

On Wed, Aug 31, 2016 at 10:40 AM, Mujtaba Chohan <mu...@apache.org> wrote:

> Something seems inherently wrong in these test results.
>
> * How are you running Phoenix queries? Were the concurrent Phoenix queries
> using the same JVM? Was the JVM restarted after changing number of
> concurrent users?
> * Is the response time plotted when query is executed for the first time
> or second or average of both?
> * Is the UUID filtered on randomly distributed? Does UUID match a single
> row?
> * It seems that even non-concurrent Phoenix query which filters on UUID
> takes 500ms in your environment. Can you try the same query in Sqlline a
> few times and see how much time it takes for each run?
> * If it's slow in Sqlline as well then try truncating your SYSTEM.STATS
> * Can you share your table schema and how you ran Phoenix queries and your
> HBase equivalent code?
>
>
>
>
> On Wed, Aug 31, 2016 at 5:42 AM, Narros, Eduardo (ELS-LON) <
> e.narros@elsevier.com> wrote:
>
>> Hi,
>>
>>
>> We are exploring starting to use Phoenix and have done some load tests to
>> see whether Phoenix would scale. We have noted that compared to HBase,
>> Phoenix response times have a much slower average as the number of
>> concurrent users increases. We are trying to understand whether this is
>> expected or there is something we are missing out.
>>
>>
>> This is the test we have performed:
>>
>>
>>    - Create table (20 columns) and load it with 400 million records
>>    indexed via a column called 'uuid'.
>>    - Perform the following queries using 10,20,100,200,400 and 600 users
>>    per second, each user will perform each query twice:
>>       - Phoenix: select * from schema.DOCUMENTS where uuid = ?
>>       - Phoenix: select /*+ SERIAL SMALL */* from schema.DOCUMENTS where
>>       uuid = ?
>>       - Hbase equivalent to: select * from schema.DOCUMENTS where uuid =
>>       ?
>>    - The results are attached and they show that Phoenix response times
>>    are at least an order of magnitude above those of HBase
>>
>> The tests were run from the Master node of a CDH5.7.2 cluster with
>> Phoenix 4.7.0.
>>
>> Are these test results expected?
>>
>> Kind Regards,
>>
>> Edu
>>
>>
>> ------------------------------
>>
>> Elsevier Limited. Registered Office: The Boulevard, Langford Lane,
>> Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084,
>> Registered in England and Wales.
>>
>
>