
Posted to user@hbase.apache.org by Job Thomas <jo...@suntecgroup.com> on 2013/11/26 09:26:49 UTC

HBase: Parallel Query

 
 
Hi All,

How can we configure HBase to run multithreaded/parallel queries faster?

These are some results from my analysis.

Each thread runs 10 random queries.

Threads   H2 (ms)   Phoenix (ms)
1         34        215
2         63        222
4         120       324
6         200       340
8         250       460
10        350       560
12        410       592

I need to find the points where the lines intersect when these values are plotted on a graph,
so I need HBase to perform well under multithreaded load.
 
 
Best Regards,
Job M Thomas
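
One way to look for the crossover asked about above is to fit a straight line to each series and solve for the point where the two fits meet. The following is a minimal sketch, assuming a linear trend is an adequate model for both series (the thread does not establish this). The function names fit_line and crossover are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossover(xs, ys_a, ys_b):
    """x at which the two fitted lines meet, or None if the fits are parallel."""
    slope_a, icpt_a = fit_line(xs, ys_a)
    slope_b, icpt_b = fit_line(xs, ys_b)
    if slope_a == slope_b:
        return None
    return (icpt_b - icpt_a) / (slope_a - slope_b)


# Measurements copied from the table in the message above.
threads = [1, 2, 4, 6, 8, 10, 12]
h2_ms = [34, 63, 120, 200, 250, 350, 410]
phoenix_ms = [215, 222, 324, 340, 460, 560, 592]

if __name__ == "__main__":
    print("fitted H2 line (slope, intercept):", fit_line(threads, h2_ms))
    print("fitted Phoenix line (slope, intercept):", fit_line(threads, phoenix_ms))
    # A result outside the measured range (1 to 12 threads) means the fitted
    # lines do not cross within the tested thread counts.
    print("fitted lines meet at thread count:", crossover(threads, h2_ms, phoenix_ms))
```

Comparing the two fitted slopes is usually more informative than the crossover alone: if the slopes are close, small measurement noise moves the intersection a long way.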

Re: HBase: Parallel Query

Posted by Asaf Mesika <as...@gmail.com>.

The question is too broad. You need to go through the HBase JMX metrics and
the machine metrics to see what your bottleneck is.
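
HBase daemons publish their JMX beans as JSON under the /jmx path of their web UI, which makes the metrics easy to inspect from a script. The following is a minimal sketch, assuming the region server web UI is reachable at the hypothetical address below (the info port differs between HBase versions). The helper names and the search terms are illustrative, not taken from the thread.

```python
import json
from urllib.request import urlopen

# Hypothetical address of the region server web UI; adjust host and port
# for your cluster.
JMX_URL = "http://localhost:60030/jmx"


def fetch_beans(url=JMX_URL, timeout=5):
    """Download the JMX JSON document and return its list of beans."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["beans"]


def select_metrics(beans, needle):
    """Return {bean name: {attribute: value}} for every attribute whose
    name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    found = {}
    for bean in beans:
        hits = {key: value for key, value in bean.items()
                if key != "name" and needle in key.lower()}
        if hits:
            found[bean.get("name", "?")] = hits
    return found
```

Against a live server, select_metrics(fetch_beans(), "blockCache") would list whatever block cache attributes that server exposes. Exact attribute names vary by version, so searching by substring is safer than hard-coding them.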


RE: HBase: Parallel Query

Posted by Job Thomas <jo...@suntecgroup.com>.

Here is the description of the two tables I created:
************************************************************************************************************************************************************************
FIRST TABLE 
************************************************************************************************************************************************************************
hbase(main):013:0> describe 'TEST5MILLION8KB'
DESCRIPTION                                                                                            ENABLED
 'TEST5MILLION8KB', {METHOD => 'table_att', coprocessor$1 => '|com.salesforce.phoenix.coprocessor.Scan true
 RegionObserver|1|', coprocessor$2 => '|com.salesforce.phoenix.coprocessor.UngroupedAggregateRegionObs
 erver|1|', coprocessor$3 => '|com.salesforce.phoenix.coprocessor.GroupedAggregateRegionObserver|1|',
 coprocessor$4 => '|com.salesforce.phoenix.join.HashJoiningRegionObserver|1|', coprocessor$5 => '|com.
 salesforce.phoenix.coprocessor.ServerCachingEndpointImpl|1|', coprocessor$6 => '|com.salesforce.hbase
 .index.Indexer|1073741823|com.salesforce.hbase.index.codec.class=com.salesforce.phoenix.index.Phoenix
 IndexCodec,index.builder=com.salesforce.phoenix.index.PhoenixIndexBuilder'}, {NAME => 'M', DATA_BLOCK
 _ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSIO
 N => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'true', BLOCKSIZE => '81
 92', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
************************************************************************************************************************************************************************
After the first table was created, I got very good performance.
Then I created a second table:
************************************************************************************************************************************************************************
hbase(main):014:0> describe 'TEST5MILLION8KB2'
DESCRIPTION                                                                                            ENABLED
 'TEST5MILLION8KB2', {METHOD => 'table_att', coprocessor$1 => '|com.salesforce.phoenix.coprocessor.Sca true
 nRegionObserver|1|', coprocessor$2 => '|com.salesforce.phoenix.coprocessor.UngroupedAggregateRegionOb
 server|1|', coprocessor$3 => '|com.salesforce.phoenix.coprocessor.GroupedAggregateRegionObserver|1|',
  coprocessor$4 => '|com.salesforce.phoenix.join.HashJoiningRegionObserver|1|', coprocessor$5 => '|com
 .salesforce.phoenix.coprocessor.ServerCachingEndpointImpl|1|', coprocessor$6 => '|com.salesforce.hbas
 e.index.Indexer|1073741823|com.salesforce.hbase.index.codec.class=com.salesforce.phoenix.index.Phoeni
 xIndexCodec,index.builder=com.salesforce.phoenix.index.PhoenixIndexBuilder'}, {NAME => 'M', DATA_BLOC
 K_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSI
 ON => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'true', BLOCKSIZE => '8
 192', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.0600 seconds
************************************************************************************************************************************************************************
The performance of the first table has decreased dramatically, while that of the second table is very good.
************************************************************************************************************************************************************************
 
Please give your suggestions.
With Thanks,
 
Best Regards,
Job M Thomas


Re: HBase: Parallel Query

Posted by Ted Yu <yu...@gmail.com>.

bq. out of maxHeapMB=15983

In a previous email you said the RAM is 8 GB. The figure above is larger than 8 GB.

There are 6 coprocessors installed on each table.
I wonder if what you observed is related to HBASE-10047.

Cheers



RE: HBase: Parallel Query

Posted by Job Thomas <jo...@suntecgroup.com>.


Hi Ted, All,

I have set:

hfile.block.cache.size to 0.6
hbase.regionserver.handler.count to 60
DATA_BLOCK_ENCODING => 'FAST_DIFF'
BLOOMFILTER => 'ROW'
BLOCKSIZE => '8192'
BLOCKCACHE => 'true'

Performance has improved.

But after creating another table with the same size and configuration, the performance of the previous table dropped, and I am getting good performance for the newly created table.

I have seen that while querying, out of maxHeapMB=15983, HBase is using only usedHeapMB=72.
Why is HBase not utilizing the heap space even though I have set BLOCKSIZE => '8192' (in order to keep more index entries in memory)?

I have read that once the HFile block size is reduced, sequential access speed decreases, but I have not experienced this even though my BLOCKSIZE is '8192'.

 
Best Regards,
Job M Thomas
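
Since hfile.block.cache.size is expressed as a fraction of the maximum heap, the figures quoted above can be turned into a rough capacity estimate. The following is a minimal sketch of that arithmetic, assuming the whole fraction is usable and ignoring per-block overhead. The function names are illustrative.

```python
def block_cache_capacity_mb(max_heap_mb, cache_fraction):
    """Upper bound on the block cache size: a fraction of the maximum heap."""
    return max_heap_mb * cache_fraction


def blocks_that_fit(capacity_mb, block_size_bytes):
    """Rough count of data blocks that fit, ignoring per-block overhead."""
    return int(capacity_mb * 1024 * 1024 // block_size_bytes)


max_heap_mb = 15983    # maxHeapMB reported in the message above
used_heap_mb = 72      # usedHeapMB reported in the message above
cache_fraction = 0.6   # hfile.block.cache.size
block_size = 8192      # BLOCKSIZE, in bytes

capacity = block_cache_capacity_mb(max_heap_mb, cache_fraction)
print(f"block cache may grow to about {capacity:.0f} MB")
print(f"room for roughly {blocks_that_fit(capacity, block_size)} blocks of {block_size} bytes")
print(f"heap currently used: {used_heap_mb / max_heap_mb:.2%} of the maximum")
```

The estimate only bounds how much could be cached. How much is actually cached has to be read from the region server's block cache metrics.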


Re: HBase: Parallel Query

Posted by Ted Yu <yu...@gmail.com>.

bq. I didn't enabled blockcache

What if you enable the block cache?

Cheers



RE: HBase: Parallel Query

Posted by Job Thomas <jo...@suntecgroup.com>.

Hello Lars,

Here are the answers:

-> I have only one region server. (I am testing HBase via Phoenix, with HBase on a single server.)
-> All queries are issued through Phoenix only: select Lastname from tablename where Id=? (here Id is the primary key).
-> hbase.regionserver.handler.count=30 (default value).
-> Hardware:   Cores = 8
               RAM = 8 GB
-> I have not enabled the block cache.
-> Are the clients multiple threads in one process, or multiple processes? - This question is not clear to me.
 
 
Best Regards,
Job M Thomas


Re: HBase: Parallel Query

Posted by lars hofhansl <la...@apache.org>.

Hi Job,

First off, some questions :)
How many regions are you accessing?
What type of query is this (get or scan)?
How many handlers have you configured?
What does your hardware look like (how many cores, etc.)?
Is the data all in the block cache?
If not, what does the disk I/O look like?
Are the clients multiple threads in one process, or multiple processes?


Sorry for all the questions, but we need a bit more data.
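
The last question above is about how the load is generated. The following is a minimal sketch of the "multiple threads in one process" case, with each thread issuing 10 queries as in the test described in this thread. Here run_query is a hypothetical stand-in that only sleeps; it is not a Phoenix or HBase call and would need to be replaced with a real client lookup.

```python
import threading
import time


def run_query(query_id):
    """Hypothetical stand-in for one point lookup; replace with a real
    client call. Here it only sleeps briefly."""
    time.sleep(0.001)


def worker(thread_id, queries_per_thread, latencies):
    """Run the queries one after another and record this thread's elapsed ms."""
    start = time.perf_counter()
    for i in range(queries_per_thread):
        run_query((thread_id, i))
    latencies[thread_id] = (time.perf_counter() - start) * 1000.0


def run_benchmark(num_threads, queries_per_thread=10):
    """Start all threads in one process; return (wall_ms, per-thread ms)."""
    latencies = [0.0] * num_threads
    threads = [
        threading.Thread(target=worker, args=(t, queries_per_thread, latencies))
        for t in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall_ms = (time.perf_counter() - start) * 1000.0
    return wall_ms, latencies


if __name__ == "__main__":
    for n in (1, 2, 4):
        wall, per_thread = run_benchmark(n)
        print(f"{n} threads: wall {wall:.1f} ms, slowest thread {max(per_thread):.1f} ms")
```

In the multi-process case each worker would be a separate operating system process with its own client connection, so contention inside a single client would not show up in the numbers.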


-- Lars


