Posted to user@hbase.apache.org by Job Thomas <jo...@suntecgroup.com> on 2013/11/26 09:26:49 UTC
HBase: Paralel Query
Hi All,
How can we configure HBase so that multithreaded (parallel) queries run faster?
Here are some numbers from my analysis.
Each thread runs 10 random queries.
Threads  H2 (ms)  Phoenix (ms)
1        34       215
2        63       222
4        120      324
6        200      340
8        250      460
10       350      560
12       410      592
I need to find the points where the two lines intersect when these values are plotted on a graph.
So I need HBase to perform well under multithreaded load.
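A quick way to check the posted numbers for intersections is sketched below. It is only an illustration: the function names `crossings` and `ratio` are made up here, and joining neighbouring measurements with straight lines is an assumption about how the graph would be drawn. The timings are copied from the table above.

```python
# Timings copied from the table above: threads -> (H2 ms, Phoenix ms).
TIMINGS = {
    1: (34, 215),
    2: (63, 222),
    4: (120, 324),
    6: (200, 340),
    8: (250, 460),
    10: (350, 560),
    12: (410, 592),
}


def crossings(timings):
    """Return thread counts (possibly fractional) where the two lines cross.

    Neighbouring measurements are joined by straight lines (an assumption);
    a crossing exists where the difference (Phoenix - H2) changes sign.
    """
    points = sorted(timings.items())
    found = []
    for (x0, (a0, b0)), (x1, (a1, b1)) in zip(points, points[1:]):
        d0, d1 = b0 - a0, b1 - a1
        if d0 == 0:
            found.append(float(x0))
        elif d0 * d1 < 0:
            # Linear interpolation of the zero of the difference.
            found.append(x0 + (x1 - x0) * d0 / (d0 - d1))
    # The loop never inspects the last point on its own, so check it here.
    last_x, (last_a, last_b) = points[-1]
    if last_a == last_b:
        found.append(float(last_x))
    return found


def ratio(timings):
    """Phoenix time divided by H2 time for each thread count."""
    return {t: round(p / h, 2) for t, (h, p) in sorted(timings.items())}


if __name__ == "__main__":
    print("crossings:", crossings(TIMINGS))
    print("Phoenix/H2 ratio:", ratio(TIMINGS))
```

With these values the list of crossings is empty: Phoenix is slower at every measured thread count, although the ratio narrows from about 6.3 at 1 thread to about 1.4 at 12 threads.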
Best Regards,
Job M Thomas
Re: HBase: Paralel Query
Posted by Asaf Mesika <as...@gmail.com>.
The question is too broad. You need to go through the HBase JMX metrics and
the machine metrics to see what your bottleneck is.
RE: HBase: Paralel Query
Posted by Job Thomas <jo...@suntecgroup.com>.
Here are the descriptions of the two tables I created:
************************************************************************************************************************************************************************
FIRST TABLE
************************************************************************************************************************************************************************
hbase(main):013:0> describe 'TEST5MILLION8KB'
DESCRIPTION (ENABLED: true)
'TEST5MILLION8KB', {METHOD => 'table_att',
 coprocessor$1 => '|com.salesforce.phoenix.coprocessor.ScanRegionObserver|1|',
 coprocessor$2 => '|com.salesforce.phoenix.coprocessor.UngroupedAggregateRegionObserver|1|',
 coprocessor$3 => '|com.salesforce.phoenix.coprocessor.GroupedAggregateRegionObserver|1|',
 coprocessor$4 => '|com.salesforce.phoenix.join.HashJoiningRegionObserver|1|',
 coprocessor$5 => '|com.salesforce.phoenix.coprocessor.ServerCachingEndpointImpl|1|',
 coprocessor$6 => '|com.salesforce.hbase.index.Indexer|1073741823|com.salesforce.hbase.index.codec.class=com.salesforce.phoenix.index.PhoenixIndexCodec,index.builder=com.salesforce.phoenix.index.PhoenixIndexBuilder'},
{NAME => 'M', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
 VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'true', BLOCKSIZE => '8192', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
************************************************************************************************************************************************************************
After creating the first table, I got very good performance.
Then I created the second table:
************************************************************************************************************************************************************************
hbase(main):014:0> describe 'TEST5MILLION8KB2'
DESCRIPTION (ENABLED: true)
'TEST5MILLION8KB2', {METHOD => 'table_att',
 coprocessor$1 => '|com.salesforce.phoenix.coprocessor.ScanRegionObserver|1|',
 coprocessor$2 => '|com.salesforce.phoenix.coprocessor.UngroupedAggregateRegionObserver|1|',
 coprocessor$3 => '|com.salesforce.phoenix.coprocessor.GroupedAggregateRegionObserver|1|',
 coprocessor$4 => '|com.salesforce.phoenix.join.HashJoiningRegionObserver|1|',
 coprocessor$5 => '|com.salesforce.phoenix.coprocessor.ServerCachingEndpointImpl|1|',
 coprocessor$6 => '|com.salesforce.hbase.index.Indexer|1073741823|com.salesforce.hbase.index.codec.class=com.salesforce.phoenix.index.PhoenixIndexCodec,index.builder=com.salesforce.phoenix.index.PhoenixIndexBuilder'},
{NAME => 'M', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
 VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'true', BLOCKSIZE => '8192', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.0600 seconds
************************************************************************************************************************************************************************
The performance of the first table has decreased dramatically, while that of the second table is very good.
************************************************************************************************************************************************************************
Please give your suggestions.
With Thanks,
Best Regards,
Job M Thomas
Re: HBase: Paralel Query
Posted by Ted Yu <yu...@gmail.com>.
bq. out of maxHeapMB=15983
In a previous email you said the RAM is 8 GB. The figure above is larger than 8 GB.
There are 6 coprocessors installed on each table.
I wonder if what you observed was related to HBASE-10047.
Cheers
RE: HBase: Paralel Query
Posted by Job Thomas <jo...@suntecgroup.com>.
Hi Ted, all,
I have set:
hfile.block.cache.size to 0.6
hbase.regionserver.handler.count to 60
DATA_BLOCK_ENCODING => 'FAST_DIFF'
BLOOMFILTER => 'ROW'
BLOCKSIZE => '8192'
BLOCKCACHE => 'true'
Performance has improved.
But after I created another table with the same size and configuration, the performance of the previous table dropped, while I get good performance on the newly created table.
I have seen that, while querying, HBase uses only usedHeapMB=72 out of maxHeapMB=15983.
Why is HBase not utilizing the heap space, even though I have set BLOCKSIZE => '8192' (to keep more index entries in memory)?
I have read that reducing the HFile block size slows down sequential access, but I have not experienced this even though my BLOCKSIZE is '8192'.
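For scale, the settings above can be turned into rough numbers. The sketch below is a back-of-the-envelope estimate only, not HBase's actual accounting: it assumes `hfile.block.cache.size` is applied as a plain fraction of the maximum heap and ignores all per-block and cache-internal overhead. The function names are made up for this illustration.

```python
def block_cache_mb(max_heap_mb, cache_fraction):
    """Heap (in MB) that the block cache may grow to, as a plain fraction."""
    return max_heap_mb * cache_fraction


def blocks_that_fit(cache_mb, block_size_bytes):
    """Rough count of blocks of the given size that fit (overhead ignored)."""
    return int(cache_mb * 1024 * 1024 // block_size_bytes)


# Values reported in the message above.
MAX_HEAP_MB = 15983
USED_HEAP_MB = 72
CACHE_FRACTION = 0.6   # hfile.block.cache.size
BLOCK_SIZE = 8192      # BLOCKSIZE of column family 'M', in bytes

if __name__ == "__main__":
    cache_mb = block_cache_mb(MAX_HEAP_MB, CACHE_FRACTION)
    print(f"block cache upper bound: {cache_mb:.1f} MB")
    print(f"8 KB blocks that would fit: {blocks_that_fit(cache_mb, BLOCK_SIZE)}")
    print(f"used heap as share of that bound: {USED_HEAP_MB / cache_mb:.2%}")
```

Under these assumptions the cache could grow to roughly 9.6 GB. Blocks enter the cache as they are read, so a used heap of 72 MB suggests that very little data had been cached at the time of the measurement.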
Best Regards,
Job M Thomas
Re: HBase: Paralel Query
Posted by Ted Yu <yu...@gmail.com>.
bq. I didn't enabled blockcache
What if you enable the block cache?
Cheers
RE: HBase: Paralel Query
Posted by Job Thomas <jo...@suntecgroup.com>.
Hello Lars,
Here are the answers:
-> I have only one region server (I am testing HBase via Phoenix, with HBase on a single server).
-> All queries are issued through Phoenix only: select Lastname from tablename where Id=? (Id is the primary key).
-> hbase.regionserver.handler.count=30 (default value).
-> Hardware: 8 cores, 8 GB RAM.
-> I have not enabled the block cache.
-> Whether the clients are multiple threads in one process or multiple processes: I am not sure.
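To make the last point concrete, here is a minimal sketch of the "multiple threads in one process" case. It is not the test program used in this thread: `run_benchmark` and `fake_query` are hypothetical names, and `fake_query` only sleeps for a millisecond where a real client would issue the Phoenix lookup. The multi-process case would instead start several separate copies of a program like this one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(query, threads, queries_per_thread=10):
    """Run `query` from `threads` worker threads inside one process.

    `query` is any zero-argument callable. Returns the total elapsed time
    in milliseconds and the number of queries completed.
    """
    def worker():
        for _ in range(queries_per_thread):
            query()
        return queries_per_thread

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        done = sum(f.result() for f in futures)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, done


def fake_query():
    """Stand-in for a lookup: sleeps 1 ms to mimic a network round trip."""
    time.sleep(0.001)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        ms, count = run_benchmark(fake_query, n)
        print(f"{n:2d} threads: {count} queries in {ms:.1f} ms")
```

All workers here share one process, so they also share its resources (heap, connection pool, CPU scheduling), which is why the distinction matters when reading the timings.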
Best Regards,
Job M Thomas
Re: HBase: Paralel Query
Posted by lars hofhansl <la...@apache.org>.
Hi Job,
First off, some questions :)
How many regions are you accessing?
What type of query is this (get or scan)?
How many handlers have you configured?
What does your hardware look like (how many cores, etc.)?
Is the data all in the block cache?
If not, what does the disk I/O look like?
Are the clients multiple threads in one process, or multiple processes?
Sorry for all the questions, but we need a bit more data.
-- Lars