Posted to user@hbase.apache.org by Jarod Feng <ja...@gmail.com> on 2008/11/07 09:20:28 UTC

low performance on hadoop

Hi,
I'm using HBase 0.2.1 + Hadoop 0.17 with 11 client servers and 1 master server.
I crawled some data and saved it to HBase with MapReduce.
After that, I tried to count the total number of some of the data.
It is too slow. As you can see in the monitor, 1 of the clients has a high
request count, but the others are at 0.

S1:60020 1226025034218 requests: 0 regions: 5
S2:60020 1226025034364 requests: 0 regions: 5
S3:60020 1226025033874 requests: 0 regions: 5
S4:60020 1226025035074 requests: 5085 regions: 4
S5:60020 1226025034712 requests: 0 regions: 5
S6:60020 1226025034716 requests: 0 regions: 5
S7:60020 1226025034280 requests: 0 regions: 4
S8:60020 1226025034130 requests: 0 regions: 5
S9:60020 1226025033726 requests: 0 regions: 5
S10:60020 1226025034539 requests: 0 regions: 5
S11:60020 1226025034528 requests: 0 regions: 4

I have used Hadoop for many months, but this is my first time developing
on HBase.


Thanks,
-- 
View this message in context: http://www.nabble.com/low-performance-on-hadoop-tp20376261p20376261.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: low performance on hadoop

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Jarod,

Yes, HBase 0.18 only works with Hadoop 0.18, so you should upgrade your HDFS
if you want to try it.

Regarding your scanning issue, what you call a "client" is in fact called a
Region Server. Because a scan is sequential and the rows in HBase are sorted
throughout the cluster, only 1 region server at a time will receive the
hits. If you want to do it faster, try using MapReduce.
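
To illustrate the point about sequential scans, here is a minimal sketch in
plain Java. It does not use the HBase API; the Region class, the server names
and the row keys are all hypothetical stand-ins. It only shows why a single
scanner touches one region (and so one region server) at a time, while one
task per region, which is roughly how a MapReduce job would split the work,
spreads the load across servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RegionScanSketch {

    /** Hypothetical stand-in for a region: a sorted slice of rows held by one server. */
    public static final class Region {
        public final String server;
        public final List<String> sortedRowKeys;

        public Region(String server, List<String> sortedRowKeys) {
            this.server = server;
            this.sortedRowKeys = sortedRowKeys;
        }
    }

    /** Counts the rows of a single region by walking them in key order. */
    public static long countRows(Region region) {
        long count = 0;
        for (String ignoredKey : region.sortedRowKeys) {
            count++;
        }
        return count;
    }

    /** One scanner: regions are visited one after another, so only one server is busy at a time. */
    public static long countSequentially(List<Region> regions) {
        long total = 0;
        for (Region region : regions) {
            total += countRows(region);
        }
        return total;
    }

    /** One task per region: all servers can be busy at once, and partial counts are summed. */
    public static long countInParallel(List<Region> regions, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<Long>> partials = new ArrayList<>();
            for (Region region : regions) {
                partials.add(CompletableFuture.supplyAsync(() -> countRows(region), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> partial : partials) {
                total += partial.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Made-up sample data: three regions on three servers, nine rows in total. */
    public static List<Region> sampleRegions() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("S1", Arrays.asList("a", "b", "c")));
        regions.add(new Region("S2", Arrays.asList("d", "e")));
        regions.add(new Region("S3", Arrays.asList("f", "g", "h", "i")));
        return regions;
    }

    public static void main(String[] args) {
        List<Region> regions = sampleRegions();
        System.out.println("sequential: " + countSequentially(regions));
        System.out.println("parallel:   " + countInParallel(regions, 3));
    }
}
```

Both methods return the same total. The difference is only in how many
servers are working at the same moment.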

J-D

On Sun, Nov 9, 2008 at 8:25 PM, Jarod Feng <ja...@gmail.com> wrote:

>
> Hi Jean,
>
> There is more than 1 table, but 1 of them is a huge table.
>
> I use a scanner to get all the results and calculate the total number of hits.
>
> I use Hadoop 0.17. I have tried HBase 0.18, but it doesn't work with Hadoop
> 0.17.
>

Re: low performance on hadoop

Posted by Jarod Feng <ja...@gmail.com>.
Hi Jean,

There is more than 1 table, but 1 of them is a huge table.

I use a scanner to get all the results and calculate the total number of hits.

I use Hadoop 0.17. I have tried HBase 0.18, but it doesn't work with Hadoop
0.17.


Re: low performance on hadoop

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Jarod,

Some information is missing. Do you have only 1 table? How did you try
to count the number of rows the first time? If you are using only 1 client,
scanning is sequential, so it's normal to have only 1 region server taking
all the hits at a time. Maybe try using the RowCounter MapReduce job
provided with HBase?

Also, try to upgrade to HBase 0.18.1 because Hadoop 0.18 is a bit faster.

J-D

On Fri, Nov 7, 2008 at 3:21 AM, Jarod Feng <ja...@gmail.com> wrote:

>
> I don't know the reason, and when I use the shell to count or use a single
> process without MapReduce, it's faster.

Re: low performance on hadoop

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I just added a patch to HBASE-987:
https://issues.apache.org/jira/browse/HBASE-987

It includes a Partitioner that groups the records into one reducer per region.
You can set it as the partitioner in the job. I am not sure if it will work
for 0.17.

But you can give it a try. The file in the patch you are looking for is
HRegionPartitioner.java
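
For readers who want a feel for the idea, here is a rough sketch of
partitioning by region. It is not the code from the HBASE-987 patch. It
assumes the region start keys are available as a sorted array of strings
whose first entry is the empty string, and the class name, method names and
the modulo fallback for jobs with fewer reducers than regions are all
illustrative assumptions. Note that HBase compares row keys as bytes, while
this sketch uses Java String ordering.

```java
import java.util.Arrays;

public class RegionPartitionSketch {

    /**
     * Returns the index of the region whose key range contains rowKey.
     * sortedStartKeys must be sorted ascending, with "" as the first entry.
     */
    public static int regionFor(String[] sortedStartKeys, String rowKey) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        if (pos >= 0) {
            // Exact match: rowKey is the first key of that region.
            return pos;
        }
        // binarySearch returned -(insertionPoint) - 1; the owning region is
        // the one that starts just before the insertion point.
        int insertionPoint = -pos - 1;
        return Math.max(0, insertionPoint - 1);
    }

    /** Maps a row key to a reducer, wrapping around if there are fewer reducers than regions. */
    public static int partition(String[] sortedStartKeys, String rowKey, int numReducers) {
        return regionFor(sortedStartKeys, rowKey) % numReducers;
    }

    public static void main(String[] args) {
        // Made-up start keys for three regions: ["", "g"), ["g", "n"), ["n", end).
        String[] startKeys = {"", "g", "n"};
        for (String key : new String[] {"apple", "g", "horse", "zebra"}) {
            System.out.println(key + " -> reducer " + partition(startKeys, key, 3));
        }
    }
}
```

With one reducer per region, every record written by a given reducer lands in
the same region, so each reducer talks to a single region server.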

Billy


