Posted to user@hbase.apache.org by schubert zhang <zs...@gmail.com> on 2009/03/19 06:54:16 UTC

Performance becomes slower and slower during inserts

I am testing the performance of HBase. After about one week of testing,
I found that HBase becomes slower and slower as data is inserted.

(3 regionservers, HBase 0.19.1 and Hadoop 0.19.2)

Each row has about 32 columns (in one family) and about 400 bytes of raw
data.

For example:
1. when there are only 10-32 regions, inserting 550000 rows takes about 3-4
minutes.
2. when there are about 64 regions, inserting 550000 rows takes about 6-10
minutes.
3. after that, it takes more than 10 minutes.
.....
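
For reference, the insert loop being timed is roughly the following (a
minimal sketch against the 0.19 client API; the table name, column names
and key scheme here are illustrative, not the actual test code):

import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class InsertTest {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "test_table");
    Random rnd = new Random();
    for (long i = 0; i < 550000; i++) {
      // random row key, so successive rows scatter across all regions
      BatchUpdate bu = new BatchUpdate(Long.toHexString(rnd.nextLong()));
      for (int c = 0; c < 32; c++) {
        // ~400 bytes of raw data spread over ~32 columns in one family
        bu.put("data:col" + c, ("value-" + i + "-" + c).getBytes());
      }
      table.commit(bu); // one commit per row
    }
  }
}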

Schubert

Re: Performance becomes slower and slower during inserts

Posted by schubert zhang <zs...@gmail.com>.
Sorry, the previous meminfo was from the master; the following is the
meminfo on one of the regionservers. Note the swap numbers below: this
regionserver has about 1 GB swapped out (SwapTotal 2097144 kB vs SwapFree
1059584 kB), which by itself would explain a severe slowdown.
[schubert@nd1 logs]$ cat /proc/meminfo
MemTotal:      4043484 kB
MemFree:         26008 kB
Buffers:          7844 kB
Cached:         210060 kB
SwapCached:     859024 kB
Active:        3756908 kB
Inactive:       179896 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      4043484 kB
LowFree:         26008 kB
SwapTotal:     2097144 kB
SwapFree:      1059584 kB
Dirty:           23260 kB
Writeback:         136 kB
AnonPages:     3714684 kB
Mapped:          13984 kB
Slab:            46324 kB
PageTables:      13020 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4118884 kB
Committed_AS:  4311660 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263504 kB
VmallocChunk: 34359474679 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
[schubert@nd1 logs]$ free
             total       used       free     shared    buffers     cached
Mem:       4043484    4017832      25652          0       7832     209508
-/+ buffers/cache:    3800492     242992
Swap:      2097144    1037560    1059584


On Fri, Mar 20, 2009 at 2:39 AM, schubert zhang <zs...@gmail.com> wrote:

> Thanks Erik. Do you mean I should try to remove the MapReduce job history
> log files from HDFS? In fact, I had already disabled job history by
> setting:
>
>   <property>
>     <name>hadoop.job.history.user.location</name>
>     <value>none</value>
>   </property>
>
> So there are no big logs in HDFS in my cluster. I think the MapReduce
> framework is fine.
>
> J-D and Stack,
> I am wondering if this is caused by a memory issue.
> Because my row keys are very random, almost every region is hit during a
> BatchUpdate, so almost every region's memcache is actively receiving
> writes.
>
> This is my meminfo:
> [schubert@nd0 logs]$ cat /proc/meminfo
> MemTotal:      4043484 kB
> MemFree:        932000 kB
> Buffers:        368968 kB
> Cached:         870028 kB
> SwapCached:          0 kB
> Active:        2446548 kB
> Inactive:       528540 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:      4043484 kB
> LowFree:        932000 kB
> SwapTotal:     2097144 kB
> SwapFree:      2097144 kB
> Dirty:             276 kB
> Writeback:           0 kB
> AnonPages:     1735980 kB
> Mapped:          23956 kB
> Slab:           103464 kB
> PageTables:      10284 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   4118884 kB
> Committed_AS:  2296176 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:    263496 kB
> VmallocChunk: 34359474679 kB
> HugePages_Total:     0
> HugePages_Free:      0
> HugePages_Rsvd:      0
> Hugepagesize:     2048 kB
>
>  [schubert@nd0 logs]$ free
>              total       used       free     shared    buffers     cached
> Mem:       4043484    3111608     931876          0     369008     870080
> -/+ buffers/cache:    1872520    2170964
> Swap:      2097144          0    2097144
>
> Schubert
>
>
> On Thu, Mar 19, 2009 at 11:43 PM, Erik Holstad <er...@gmail.com> wrote:
>
>> Are you sure that it is the insert into HBase that is getting slower? We
>> have seen the MR jobs themselves slow down over time, depending on old
>> logs etc. stored on HDFS, so it would be helpful if you could clean out
>> old files and see if the times are still as bad.
>>
>> Regards Erik
>>
>
>

Re: Performance becomes slower and slower during inserts

Posted by schubert zhang <zs...@gmail.com>.
Thanks Erik. Do you mean I should try to remove the MapReduce job history
log files from HDFS? In fact, I had already disabled job history by
setting:

  <property>
    <name>hadoop.job.history.user.location</name>
    <value>none</value>
  </property>

So there are no big logs in HDFS in my cluster. I think the MapReduce
framework is fine.

J-D and Stack,
I am wondering if this is caused by a memory issue.
Because my row keys are very random, almost every region is hit during a
BatchUpdate, so almost every region's memcache is actively receiving
writes.
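
To make the contrast concrete, here is a hypothetical illustration of the
two key schemes (illustration only, not my loader code): with random keys
every commit can land in a different region, so every region's memcache
grows at once; with a sequential key only the current "last" region takes
writes, so only one memcache grows (at the cost of a write hot spot):

import java.util.Random;

public class KeySchemes {
  private static final Random RND = new Random();

  // Random key: successive rows scatter over every region, so all
  // memcaches fill in parallel and flushes pile up everywhere.
  static String randomKey() {
    return Long.toHexString(RND.nextLong());
  }

  // Sequential key: zero-padded so keys sort in insert order; successive
  // rows hit only the newest region.
  static String sequentialKey(long seq) {
    return String.format("%016d", seq);
  }
}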

This is my meminfo:
[schubert@nd0 logs]$ cat /proc/meminfo
MemTotal:      4043484 kB
MemFree:        932000 kB
Buffers:        368968 kB
Cached:         870028 kB
SwapCached:          0 kB
Active:        2446548 kB
Inactive:       528540 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      4043484 kB
LowFree:        932000 kB
SwapTotal:     2097144 kB
SwapFree:      2097144 kB
Dirty:             276 kB
Writeback:           0 kB
AnonPages:     1735980 kB
Mapped:          23956 kB
Slab:           103464 kB
PageTables:      10284 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4118884 kB
Committed_AS:  2296176 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263496 kB
VmallocChunk: 34359474679 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

 [schubert@nd0 logs]$ free
             total       used       free     shared    buffers     cached
Mem:       4043484    3111608     931876          0     369008     870080
-/+ buffers/cache:    1872520    2170964
Swap:      2097144          0    2097144

Schubert


On Thu, Mar 19, 2009 at 11:43 PM, Erik Holstad <er...@gmail.com> wrote:

> Are you sure that it is the insert into HBase that is getting slower? We
> have seen the MR jobs themselves slow down over time, depending on old
> logs etc. stored on HDFS, so it would be helpful if you could clean out
> old files and see if the times are still as bad.
>
> Regards Erik
>

Re: Performance becomes slower and slower during inserts

Posted by Erik Holstad <er...@gmail.com>.
Are you sure that it is the insert into HBase that is getting slower? We
have seen the MR jobs themselves slow down over time, depending on old
logs etc. stored on HDFS, so it would be helpful if you could clean out
old files and see if the times are still as bad.

Regards Erik

Re: Performance becomes slower and slower during inserts

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This is unfortunate, but it's already better in current trunk with the new
file format. It's still very unstable, though.

J-D

On Thu, Mar 19, 2009 at 10:55 AM, schubert zhang <zs...@gmail.com> wrote:
> My data loader MapReduce job looks like the following:
> 1. It only uses mappers; the number of reducers is 0.
> 2. mapred.tasktracker.map.tasks.maximum=2
> 3. Each input file is about 20MB (50000 rows, each row with about 32
> columns within one family).
> 4. Each run of the MapReduce job loads 11 files (3 regionservers * 2 * 1.95
> = 11).
>
> Yes, I think the META scanning and the growing number of region compactions
> and splits will slow HBase down.
>
> Schubert
>
> On Thu, Mar 19, 2009 at 9:07 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>
>> How many tasks that write into HBase are being spawned? One thing that
>> surely explains some slowdown is the fact that your HBase clients must
>> build up their META cache, which requires lookups in the META table.
>>
>> J-D
>>
>> On Thu, Mar 19, 2009 at 1:54 AM, schubert zhang <zs...@gmail.com> wrote:
>> > I am testing the performance of HBase. After about one week of testing,
>> > I found that HBase becomes slower and slower as data is inserted.
>> >
>> > (3 regionservers, HBase 0.19.1 and Hadoop 0.19.2)
>> >
>> > Each row has about 32 columns (in one family) and about 400 bytes of
>> > raw data.
>> >
>> > For example:
>> > 1. when there are only 10-32 regions, inserting 550000 rows takes about
>> > 3-4 minutes.
>> > 2. when there are about 64 regions, inserting 550000 rows takes about
>> > 6-10 minutes.
>> > 3. after that, it takes more than 10 minutes.
>> > .....
>> >
>> > Schubert
>> >
>>
>

Re: Performance becomes slower and slower during inserts

Posted by schubert zhang <zs...@gmail.com>.
My data loader MapReduce job looks like the following:
1. It only uses mappers; the number of reducers is 0.
2. mapred.tasktracker.map.tasks.maximum=2
3. Each input file is about 20MB (50000 rows, each row with about 32
columns within one family).
4. Each run of the MapReduce job loads 11 files (3 regionservers * 2 * 1.95
= 11).

Yes, I think the META scanning and the growing number of region compactions
and splits will slow HBase down.
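
For context, the loader is wired up roughly like this (a minimal sketch
using the old mapred API; the class, table, and column names are
illustrative and the input parsing is simplified):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class LoaderJob {

  public static class LoaderMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, NullWritable> {
    private HTable table;

    public void configure(JobConf job) {
      try {
        table = new HTable(new HBaseConfiguration(), "test_table");
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }

    public void map(LongWritable offset, Text line,
        OutputCollector<NullWritable, NullWritable> out, Reporter reporter)
        throws IOException {
      // first tab-separated field is the row key, the rest are column values
      String[] fields = line.toString().split("\t");
      BatchUpdate bu = new BatchUpdate(fields[0]);
      for (int c = 1; c < fields.length; c++) {
        bu.put("data:col" + c, fields[c].getBytes());
      }
      table.commit(bu);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(LoaderJob.class);
    job.setJobName("hbase-loader");
    job.setNumReduceTasks(0);                     // map-only: no reducers
    job.setMapperClass(LoaderMapper.class);
    job.setInputFormat(TextInputFormat.class);
    job.setOutputFormat(NullOutputFormat.class);  // all writes go through HTable
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    JobClient.runJob(job);
  }
}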

Schubert

On Thu, Mar 19, 2009 at 9:07 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> How many tasks that write into HBase are being spawned? One thing that
> surely explains some slowdown is the fact that your HBase clients must
> build up their META cache, which requires lookups in the META table.
>
> J-D
>
> On Thu, Mar 19, 2009 at 1:54 AM, schubert zhang <zs...@gmail.com> wrote:
> > I am testing the performance of HBase. After about one week of testing,
> > I found that HBase becomes slower and slower as data is inserted.
> >
> > (3 regionservers, HBase 0.19.1 and Hadoop 0.19.2)
> >
> > Each row has about 32 columns (in one family) and about 400 bytes of
> > raw data.
> >
> > For example:
> > 1. when there are only 10-32 regions, inserting 550000 rows takes about
> > 3-4 minutes.
> > 2. when there are about 64 regions, inserting 550000 rows takes about
> > 6-10 minutes.
> > 3. after that, it takes more than 10 minutes.
> > .....
> >
> > Schubert
> >
>

Re: Performance becomes slower and slower during inserts

Posted by Jean-Daniel Cryans <jd...@apache.org>.
How many tasks that write into HBase are being spawned? One thing that
surely explains some slowdown is the fact that your HBase clients must
build up their META cache, which requires lookups in the META table.
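
As a rough illustration of the caching idea (this is a sketch, not the
actual HBase client internals): the client keeps a sorted map from region
start key to regionserver, and only a cache miss costs an extra round trip
to the META region. With random keys and regions that keep splitting, the
cache keeps going cold:

import java.util.Map;
import java.util.TreeMap;

// Illustration only, not the real client code.
public class RegionLocationCache {
  // region start key -> address of the regionserver hosting that region
  private final TreeMap<String, String> cache = new TreeMap<String, String>();

  public String serverFor(String rowKey) {
    Map.Entry<String, String> hit = cache.floorEntry(rowKey);
    if (hit != null) {
      return hit.getValue(); // warm cache: no META lookup needed
    }
    // cold cache: pay an extra RPC to find the region holding rowKey
    String[] loc = metaLookup(rowKey); // hypothetical helper: {startKey, server}
    cache.put(loc[0], loc[1]);
    return loc[1];
  }

  // Stand-in for the META scan; a real lookup goes over the network.
  private String[] metaLookup(String rowKey) {
    return new String[] { rowKey, "regionserver:60020" };
  }
}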

J-D

On Thu, Mar 19, 2009 at 1:54 AM, schubert zhang <zs...@gmail.com> wrote:
> I am testing the performance of HBase. After about one week of testing,
> I found that HBase becomes slower and slower as data is inserted.
>
> (3 regionservers, HBase 0.19.1 and Hadoop 0.19.2)
>
> Each row has about 32 columns (in one family) and about 400 bytes of raw
> data.
>
> For example:
> 1. when there are only 10-32 regions, inserting 550000 rows takes about
> 3-4 minutes.
> 2. when there are about 64 regions, inserting 550000 rows takes about
> 6-10 minutes.
> 3. after that, it takes more than 10 minutes.
> .....
>
> Schubert
>