Posted to user@hbase.apache.org by Jim Twensky <ji...@gmail.com> on 2009/04/30 00:33:18 UTC

Improving import performance

Hi,

I'm doing some experiments to import large datasets to Hbase using a Map
job. Before posting some numbers, here is a summary of my test cluster:

I have 7 regionservers and 1 master. I also run HDFS datanodes and Hadoop
tasktrackers on the same 7 regionservers. Similarly, I run the Hadoop
namenode on the same machine as the Hbase master. Each machine is an
IBM e325 node that has two 2.2 GHz AMD64 processors, 4 GB RAM, and 80 GB
local disk.

My dataset is simply the output of another map reduce job, consisting of 7
sequence files with a total size of 40 GB. Each file contains key, value
records of the form (Text, LongWritable). The keys are sentences or phrases
extracted from sentences, and the values are frequencies. The total number of
records is roughly 420m, and the average key is around 100 bytes (40 GB /
420m, ignoring the LongWritables).

I tried to randomize the (key,value) pairs with another map reduce job and I
also set:

            table.setAutoFlush(false);
            table.setWriteBufferSize(1024*1024*10);

based on some advice I read earlier on the list. My Map function that
imports data into Hbase is as follows:

    public void map(Text key, LongWritable value,
            OutputCollector<NullWritable, NullWritable> output,
            Reporter reporter) throws IOException {

        BatchUpdate update = new BatchUpdate(key.toString());
        update.put("frequency:value", Bytes.toBytes(value.get()));

        table.commit(update);
    }
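
For reference, below is a fuller sketch of what the whole mapper and driver
for this kind of import might look like. It is a minimal sketch along the
lines of the snippet above rather than my actual job: the class name, the
"frequencies" table name and the input path are made up, and it assumes the
0.19-era HTable/BatchUpdate client API together with the old
org.apache.hadoop.mapred API. The one detail worth calling out is that with
autoFlush disabled, whatever is still sitting in the client-side write buffer
has to be flushed in close(), otherwise the tail end of the upload may never
reach the regionservers.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class FrequencyImport {

    public static class ImportMapper extends MapReduceBase
            implements Mapper<Text, LongWritable, NullWritable, NullWritable> {

        private HTable table;

        public void configure(JobConf job) {
            try {
                // One HTable per map task, buffering commits client-side.
                table = new HTable(new HBaseConfiguration(), "frequencies");
                table.setAutoFlush(false);
                table.setWriteBufferSize(1024 * 1024 * 10); // ~10 MB
            } catch (IOException e) {
                throw new RuntimeException("Cannot open table", e);
            }
        }

        public void map(Text key, LongWritable value,
                OutputCollector<NullWritable, NullWritable> output,
                Reporter reporter) throws IOException {
            BatchUpdate update = new BatchUpdate(key.toString());
            update.put("frequency:value", Bytes.toBytes(value.get()));
            table.commit(update); // buffered until the write buffer fills
        }

        public void close() throws IOException {
            // autoFlush is off, so push whatever is still buffered
            // before the task exits.
            table.flushCommits();
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(FrequencyImport.class);
        conf.setJobName("hbase frequency import");
        conf.setMapperClass(ImportMapper.class);
        conf.setNumReduceTasks(0); // map-only job, writes go straight to HBase
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(NullOutputFormat.class);
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        JobClient.runJob(conf);
    }
}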

So far I can hit 20% of the import in 40-45 minutes, so importing the whole
data set will presumably take more than 3.5 hours. I tried different write
buffer sizes between 5 MB and 20 MB and didn't get any significant
improvements. I did my experiments with 1 or 2 mappers per node, although 1
mapper per node seemed to do better than 2 mappers per node. When I refresh
the Hbase master web interface during my imports, I see the requests are
generally divided equally among the 7 regionservers, and as I keep hitting
the refresh button, I can see that I get 10000 to 70000 requests at once.

I read some earlier posts from Ryan and Stack, and I was actually expecting
at least twice this performance, so I decided to ask the list whether this is
expected performance or way below it.

I'd appreciate any comments/suggestions.


Thanks,
Jim

Re: Improving import performance

Posted by Stack <sa...@gmail.com>.
Please add your experience to the wiki. That would be great.




On May 25, 2009, at 15:14, Jim Twensky <ji...@gmail.com> wrote:


Re: Improving import performance

Posted by Jim Twensky <ji...@gmail.com>.
Thanks for the valuable tips, Ryan. I probably should have replied sooner,
but I was busy experimenting with my tiny cluster, and I'd like to share some
of my experience on the list for future reference.

I took your advice and did a lot of reading on Sun's garbage collectors,
particularly the new CMS collector and the parallel collector. When I tried
the CMS collector along with the incremental option I didn't see a big
performance hit, although incremental mode is what Sun's documentation
suggests for machines with one or two processors. When I removed the
incremental option, I got a small ~5% overall improvement in my uploads.

Moreover, when I set MaxNewSize and NewSize to 12m, the time spent on each
minor collection dropped just as you suggested, but the number of minor
collections increased dramatically, so the total time spent on new-generation
garbage collection (# of minor collections * average time per collection)
increased my upload time. I then increased MaxNewSize and NewSize to 48m and
observed less frequent but longer minor collections. I also increased the
client buffer size from 12m to 24m.
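
(To be concrete, by "client buffer size" I mean the HTable write buffer from
my first mail, i.e. something along these lines:)

    table.setWriteBufferSize(1024 * 1024 * 24); // 24 MB, up from 12 MB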

Here are the best garbage collector settings I could come up with:

export HBASE_OPTS="-Xmx1536m -Xms1536m -XX:MaxNewSize=48m -XX:NewSize=48m
-XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:/tmp/hadoop-gc.log"

On a small cluster with 7 regionservers and 1 master node, I could upload
4GB of data (approximately 40m rows, if I remember correctly) in 16 minutes.

For comparison, the following settings:

export HBASE_OPTS="-Xmx1536m -Xms1536m -XX:MaxNewSize=6m -XX:NewSize=6m
-XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/tmp/hadoop-gc.log"

resulted in a 20 min upload time.

I should also mention again that I'm on a shared cluster with a total of 100
machines, where each machine has a dual-core CPU and 4 GB of memory. The 8
machines that I used were dedicated to me during the tests, so no one was
using them except for me, but I don't know whether the network traffic in the
rest of the cluster affected the tests, because I didn't measure it.


I thought this info might be useful for people who, like me, are working with
old hardware and limited memory and processing power. I also saw that you
posted some of your experience in the Performance Tuning section of the wiki,
so I can write up some advice for users like me if you think that would be
helpful for the rest of the community.

Thanks,
Jim


On Wed, Apr 29, 2009 at 5:54 PM, Ryan Rawson <ry...@gmail.com> wrote:


Re: Improving import performance

Posted by Ryan Rawson <ry...@gmail.com>.
You might have to delve into tweaking the GC settings.  Here is what I am
setting:

export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:NewSize=12m
-XX:MaxNewSize=12m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/export/hadoop/logs/gc.log"

What I found is that once your in-memory complexity rises, the JVM resizes
the ParNew/new-gen space up and up, thus extending the length of the
so-called 'minor collections'. At one point with a 150MB ParNew I was
typically seeing 100-150ms, with outliers of 400ms, for 'minor' GC pauses. If
you have all the machines in your cluster pausing for 100ms at semi-random
intervals, that holds up your import, since the clients are waiting on the
paused JVM to continue.

The key thing I had to do was set -XX:MaxNewSize=12m (about 2 * my L2 cache
on Xeons). You get more frequent, but smaller, GCs. Your heap also tends to
grow larger than before, but with CMS that doesn't result in longer VM
pauses, just more RAM usage. I personally use -Xmx4500m. With your machines,
I'd consider a setting of at least 1500m, preferably 2000-2500m. Of course, I
have a ton of heap to chuck at it, so CMS collections don't happen all the
time (but when they do, they can prune 1500MB of garbage).

Since you are on a 2-core machine, you will probably have to set CMS to
incremental mode:

-XX:+CMSIncrementalMode

to prevent the CMS GC from starving out your main threads.

Good luck with it!
-ryan
