Posted to dev@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2012/07/07 07:49:49 UTC

For HBase compactions - Lucene's IO impact reduction code

Hi,

Here is something that may be of interest to HBase:

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene developers, wrote a really nice post about new things in this version of Lucene.  The part that I think is interesting for HBase, and that HBase devs may want to look at (and borrow to use with compactions), is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO and CPU intensive operation which can easily interfere with ongoing searches. In 4.0.0 we now have two ways to reduce this impact:
	* Rate-limit the IO caused by ongoing merging, by calling FSDirectory.setMaxMergeWriteMBPerSec.

	* Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches. (Note that there is also a native WindowsDirectory, but it does not yet use direct IO during merging... patches welcome!).
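To make the rate-limiting idea in the first bullet concrete, here is a minimal sketch of a throttled output stream that a compaction writer could wrap around its destination. It is an illustration only: ThrottledOutputStream and its maxMBPerSec parameter are hypothetical names, not Lucene or HBase classes, and it uses nothing beyond the Java standard library.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper (not a Lucene or HBase class): caps the average write rate of a stream. */
class ThrottledOutputStream extends FilterOutputStream {
    private final double bytesPerNano;                  // allowed rate, in bytes per nanosecond
    private final long startNanos = System.nanoTime();  // rate is averaged from construction time
    private long bytesWritten = 0;

    ThrottledOutputStream(OutputStream out, double maxMBPerSec) {
        super(out);
        if (maxMBPerSec <= 0) {
            throw new IllegalArgumentException("maxMBPerSec must be positive");
        }
        this.bytesPerNano = maxMBPerSec * 1024 * 1024 / 1e9;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pause(1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        out.write(buf, off, len);
        pause(len);
    }

    /** Sleeps until the average rate since construction is back under the cap. */
    private void pause(int justWritten) throws IOException {
        bytesWritten += justWritten;
        long earliestNanos = startNanos + (long) (bytesWritten / bytesPerNano);
        long sleepNanos = earliestNanos - System.nanoTime();
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }
}
```

A caller would wrap only the compaction or merge output, for example `new ThrottledOutputStream(rawOut, 20.0)`, and leave flush and read paths untouched. The sketch averages the rate over the lifetime of the stream, so a writer that has been idle can burst briefly; a token bucket would be the usual alternative if bursts matter.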

Remember to also set swappiness to 0 on Linux if you want to maximize search responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput and Directory.createOutput) now take an IOContext describing what's being done (e.g., flush vs merge), so you can create a custom Directory that changes its behavior depending on the context.
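The context-dependent behavior described above can be sketched without any Lucene dependency. In this illustration IoPurpose is a hypothetical stand-in for Lucene's IOContext, and PurposeAwareOutputs is a hypothetical factory; neither name comes from Lucene or HBase. The point is only that the caller states why it is writing, and the factory decides whether to decorate the stream.

```java
import java.io.OutputStream;
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for Lucene's IOContext: why the file is being written. */
enum IoPurpose { FLUSH, MERGE }

/** Hypothetical factory: applies a decorator to merge output only, leaving flush output as-is. */
final class PurposeAwareOutputs {
    private final UnaryOperator<OutputStream> mergeDecorator;

    PurposeAwareOutputs(UnaryOperator<OutputStream> mergeDecorator) {
        this.mergeDecorator = Objects.requireNonNull(mergeDecorator);
    }

    /** Returns the raw stream for flushes and a decorated stream for merges. */
    OutputStream open(OutputStream raw, IoPurpose purpose) {
        return purpose == IoPurpose.MERGE ? mergeDecorator.apply(raw) : raw;
    }
}
```

The decorator could be a rate limiter, a wrapper that bypasses the OS cache, or simply a byte counter to measure how much IO compactions generate. For HBase the equivalent of MERGE would be a compaction write.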

These changes were part of a 2011 Google Summer of Code project (thank you Varun!).  

 

Thoughts?

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

Re: For HBase compactions - Lucene's IO impact reduction code

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi Lars,

Yeah, I was really thinking more about this part being useful for HBase:
"Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches."

Here it is: https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/misc/src/java/org/apache/lucene/store/NativeUnixDirectory.java

And it looks like this requires something called NativePosixUtil.cpp, which lives in Lucene.  Here is a reference: http://fossies.org/dox/apache-solr-3.6.0-src/NativePosixUtil_8cpp.html

Judging by the lack of discussion around this I'm guessing this is not a big enough itch - either because this is not an actual problem or because we have no way of knowing how much damage compactions are doing to OS buffers.

But you can see some agreement that the above is actually attractive - http://search-hadoop.com/m/waHGf0r3K42 -- from February 2011.

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 



>________________________________
> From: Lars George <la...@gmail.com>
>To: "dev@hbase.apache.org" <de...@hbase.apache.org> 
>Sent: Saturday, July 7, 2012 3:01 AM
>Subject: Re: For HBase compactions - Lucene's IO impact reduction code
> 
>Hi Otis,
>
>Throttling I think is a less needed feature as we typically struggle to keep up with the compaction queue under load. Reducing background noise caused by compactions is more an exercise of tuning the compaction algorithm itself. That is still somewhat of a black art it seems. 
>
>As for the OS buffer bypassing, Todd did some work along these lines in HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not sure if it is really the same or not, so I leave this for someone else to comment on. 
>
>But indeed interesting ideas and should be discussed thoroughly. 
>
>Lars
>
>On Jul 7, 2012, at 7:49, Otis Gospodnetic <ot...@yahoo.com> wrote:
>
>> Hi,
>> 
>> Here is something that may be of interest to HBase:
>> 
>> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene developers, wrote a really nice post about new things in this version of Lucene.  The part that I think is interesting for HBase, and that HBase devs may want to look at (and borrow to use with compactions), is this:
>> 
>> Reducing merge IO impact 
>> 
>> Merging (consolidating many small segments into a single big one) is a very IO and CPU intensive operation which can easily interfere with ongoing searches. In 4.0.0 we now have two ways to reduce this impact:
>>    * Rate-limit the IO caused by ongoing merging, by calling FSDirectory.setMaxMergeWriteMBPerSec. 
>> 
>> 
>>    * Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches. (Note that there is also a native WindowsDirectory, but it does not yet use direct IO during merging... patches welcome!). 
>> 
>> Remember to also set swappiness to 0 on Linux if you want to maximize search responsiveness. 
>> 
>> More generally, the APIs that open an input or output file (Directory.openInput and Directory.createOutput) now take an IOContext describing what's being done (e.g., flush vs merge), so you can create a custom Directory that changes its behavior depending on the context. 
>> 
>> These changes were part of a 2011 Google Summer of Code project (thank you Varun!).  
>> 
>>  
>> 
>> Thoughts?
>> 
>> Otis
>> ----
>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 
>
>
>

Re: For HBase compactions - Lucene's IO impact reduction code

Posted by Stack <st...@duboce.net>.

On Sat, Jul 7, 2012 at 12:28 PM, Ted Yu <yu...@gmail.com> wrote:
> I created HBASE-6351 with Otis's comments.
>
> Let's continue discussion from there.
>

I'd suggest not being so quick to move discussion up to JIRA.

See Karl Fogel on this topic in his oldie but a goodie "Producing Open
Source Software":

"Make sure the bug tracker doesn't turn into a discussion forum.
Although it is important to maintain a human presence in the bug
tracker, it is not fundamentally suited to real-time discussion. Think
of it rather as an archiver, a way to organize facts and references
to other discussions, primarily those that take place on mailing lists.

"There are two reasons to make this distinction. First, the bug
tracker is more cumbersome to use than the mailing lists (or than
real-time chat forums, for that matter). This is not because bug
trackers have bad user interface design, it's just that their interfaces
were designed for capturing and presenting discrete states, not
free-flowing discussions. Second, not everyone who should be
involved in discussing a given issue is necessarily watching the bug
tracker. Part of good issue management...is to make sure each issue
is brought to the right peoples' attention, rather than requiring every
developer to monitor all issues. In the section called “No
Conversations in the Bug Tracker” in
Chapter 6, Communications, we'll look at ways to make sure people
don't accidentally siphon discussions out of appropriate forums
and into the bug tracker."

Pg. 50 of http://producingoss.com/en/producingoss.pdf

In general, I'd be in favor of there being more discussion on dev list.

St.Ack

Re: For HBase compactions - Lucene's IO impact reduction code

Posted by Ted Yu <yu...@gmail.com>.

I created HBASE-6351 with Otis's comments.

Let's continue discussion from there.

On Sat, Jul 7, 2012 at 12:01 AM, Lars George <la...@gmail.com> wrote:

> Hi Otis,
>
> Throttling I think is a less needed feature as we typically struggle to
> keep up with the compaction queue under load. Reducing background noise
> caused by compactions is more an exercise of tuning the compaction
> algorithm itself. That is still somewhat of a black art it seems.
>
> As for the OS buffer bypassing, Todd did some work along these lines in
> HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not
> sure if it is really the same or not, so I leave this for someone else to
> comment on.
>
> But indeed interesting ideas and should be discussed thoroughly.
>
> Lars
>
> On Jul 7, 2012, at 7:49, Otis Gospodnetic <ot...@yahoo.com>
> wrote:
>
> > Hi,
> >
> > Here is something that may be of interest to HBase:
> >
> > Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the
> Lucene developers, wrote a really nice post about new things in this
> version of Lucene.  The part that I think is interesting for HBase, and
> that HBase devs may want to look at (and borrow to use with compactions) is
> this:
> >
> > Reducing merge IO impact
> >
> > Merging (consolidating many small segments into a single big one) is a
> very IO and CPU intensive operation which can easily interfere with ongoing
> searches. In 4.0.0 we now have two ways to reduce this impact:
> >    * Rate-limit the IO caused by ongoing merging, by
> calling FSDirectory.setMaxMergeWriteMBPerSec.
> >
> >
> >    * Use the new NativeUnixDirectory which bypasses the OS's IO cache
> for all merge IO, by using direct IO. This ensures that a merge won't evict
> hot pages used by searches. (Note that there is also a native
> WindowsDirectory, but it does not yet use direct IO during merging...
> patches welcome!).
> >
> > Remember to also set swappiness to 0 on Linux if you want to maximize
> search responsiveness.
> >
> > More generally, the APIs that open an input or output file
> (Directory.openInput and Directory.createOutput) now take an IOContext
> describing what's being done (e.g., flush vs merge), so you can create a
> custom Directory that changes its behavior depending on the context.
> >
> > These changes were part of a 2011 Google Summer of Code project (thank
> you Varun!).
> >
> >
> >
> > Thoughts?
> >
> > Otis
> > ----
> > Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>

Re: For HBase compactions - Lucene's IO impact reduction code

Posted by Lars George <la...@gmail.com>.

Hi Otis,

Throttling I think is a less needed feature as we typically struggle to keep up with the compaction queue under load. Reducing background noise caused by compactions is more an exercise of tuning the compaction algorithm itself. That is still somewhat of a black art it seems. 

As for the OS buffer bypassing, Todd did some work along these lines in HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not sure if it is really the same or not, so I leave this for someone else to comment on. 

But indeed interesting ideas and should be discussed thoroughly. 

Lars

On Jul 7, 2012, at 7:49, Otis Gospodnetic <ot...@yahoo.com> wrote:

> Hi,
> 
> Here is something that may be of interest to HBase:
> 
> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene developers, wrote a really nice post about new things in this version of Lucene.  The part that I think is interesting for HBase, and that HBase devs may want to look at (and borrow to use with compactions), is this:
> 
> Reducing merge IO impact 
> 
> Merging (consolidating many small segments into a single big one) is a very IO and CPU intensive operation which can easily interfere with ongoing searches. In 4.0.0 we now have two ways to reduce this impact:
>    * Rate-limit the IO caused by ongoing merging, by calling FSDirectory.setMaxMergeWriteMBPerSec. 
> 
> 
>    * Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches. (Note that there is also a native WindowsDirectory, but it does not yet use direct IO during merging... patches welcome!). 
> 
> Remember to also set swappiness to 0 on Linux if you want to maximize search responsiveness. 
> 
> More generally, the APIs that open an input or output file (Directory.openInput and Directory.createOutput) now take an IOContext describing what's being done (e.g., flush vs merge), so you can create a custom Directory that changes its behavior depending on the context. 
> 
> These changes were part of a 2011 Google Summer of Code project (thank you Varun!).  
> 
>  
> 
> Thoughts?
> 
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm