Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2011/12/15 16:56:17 UTC

FSFS nosync

From a discussion on IRC:

A BDB repository allows the admin to set DB_TXN_NOSYNC in the BDB
configuration file, which lets the admin trade robustness for
performance.  We could do something similar in FSFS.  When loading a
dumpfile into a FSFS repository I see 13 calls to fsync per revision on
a Linux box.  If we had such a flag in fsfs.conf (Stefan suggests
"eat-my-data=yes") the code could write all the same data in the same
order but avoid making any flush calls thus allowing the OS to order
physical writes for optimum speed.

Even if not used on a live repository it is useful to have it available
when doing things such as loading a dumpfile.  The admin could set the
flag when loading a dumpfile into a new repository and clear it once
happy with the load.

As far as an implementation goes: the 13 fsync calls per-revision break
down to 6 in svn_io_file_flush called directly from fs_fs.c, and 7 from
svn_io_write_unique.  So the code would skip the svn_io_file_flush calls
and use some new noflush version of svn_io_write_unique.
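
A minimal sketch of what such a "noflush" variant could look like, written
in Python for brevity rather than in the C of libsvn_subr.  The function
name write_unique and the flush_to_disk parameter are assumptions made up
for this illustration; they are not the Subversion API.  The point is only
that both paths write the same bytes in the same order and differ solely in
whether fsync is called.

```python
import os
import tempfile


def write_unique(dirpath, data, flush_to_disk=True):
    """Write data to a new, uniquely named file in dirpath; return its path.

    flush_to_disk=False is the hypothetical "noflush" variant: same data,
    same order, but no fsync, so the OS is free to schedule the physical
    write whenever it likes.
    """
    fd, path = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()                 # hand Python's buffer to the OS
            if flush_to_disk:
                os.fsync(f.fileno())  # ask the OS to reach stable storage
    except BaseException:
        os.unlink(path)               # do not leave a partial file behind
        raise
    return path
```

With flush_to_disk=False a crash shortly after the call may lose the file
contents, which is exactly the trade being discussed.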

Comments?

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: FSFS nosync

Posted by Peter Samuelson <pe...@p12n.org>.
[Stefan Sperling]
> On Thu, Dec 15, 2011 at 04:04:13PM -0600, Peter Samuelson wrote:
> >     http://packages.debian.org/sid/eatmydata
> >     https://launchpad.net/libeatmydata
> > 
> > It apparently works on Linux and Solaris.  Don't know if that's enough
> > coverage for general interest.

> Even though the eatmydata code base is small and looks easily
> portable I'd be in favour in an implementation that's native to
> svnadmin without the need to use an ld preload hack.

Well, 'svnadmin load' is far from the only time eatmydata is useful to
a sysadmin.  Any transaction-based operation that you run many times in
a row can benefit, from cleaning a spam infestation out of a mail spool
to a package-manager-managed system upgrade in a disposable VM.

So, in general, the advice is, "if you're doing something that involves
a lot of disk syncs, but in circumstances where you care a lot more
about speed than crash-safety, you should have a look at eatmydata."  I
prefer this approach over adding a "fast but unsafe" mode to every tool
that might be used in such a situation.

                    *          *          *

Alternatively - this is the same speed/safety tradeoff as the famous
SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS.
Could we just use a similar environment variable in this case as well?
The known use case is svnadmin, which is self-contained - no need to
propagate the flag to a separate server process - so a variable would
work.
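
A rough sketch of the environment-variable approach, in Python for brevity.
The variable name below is invented for this illustration, and the rule
that any non-empty value opts out of syncing is likewise an assumption, not
a description of how the existing timestamp variable is parsed.

```python
import os

# Hypothetical variable name, chosen only for this sketch.
NOSYNC_ENV_VAR = "SVN_I_LOVE_CORRUPTED_REPOSITORIES_SO_DISABLE_FSYNC"


def fsync_enabled(environ=None):
    """Return False only when the opt-out variable has a non-empty value."""
    env = os.environ if environ is None else environ
    return not env.get(NOSYNC_ENV_VAR)


def maybe_fsync(fd, environ=None):
    """fsync fd unless the admin opted out through the environment."""
    if fsync_enabled(environ):
        os.fsync(fd)
```

Because the check lives in the process that does the writing, this only
works where that process inherits the admin's environment, as svnadmin does.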

Peter

Re: FSFS nosync

Posted by Markus Schaber <m....@3s-software.com>.
Hi,

From: Branko Čibej [mailto:brane@xbc.nu]
>On 15.12.2011 23:38, Stefan Sperling wrote:
>> On Thu, Dec 15, 2011 at 04:04:13PM -0600, Peter Samuelson wrote:
>>> [Philip Martin]
>>>> If we had such a flag in fsfs.conf (Stefan suggests
>>>> "eat-my-data=yes") the code could write all the same data in the 
>>>> same order but avoid making any flush calls thus allowing the OS to 
>>>> order physical writes for optimum speed.
>>> Given the main use case is a distinct svnadmin operation, we could 
>>> just recommend the 'eatmydata' binary (I'm guessing this is what 
>>> Stefan was thinking of when he suggested that option name) which uses 
>>> an ELF preload trick to transparently disable syscalls like fsync() 
>>> and fdatasync().
>>>
>>>     http://packages.debian.org/sid/eatmydata
>>>     https://launchpad.net/libeatmydata
>>>
>>> It apparently works on Linux and Solaris.  Don't know if that's 
>>> enough coverage for general interest.
>> Yes, this is what inspired the idea in the first place and where the 
>> suggestion for the option name comes from (it was actually Philip who 
>> brought it up, not me).
>>
>> Even though the eatmydata code base is small and looks easily portable 
>> I'd be in favour in an implementation that's native to svnadmin 
>> without the need to use an ld preload hack.

>There ain't no ld-preload hacks on Windows.

There are:
http://en.wikipedia.org/wiki/DLL_injection
http://fy.chalmers.se/~appro/nt/DLL_PRELOAD/

But I admit that none of them is a suitable replacement in our use case.

Best regards

Markus Schaber

Re: FSFS nosync

Posted by Branko Čibej <br...@xbc.nu>.
On 15.12.2011 23:38, Stefan Sperling wrote:
> On Thu, Dec 15, 2011 at 04:04:13PM -0600, Peter Samuelson wrote:
>> [Philip Martin]
>>> If we had such a flag in fsfs.conf (Stefan suggests
>>> "eat-my-data=yes") the code could write all the same data in the same
>>> order but avoid making any flush calls thus allowing the OS to order
>>> physical writes for optimum speed.
>> Given the main use case is a distinct svnadmin operation, we could just
>> recommend the 'eatmydata' binary (I'm guessing this is what Stefan was
>> thinking of when he suggested that option name) which uses an ELF
>> preload trick to transparently disable syscalls like fsync() and
>> fdatasync().
>>
>>     http://packages.debian.org/sid/eatmydata
>>     https://launchpad.net/libeatmydata
>>
>> It apparently works on Linux and Solaris.  Don't know if that's enough
>> coverage for general interest.
> Yes, this is what inspired the idea in the first place and where the
> suggestion for the option name comes from (it was actually Philip who
> brought it up, not me).
>
> Even though the eatmydata code base is small and looks easily portable
> I'd be in favour in an implementation that's native to svnadmin without
> the need to use an ld preload hack.

There ain't no ld-preload hacks on Windows.

-- Brane

Re: FSFS nosync

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Dec 15, 2011 at 04:04:13PM -0600, Peter Samuelson wrote:
> 
> [Philip Martin]
> > If we had such a flag in fsfs.conf (Stefan suggests
> > "eat-my-data=yes") the code could write all the same data in the same
> > order but avoid making any flush calls thus allowing the OS to order
> > physical writes for optimum speed.
> 
> Given the main use case is a distinct svnadmin operation, we could just
> recommend the 'eatmydata' binary (I'm guessing this is what Stefan was
> thinking of when he suggested that option name) which uses an ELF
> preload trick to transparently disable syscalls like fsync() and
> fdatasync().
> 
>     http://packages.debian.org/sid/eatmydata
>     https://launchpad.net/libeatmydata
> 
> It apparently works on Linux and Solaris.  Don't know if that's enough
> coverage for general interest.

Yes, this is what inspired the idea in the first place and where the
suggestion for the option name comes from (it was actually Philip who
brought it up, not me).

Even though the eatmydata code base is small and looks easily portable
I'd be in favour of an implementation that's native to svnadmin without
the need to use an ld preload hack.

Re: FSFS nosync

Posted by Peter Samuelson <pe...@p12n.org>.
[Philip Martin]
> If we had such a flag in fsfs.conf (Stefan suggests
> "eat-my-data=yes") the code could write all the same data in the same
> order but avoid making any flush calls thus allowing the OS to order
> physical writes for optimum speed.

Given the main use case is a distinct svnadmin operation, we could just
recommend the 'eatmydata' binary (I'm guessing this is what Stefan was
thinking of when he suggested that option name) which uses an ELF
preload trick to transparently disable syscalls like fsync() and
fdatasync().

    http://packages.debian.org/sid/eatmydata
    https://launchpad.net/libeatmydata

It apparently works on Linux and Solaris.  Don't know if that's enough
coverage for general interest.  Anyway, on Debian at least, you can
then run

    eatmydata svnadmin load -q /tmp/dumpfile ...

Given this more general solution that IMO sysadmins should have in
their toolbox anyway, I'm +1 on publicising it a bit more, and -0 on
reimplementing it in svnadmin/libsvn_repos/libsvn_fs.

Peter

Re: FSFS nosync

Posted by Greg Stein <gs...@gmail.com>.
On Dec 15, 2011 1:26 PM, "Stefan Sperling" <st...@elego.de> wrote:
>
> On Thu, Dec 15, 2011 at 01:04:04PM -0500, Greg Stein wrote:
> > Couldn't we just make that an option for loading, but not provide such a
> > feature for normal operation? That seems safer to me, and solves the actual
> > use case.
>
> That would require revving the repos and fs APIs.
> Hence the idea of putting it into fsfs.conf.
>
> But I agree that having it in the config file is a potential
> danger because people might forget to turn it back off...

Right.

I'll take rev'ing APIs any day. The old ones still work, just a bit slower
and safer. And certainly nicer than Yet Another Config Option.
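
To illustrate what rev'ing an API means here, a toy sketch in Python.  Both
function names and the eat_my_data parameter are hypothetical, and the body
is a stand-in that only reports what it would do; no real repos or fs API
is shown.  The old entry point stays and delegates to the new one with the
safe default.

```python
def load_dumpfile_v2(repos_path, dumpstream, eat_my_data=False):
    """Hypothetical new entry point; eat_my_data=True would skip disk syncs."""
    # Stand-in for the real work: report what this call would have done.
    return {
        "repos": repos_path,
        "bytes": len(dumpstream),
        "synced": not eat_my_data,
    }


def load_dumpfile(repos_path, dumpstream):
    """Old entry point, kept for compatibility: always the syncing path."""
    return load_dumpfile_v2(repos_path, dumpstream, eat_my_data=False)
```

Existing callers of the old function keep the safe behaviour without any
change, and only a caller that explicitly asks for it gets the fast path.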

Cheers,
-g

Re: FSFS nosync

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Dec 15, 2011 at 01:04:04PM -0500, Greg Stein wrote:
> Couldn't we just make that an option for loading, but not provide such a
> feature for normal operation? That seems safer to me, and solves the actual
> use case.

That would require revving the repos and fs APIs.
Hence the idea of putting it into fsfs.conf.

But I agree that having it in the config file is a potential
danger because people might forget to turn it back off...

> +1 on the flag name :-)

load: usage: svnadmin load REPOS_PATH

Read a 'dumpfile'-formatted stream from stdin, committing
[...]

Valid options:
  [...]
  --eat-my-data	           : nom nom nom
                             [used for FSFS repositories only]


Re: FSFS nosync

Posted by Greg Stein <gs...@gmail.com>.
Couldn't we just make that an option for loading, but not provide such a
feature for normal operation? That seems safer to me, and solves the actual
use case.

+1 on the flag name :-)

Cheers,
-g
On Dec 15, 2011 10:59 AM, "Philip Martin" <ph...@wandisco.com> wrote:

> From a discussion on IRC:
>
> A BDB repository allows the admin to set DB_TXN_NOSYNC in the DBD
> configuration file, this allows the admin to trade performance for
> robustness.  We could do something similar in FSFS.  When loading a
> dumpfile into a FSFS repository I see 13 calls to fsync per revision on
> a Linux box.  If we had such a flag in fsfs.conf (Stefan suggests
> "eat-my-data=yes") the code could write all the same data in the same
> order but avoid making any flush calls thus allowing the OS to order
> physical writes for optimum speed.
>
> Even if not used on a live repository it is useful to have it available
> when doing things such as loading a dumpfile.  The admin could set the
> flag when loading a dumpfile into a new repository and clear it once
> happy with the load.
>
> As far as an implementation goes: the 13 fsync calls per-revision break
> down to 6 in svn_io_file_flush called directly from fs_fs.c, and 7 from
> svn_io_write_unique.  So the code would skip the svn_io_file_flush calls
> and to use some new noflush version of svn_io_write_unique.
>
> Comments?
>
> --
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com
>