Posted to dev@subversion.apache.org by Bert Huijben <be...@qqmail.nl> on 2015/06/17 15:46:09 UTC

RE: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion: include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c tests/libsvn_fs_x/fs-x-pack-test.c


> -----Original Message-----
> From: stefan2@apache.org [mailto:stefan2@apache.org]
> Sent: woensdag 17 juni 2015 12:09
> To: commits@subversion.apache.org
> Subject: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion:
> include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c
> libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c
> tests/libsvn_fs_x/fs-x-pack-test.c
> 
> Author: stefan2
> Date: Wed Jun 17 10:09:12 2015
> New Revision: 1685985
> 
> URL: http://svn.apache.org/r1685985
> Log:
> On the fsx-1.10 branch:
> Introduce a new infrastructure to FSX that allows us to do efficient fsyncs.
> 
> It basically uses a thread pool to execute multiple fsyncs concurrently.
> Interestingly, this generic implementation is faster on Linux than even the
> POSIX-provided aio_fsync functionality on the same system. As a centralized
> mechanism for scheduling fsyncs it also takes care of preventing redundant
> flushes.
> 
> With this commit, FSX does not actually use the new capabilities. That will
> be in the following commits.
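The scheduling idea in the log message can be sketched with plain POSIX threads. A few workers pull already-open descriptors from a shared batch and call `fsync()` on them concurrently, holding the lock only while claiming an entry. This is a minimal illustration under stated assumptions: `fsync_batch_t`, `fsync_worker` and `run_fsync_batch` are hypothetical names, not the batch_fsync.c API. The log message does not describe how the real pool is sized or how it removes redundant flushes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical batch: a set of already-open descriptors to flush. */
typedef struct fsync_batch_t
{
  const int *fds;         /* descriptors to fsync */
  size_t count;           /* number of entries in FDS */
  size_t next;            /* index of the next unclaimed entry */
  int failures;           /* number of fsync() calls that failed */
  pthread_mutex_t mutex;  /* protects NEXT and FAILURES */
} fsync_batch_t;

/* Worker: repeatedly claim the next descriptor and flush it. */
static void *fsync_worker(void *baton)
{
  fsync_batch_t *batch = baton;
  for (;;)
    {
      size_t i;
      pthread_mutex_lock(&batch->mutex);
      i = batch->next++;
      pthread_mutex_unlock(&batch->mutex);
      if (i >= batch->count)
        return NULL;

      /* The flush itself runs without the lock held, so several
         fsyncs can be in flight at the same time. */
      if (fsync(batch->fds[i]) != 0)
        {
          pthread_mutex_lock(&batch->mutex);
          batch->failures++;
          pthread_mutex_unlock(&batch->mutex);
        }
    }
}

/* Flush all COUNT descriptors in FDS using up to THREADS workers.
   Returns the number of fsync() calls that failed. */
static int run_fsync_batch(const int *fds, size_t count, size_t threads)
{
  fsync_batch_t batch;
  pthread_t *workers;
  size_t started = 0;

  assert(threads > 0);
  batch.fds = fds;
  batch.count = count;
  batch.next = 0;
  batch.failures = 0;
  pthread_mutex_init(&batch.mutex, NULL);

  workers = malloc(threads * sizeof(*workers));
  if (workers != NULL)
    {
      for (size_t t = 0; t < threads; t++)
        if (pthread_create(&workers[started], NULL, fsync_worker, &batch) == 0)
          started++;
    }

  /* If no worker could be started, flush everything on this thread. */
  if (started == 0)
    fsync_worker(&batch);

  for (size_t t = 0; t < started; t++)
    pthread_join(workers[t], NULL);

  free(workers);
  pthread_mutex_destroy(&batch.mutex);
  return batch.failures;
}
```

Every descriptor is claimed exactly once, so the failure count does not depend on how the work happens to be split between threads.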

Would it be possible to implement this on file handles on Windows, instead of just on file names?

Reopening a file that has just been closed is typically not fast on Windows, as virus scanners and file indexers are often fighting to use the same files. The next open operation then sometimes has to wait (when using OPLOCKS) or has to be retried (when the other process opens the file in a way that denies concurrent writes).

All of this can be avoided by just flushing the file handle we already had when the file was open for writing. By closing and reopening, we do a lot of unneeded work and introduce a lot of potential race conditions.
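Bert's suggestion can be illustrated with plain POSIX calls: write through a descriptor and flush that same descriptor before closing it, so no second open is ever needed. This is only a sketch. Subversion itself goes through APR, and on Windows the counterpart of `fsync()` is `FlushFileBuffers()` on the open handle. The helper name `write_and_flush` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create PATH, write LEN bytes from DATA and make
   them durable through the handle that was just written to.
   Returns 0 on success and -1 on failure (errno is set). */
static int write_and_flush(const char *path, const void *data, size_t len)
{
  const char *p = data;
  int fd, saved;

  assert(path != NULL);
  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;

  while (len > 0)
    {
      ssize_t n = write(fd, p, len);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          goto fail;
        }
      p += n;
      len -= (size_t)n;
    }

  /* Flush through the descriptor we already hold: there is no
     close() + open() round trip for a scanner or indexer to race. */
  if (fsync(fd) != 0)
    goto fail;

  return close(fd);

fail:
  saved = errno;
  close(fd);
  errno = saved;
  return -1;
}
```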

	Bert


Re: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion: include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c tests/libsvn_fs_x/fs-x-pack-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Fri, Jun 19, 2015 at 11:50 AM, Branko Čibej <br...@wandisco.com> wrote:

> On 18.06.2015 21:58, Stefan Fuhrmann wrote:
> > One key design element is that the batch fsync
> > container owns the open files and will open them
> > only once. So, it can not only guarantee that we
> > don't need to reopen files for fsync but it can even
> > prevent reopening files in cases where different
> > functions were to open & close the same file as part
> > of some bigger functionality.
>
> Just remember that we've had problems with too many open files in the
> past; make sure that your design limits that number to a sane value.
>
> To give you an idea of the kind of restriction we're talking about, this
> is what the newest OSX reports by default:
>
>     $ ulimit -a | grep 'open files'
>     open files                      (-n) 256
>
>
> Not exactly an abundance ...
>

Yes, I'm aware of that. For FSX, we will open at most 7
file & directory handles in any fsync batch: basically one
for everything that is part of, or a container for, a revision.
This is of the same order of magnitude as what we use
for delta chains.
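To make the bound concrete: a batch with a fixed number of slots cannot hold more descriptors than that, so even several batches in flight stay well below the 256-descriptor default Brane quoted. The sketch uses the limit of 7 stated above. `fsync_slots_t`, `slots_add` and `MAX_BATCH_HANDLES` are hypothetical names, and the real FSX code may enforce its limit differently.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound stated in the thread: one handle per file or directory
   that is part of, or a container for, a revision. */
#define MAX_BATCH_HANDLES 7

typedef struct fsync_slots_t
{
  int fds[MAX_BATCH_HANDLES];  /* descriptors owned by this batch */
  size_t count;                /* number of slots in use */
} fsync_slots_t;

/* Register FD with the batch.  Returns 0 on success, or -1 if the batch
   is already full, in which case the caller must flush it first. */
static int slots_add(fsync_slots_t *slots, int fd)
{
  assert(fd >= 0);
  if (slots->count >= MAX_BATCH_HANDLES)
    return -1;
  slots->fds[slots->count++] = fd;
  return 0;
}
```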

-- Stefan^2.

Re: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion: include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c tests/libsvn_fs_x/fs-x-pack-test.c

Posted by Branko Čibej <br...@wandisco.com>.
On 18.06.2015 21:58, Stefan Fuhrmann wrote:
> One key design element is that the batch fsync
> container owns the open files and will open them
> only once. So, it can not only guarantee that we
> don't need to reopen files for fsync but it can even
> prevent reopening files in cases where different
> functions were to open & close the same file as part
> of some bigger functionality.

Just remember that we've had problems with too many open files in the
past; make sure that your design limits that number to a sane value.

To give you an idea of the kind of restriction we're talking about, this
is what the newest OSX reports by default:

    $ ulimit -a | grep 'open files'
    open files                      (-n) 256


Not exactly an abundance ...


-- Brane


Re: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion: include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c tests/libsvn_fs_x/fs-x-pack-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Jun 17, 2015 at 3:46 PM, Bert Huijben <be...@qqmail.nl> wrote:

>
>
> > -----Original Message-----
> > From: stefan2@apache.org [mailto:stefan2@apache.org]
> > Sent: woensdag 17 juni 2015 12:09
> > To: commits@subversion.apache.org
> > Subject: svn commit: r1685985 - in /subversion/branches/fsx-1.10/subversion:
> > include/private/svn_mutex.h libsvn_fs_x/batch_fsync.c
> > libsvn_fs_x/batch_fsync.h libsvn_fs_x/fs.c libsvn_subr/mutex.c
> > tests/libsvn_fs_x/fs-x-pack-test.c
> >
> > Author: stefan2
> > Date: Wed Jun 17 10:09:12 2015
> > New Revision: 1685985
> >
> > URL: http://svn.apache.org/r1685985
> > Log:
> > On the fsx-1.10 branch:
> > Introduce a new infrastructure to FSX that allows us to do efficient fsyncs.
> >
> > It basically uses a thread pool to execute multiple fsyncs concurrently.
> > Interestingly, this generic implementation is faster on Linux than even the
> > POSIX-provided aio_fsync functionality on the same system. As a centralized
> > mechanism for scheduling fsyncs it also takes care of preventing redundant
> > flushes.
> >
> > With this commit, FSX does not actually use the new capabilities. That will
> > be in the following commits.
>
> Would it be possible to implement this on file handles on Windows, instead
> of just on file names?
>

That is actually what I'm doing. It may not be obvious
from the interface but the usage is such that files will
be opened only once and fsync'ed through that handle.

The only exception so far is the 'current' file, which needs
to be fsync'ed twice (before and after the rename). But
I plan to eliminate the extra fsync even there.
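For readers unfamiliar with the pattern: a common POSIX way to replace a small file like 'current' atomically and durably has two flushes. First flush the new contents, then rename them over the old file, then flush again so the rename itself reaches the disk. The sketch below uses the parent directory for that second flush. Whether FSX flushes the directory, the file, or both is not stated in this thread, and `replace_file_durably` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace FINAL_PATH with TMP_PATH, whose
   new contents have been written to TMP_FD.  DIR_PATH is the directory
   containing both names.  Takes ownership of TMP_FD.
   Returns 0 on success, -1 on failure. */
static int replace_file_durably(int tmp_fd, const char *tmp_path,
                                const char *final_path, const char *dir_path)
{
  int dir_fd, rc;

  assert(tmp_fd >= 0);

  /* First flush: the new contents must be on disk before the rename
     makes them visible under the final name. */
  if (fsync(tmp_fd) != 0)
    {
      close(tmp_fd);
      return -1;
    }
  if (close(tmp_fd) != 0)
    return -1;

  if (rename(tmp_path, final_path) != 0)
    return -1;

  /* Second flush: persist the directory entry changed by rename(). */
  dir_fd = open(dir_path, O_RDONLY);
  if (dir_fd < 0)
    return -1;
  rc = fsync(dir_fd);
  close(dir_fd);
  return rc;
}
```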


> Reopening a file that has just been closed is typically not fast on
> Windows, as virus scanners and file indexers are often fighting to use the
> same files. The next open operation then sometimes has to wait (when using
> OPLOCKS) or has to be retried (when the other process opens the file in a
> way that concurrent writes are denied).
>

I may not have answered your post when you wrote
that a couple of weeks ago, but I heard you and used
it as input for the batch fsync design ;)


> All of this can be avoided by just flushing the filehandle we already had
> when the file was open for writing. By closing and re-opening we are doing
> a lot of unneeded work and introduce a lot of potential race conditions.
>

One key design element is that the batch fsync
container owns the open files and will open them
only once. So, it can not only guarantee that we
don't need to reopen files for fsync but it can even
prevent reopening files in cases where different
functions were to open & close the same file as part
of some bigger functionality.
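The "open only once" property can be sketched as a small table keyed by path. The first request opens the file. Later requests for the same path get the same descriptor back. A single flush pass at the end then covers each file exactly once. This is a hypothetical illustration: `handle_table_t`, `table_get`, `table_flush_and_close` and the capacity `TABLE_SIZE` are invented names, not the batch_fsync.c interface.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TABLE_SIZE 7  /* hypothetical fixed capacity */

typedef struct handle_entry_t
{
  char *path;  /* owned copy of the file name */
  int fd;      /* descriptor opened for writing */
} handle_entry_t;

typedef struct handle_table_t
{
  handle_entry_t entries[TABLE_SIZE];
  size_t count;
} handle_table_t;

/* Return the descriptor for PATH, opening the file only on first use.
   Returns -1 if the file cannot be opened or the table is full. */
static int table_get(handle_table_t *table, const char *path)
{
  size_t i;
  int fd;

  assert(path != NULL);
  for (i = 0; i < table->count; i++)
    if (strcmp(table->entries[i].path, path) == 0)
      return table->entries[i].fd;  /* already open: reuse the handle */

  if (table->count == TABLE_SIZE)
    return -1;

  fd = open(path, O_WRONLY | O_CREAT, 0644);
  if (fd < 0)
    return -1;

  table->entries[table->count].path = strdup(path);
  if (table->entries[table->count].path == NULL)
    {
      close(fd);
      return -1;
    }
  table->entries[table->count].fd = fd;
  table->count++;
  return fd;
}

/* Flush every file exactly once through its existing handle, then close
   it.  Returns the number of fsync() calls that failed. */
static int table_flush_and_close(handle_table_t *table)
{
  int failures = 0;
  size_t i;

  for (i = 0; i < table->count; i++)
    {
      if (fsync(table->entries[i].fd) != 0)
        failures++;
      close(table->entries[i].fd);
      free(table->entries[i].path);
    }
  table->count = 0;
  return failures;
}
```

Because callers ask the table for a handle instead of opening the file themselves, two functions touching the same file within one operation share a single open.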

That said, this is all very much work-in-progress
and the final logic may be quite different again.

-- Stefan^2.
