Posted to dev@apr.apache.org by Bert Huijben <be...@qqmail.nl> on 2014/06/30 19:34:02 UTC

[Patch] Reader Writer lock performance on Windows

	Hi,

I was profiling a Subversion operation which actively used a recently added
memory cache that uses APR rwlocks. Somehow these locks alone used more than
2.5% of the total processing time (which is mostly IO bound).
(For future reference: 'svn log file:///<RUBY>/trunk/ChangeLog' against a
packed local format 6 fsfs repository)

This made me look at the current implementation of rwlocks on Windows: a
mutex combined with an event, both quite heavy synchronization primitives.

Since Windows Vista, Microsoft provides a 'Slim Reader Writer lock'
implementation, which apr could simply use instead of this old
implementation on all common Windows platforms.
See
http://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx

I wrote an initial implementation which might need some further cleanup (see
patch). The results of testlockperf.exe with the new code are quite
spectacular on my test VM: from roughly 10 times faster with a single thread
to > 100 times faster with 6 threads.

1 thread: 516999 usec vs 40000 usec (> 10*)
2 threads: 8932818 usec vs 78998 usec
3 threads: 16307486 usec vs 121002 usec
4 threads: 22326492 usec vs 159000 usec
5 threads: 27488411 usec vs 196000 usec
6 threads: 33191969 usec vs 237000 usec (> 140*)
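
The ratios quoted above follow directly from these timings. As a small
sketch of the arithmetic (the helper name `speedup` and the array names are
made up for this illustration; only the numbers come from the table):

```c
/* Timings in microseconds, copied from the testlockperf.exe results above.
 * Index 0 is the 1-thread run, index 5 the 6-thread run. */
static const double old_usec[6] = {
    516999, 8932818, 16307486, 22326492, 27488411, 33191969
};
static const double new_usec[6] = {
    40000, 78998, 121002, 159000, 196000, 237000
};

/* Hypothetical helper: how many times faster the new code is for the run
 * with the given thread count (1..6). Returns 0 for any other count. */
static double speedup(int threads)
{
    if (threads < 1 || threads > 6)
        return 0.0;
    return old_usec[threads - 1] / new_usec[threads - 1];
}
```

With these numbers speedup(1) comes out at roughly 12.9 and speedup(6) at
roughly 140.05, which is where the "> 10*" and "> 140*" annotations come
from.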

One important difference between the legacy implementation and the new one
is that with the new one a waiting thread mostly behaves (more or less) like
it is in a spin lock, while the old one just makes the process wait on the
mutex, which is mostly like suspending the process.

So there might be some theoretical cases with very long locking times where
the old code would be preferred. But the caching logic where this code is
generally used would really benefit from switching.

[[
* include/arch/win32/apr_arch_misc.h
  (APR_DECLARE_LATE_DLL_FUNC_VOID): Declare APR_DECLARE_LATE_DLL_FUNC
   variant with void return.

  (_RTL_SRWLOCK,
   RTL_SRWLOCK,
   PRTL_SRWLOCK,
   RTL_SRWLOCK_INIT,
   SRWLOCK,
   PSRWLOCK,
   SRWLOCK_INIT): Define as Windows does, for platforms that don't
   predefine them.
  (InitializeSRWLock,
   AcquireSRWLockExclusive,
   AcquireSRWLockShared,
   ReleaseSRWLockExclusive,
   ReleaseSRWLockShared,
   TryAcquireSRWLockExclusive,
   TryAcquireSRWLockShared): Define when not defined by Windows.

* include/arch/win32/apr_arch_thread_rwlock.h
  (apr_thread_rwlock_t): Add union for slim reader writer value.

* locks/win32/thread_rwlock.c
  (HAVE_NATIVE_SRW): Define when slim reader writer locks are (always)
   available.
  (apr_thread_rwlock_create,
   apr_thread_rwlock_rdlock,
   apr_thread_rwlock_tryrdlock,
   apr_thread_rwlock_wrlock,
   apr_thread_rwlock_trywrlock,
   apr_thread_rwlock_unlock,
   apr_thread_rwlock_destroy): Add slim reader writer implementation.
]]
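
For readers without the patch at hand, here is a minimal sketch of how the
rwlock entry points listed above could map onto the SRW calls. It is not
the code from the patch: the `demo_rwlock_*` names and the `held_exclusive`
flag are assumptions made for this illustration, and the non-Windows branch
is only a counter-based stand-in so the sketch compiles elsewhere. The one
non-obvious point is that APR has a single unlock function while SRW locks
have separate release calls for shared and exclusive mode, so the wrapper
has to remember which one applies.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in so the sketch compiles off Windows: plain counters, NOT a lock. */
typedef struct { int readers; int writer; } SRWLOCK;
static void InitializeSRWLock(SRWLOCK *l)       { l->readers = 0; l->writer = 0; }
static void AcquireSRWLockShared(SRWLOCK *l)    { l->readers++; }
static void ReleaseSRWLockShared(SRWLOCK *l)    { l->readers--; }
static void AcquireSRWLockExclusive(SRWLOCK *l) { l->writer = 1; }
static void ReleaseSRWLockExclusive(SRWLOCK *l) { l->writer = 0; }
#endif

/* Hypothetical wrapper type; the real apr_thread_rwlock_t is in the patch. */
typedef struct {
    SRWLOCK srw;
    int held_exclusive;  /* only written while the exclusive lock is held */
} demo_rwlock_t;

static void demo_rwlock_create(demo_rwlock_t *l)
{
    InitializeSRWLock(&l->srw);
    l->held_exclusive = 0;
}

static void demo_rwlock_rdlock(demo_rwlock_t *l)
{
    AcquireSRWLockShared(&l->srw);
}

static void demo_rwlock_wrlock(demo_rwlock_t *l)
{
    AcquireSRWLockExclusive(&l->srw);
    l->held_exclusive = 1;   /* safe: no other thread holds the lock now */
}

/* Single unlock entry point: pick the release call matching the mode in
 * which the calling thread holds the lock. */
static void demo_rwlock_unlock(demo_rwlock_t *l)
{
    if (l->held_exclusive) {
        l->held_exclusive = 0;            /* clear before letting others in */
        ReleaseSRWLockExclusive(&l->srw);
    } else {
        ReleaseSRWLockShared(&l->srw);
    }
}
```

The flag is only ever set or cleared by the thread that holds the lock
exclusively, so a reader calling unlock always sees it as 0. SRW locks have
no destroy call, so a destroy function in this scheme has nothing to
release.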

Subversion's FSFS in memory cache greatly benefits from this patch, so I
would like to see this fix backported to future APR 1.5/1.6 versions.

	Bert


RE: [Patch] Reader Writer lock performance on Windows

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Bert Huijben [mailto:bert@qqmail.nl]
> Sent: Monday, June 30, 2014 19:34
> To: dev@apr.apache.org
> Cc: stefan2@apache.org; ivan@apache.org
> Subject: [Patch] Reader Writer lock performance on Windows
> 
> 	Hi,

The patch as posted triggers the race condition in
APR_DECLARE_LATE_DLL_FUNC() for which I already posted a patch some time ago
(but which hasn't been applied yet).

I updated the patch to work around this problem, as without some patch the
apr testsuite tries to acquire read locks in two threads at exactly the same
time on my machine, which in many cases results in an AcquireSRWLockShared
call being ignored.

The updated patch makes sure all function pointers are initialized when
creating a reader-writer lock.
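
The idea of resolving everything up front can be sketched as follows. This
is not the APR macro or the code from the patch: the table, the `lookup`
function and the `fake_*` targets are hypothetical stand-ins (a real lookup
would go through GetProcAddress), and the details of the race itself are
not shown here.

```c
#include <stddef.h>

typedef void (*lock_fn_t)(void *lock);

/* Hypothetical stand-ins for functions a late-binding layer would look up
 * in a DLL; here they are ordinary local functions that do nothing. */
static void fake_acquire_shared(void *lock) { (void)lock; }
static void fake_release_shared(void *lock) { (void)lock; }

enum { FN_ACQUIRE_SHARED, FN_RELEASE_SHARED, FN_COUNT };

/* One slot per late-bound function; NULL until it has been looked up. */
static lock_fn_t resolved[FN_COUNT];

/* Hypothetical lookup by index. */
static lock_fn_t lookup(int which)
{
    switch (which) {
    case FN_ACQUIRE_SHARED: return fake_acquire_shared;
    case FN_RELEASE_SHARED: return fake_release_shared;
    default:                return NULL;
    }
}

/* Resolve every pointer from the create path, so that the lock and unlock
 * calls that follow never have to do a first-use lookup themselves.
 * Returns 1 when all functions were found, 0 otherwise. */
static int resolve_all(void)
{
    for (int i = 0; i < FN_COUNT; i++) {
        if (resolved[i] == NULL)
            resolved[i] = lookup(i);
        if (resolved[i] == NULL)
            return 0;
    }
    return 1;
}
```

A create function would call resolve_all() before handing out the lock.
Note that this only moves the lookups out of the lock and unlock paths; it
does not by itself make concurrent create calls safe.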

	Bert