You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@harmony.apache.org by Aleksey Shipilev <al...@gmail.com> on 2008/02/13 00:12:13 UTC

[drlvm][startup][performance] Implementing futex'es

Hi, Alexei, all,

Just another idea for startup optimizations pops out of our talk with
Egor Pasko. :)

As you probably know there are many places in VM and JIT that use
locking for safety reasons. Most of this locking is driven by mutexes,
that is, the kernel calls. That's a good option in case of contention,
because such locking will need arbitration (e.g. "who will take the
mutex next"?) from kernel side. But what if that locking is not
contended? Even then we will make the kernel call for trying to catch
the mutex.

Linux has long ago implemented such thing as "fast user-space mutex",
"futex" [1]. Generally it is simple memory region that could be
incremented/decremented atomically. In case of contention futex, of
course, will resort to kernel-side mutex.

That mean we could save precious time using futexes instead of
mutexes: we definitely will save on kernel call time.

AFAIK, current implementation of porting layer has no support for
futexes even on Linux side. And so we might try to implement them for
Windows part and use the Linux-provided futex'es on Linux part. Then
after the implementation of hyfutex_lock/unlock we might try to
migrate performance-significant locks to futexes one-by-one. Profilers
are good directions, maybe anywhere else too.

What do you think?

Thanks,
Aleksey,
ESSD, Intel

[1] http://en.wikipedia.org/wiki/Futex

Re: [drlvm][startup][performance] Implementing futex'es

Posted by Aleksey Shipilev <al...@gmail.com>.

On 13 Feb 2008 12:37:32 +0300, Egor Pasko <eg...@gmail.com> wrote:
> why guess? just run Harmony under strace on your linux box, you will
> see only futexes triggered. Cheers.
Yep, that's what I told - Harmony relies on pthreads, pthreads relies
on futexes.

I'm not the guru in threading and atomics, but my simple
implementation [1] of futexes on Windows gives 2x faster uncontended
locking:

C:\Work\VSWork\Futex\Release>Futex.exe
mutex(): 4281 msecs
futex(): 2062 msecs

I haven't checked this prototype in MT application though.

Thanks,
Aleksey.

[1]
#include "stdafx.h"
#include "windows.h"
#include "time.h"

CRITICAL_SECTION mutex;
CRITICAL_SECTION fallbackMutex;
volatile int thinLock;
volatile bool isContended;

__forceinline void mutex_init() {
	InitializeCriticalSection(&mutex);
}

__forceinline void mutex_lock() {
	EnterCriticalSection(&mutex);
}

__forceinline void mutex_unlock() {
	LeaveCriticalSection(&mutex);
}

__forceinline void futex_init() {
	thinLock = 0;
	isContended = false;
	InitializeCriticalSection(&fallbackMutex);
}

__forceinline void futex_lock() {
	volatile void* thinLockPtr = &thinLock;
	int result;

	__asm {
		mov ecx, thinLockPtr
		mov eax, 0			
		mov edx, 1
		lock cmpxchg [ecx], edx		
		mov result, eax
	}
	if (result == 0) {
		return;
	} else {
		printf("falling back on lock: result = %d\n", result);
		EnterCriticalSection(&fallbackMutex);
		isContended = true;
	}
}

__forceinline void futex_unlock() {
	if (!isContended) {
		thinLock = 0;
	} else {
		printf("falling back on unlock\n");
		LeaveCriticalSection(&fallbackMutex);	
	}
}

int _tmain(int argc, _TCHAR* argv[])
{
	mutex_init();
	futex_init();

	time_t t1, t2;
	int count = 100000000;

	t1 = clock();
	for(int c = 0; c < count; c++) {
		mutex_lock();
		mutex_unlock();
	}
	t2 = clock();

	printf("mutex(): %d msecs\n", (t2-t1));

	t1 = clock();
	for(int c = 0; c < count; c++) {
		futex_lock();
		futex_unlock();
	}
	t2 = clock();

	printf("futex(): %d msecs\n", (t2-t1));

	return 0;
}

Re: [drlvm][startup][performance] Implementing futex'es

Posted by Egor Pasko <eg...@gmail.com>.

On the 0x3E9 day of Apache Harmony Aleksey Shipilev wrote:
> Alexei,
> 
> I've also heard that currently pthread mutexes are built on base of
> futexes for sake of performance. I don't know what's going on in
> EnterCriticalSection on Windows though. That's interesting... I'll
> measure the performance with some microbenchmark :)

why guess? just run Harmony under strace on your linux box, you will
see only futexes triggered. Cheers.

> Thanks,
> Aleksey.
> 
> On Feb 13, 2008 3:38 AM, Alexei Fedotov <al...@gmail.com> wrote:
> > Hello Aleksey,
> > Is not the current mutex implementation no less efficient than futex?
> > I've heard that mutexes try to spin before switching to kernel, and
> > Windows critical sections do the same trick.
> >
> > Thanks!
> >
> >
> > On Feb 13, 2008 2:12 AM, Aleksey Shipilev <al...@gmail.com> wrote:
> > > Hi, Alexei, all,
> > >
> > > Just another idea for startup optimizations pops out of our talk with
> > > Egor Pasko. :)
> > >
> > > As you probably know there are many places in VM and JIT that use
> > > locking for safety reasons. Most of this locking is driven by mutexes,
> > > that is, the kernel calls. That's a good option in case of contention,
> > > because such locking will need arbitration (e.g. "who will take the
> > > mutex next"?) from kernel side. But what if that locking is not
> > > contended? Even then we will make the kernel call for trying to catch
> > > the mutex.
> > >
> > > Linux has long ago implemented such thing as "fast user-space mutex",
> > > "futex" [1]. Generally it is simple memory region that could be
> > > incremented/decremented atomically. In case of contention futex, of
> > > course, will resort to kernel-side mutex.
> > >
> > > That mean we could save precious time using futexes instead of
> > > mutexes: we definitely will save on kernel call time.
> > >
> > > AFAIK, current implementation of porting layer has no support for
> > > futexes even on Linux side. And so we might try to implement them for
> > > Windows part and use the Linux-provided futex'es on Linux part. Then
> > > after the implementation of hyfutex_lock/unlock we might try to
> > > migrate performance-significant locks to futexes one-by-one. Profilers
> > > are good directions, maybe anywhere else too.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Aleksey,
> > > ESSD, Intel
> > >
> > > [1] http://en.wikipedia.org/wiki/Futex
> > >
> >
> >
> >
> > --
> > With best regards,
> > Alexei
> >
> 

-- 
Egor Pasko

Re: [drlvm][startup][performance] Implementing futex'es

Posted by Aleksey Shipilev <al...@gmail.com>.

Alexei,

I've also heard that currently pthread mutexes are built on base of
futexes for sake of performance. I don't know what's going on in
EnterCriticalSection on Windows though. That's interesting... I'll
measure the performance with some microbenchmark :)

Thanks,
Aleksey.

On Feb 13, 2008 3:38 AM, Alexei Fedotov <al...@gmail.com> wrote:
> Hello Aleksey,
> Is not the current mutex implementation no less efficient than futex?
> I've heard that mutexes try to spin before switching to kernel, and
> Windows critical sections do the same trick.
>
> Thanks!
>
>
> On Feb 13, 2008 2:12 AM, Aleksey Shipilev <al...@gmail.com> wrote:
> > Hi, Alexei, all,
> >
> > Just another idea for startup optimizations pops out of our talk with
> > Egor Pasko. :)
> >
> > As you probably know there are many places in VM and JIT that use
> > locking for safety reasons. Most of this locking is driven by mutexes,
> > that is, the kernel calls. That's a good option in case of contention,
> > because such locking will need arbitration (e.g. "who will take the
> > mutex next"?) from kernel side. But what if that locking is not
> > contended? Even then we will make the kernel call for trying to catch
> > the mutex.
> >
> > Linux has long ago implemented such thing as "fast user-space mutex",
> > "futex" [1]. Generally it is simple memory region that could be
> > incremented/decremented atomically. In case of contention futex, of
> > course, will resort to kernel-side mutex.
> >
> > That mean we could save precious time using futexes instead of
> > mutexes: we definitely will save on kernel call time.
> >
> > AFAIK, current implementation of porting layer has no support for
> > futexes even on Linux side. And so we might try to implement them for
> > Windows part and use the Linux-provided futex'es on Linux part. Then
> > after the implementation of hyfutex_lock/unlock we might try to
> > migrate performance-significant locks to futexes one-by-one. Profilers
> > are good directions, maybe anywhere else too.
> >
> > What do you think?
> >
> > Thanks,
> > Aleksey,
> > ESSD, Intel
> >
> > [1] http://en.wikipedia.org/wiki/Futex
> >
>
>
>
> --
> With best regards,
> Alexei
>

Re: [drlvm][startup][performance] Implementing futex'es

Posted by Alexei Fedotov <al...@gmail.com>.

Hello Aleksey,
Is not the current mutex implementation no less efficient than futex?
I've heard that mutexes try to spin before switching to kernel, and
Windows critical sections do the same trick.

Thanks!

On Feb 13, 2008 2:12 AM, Aleksey Shipilev <al...@gmail.com> wrote:
> Hi, Alexei, all,
>
> Just another idea for startup optimizations pops out of our talk with
> Egor Pasko. :)
>
> As you probably know there are many places in VM and JIT that use
> locking for safety reasons. Most of this locking is driven by mutexes,
> that is, the kernel calls. That's a good option in case of contention,
> because such locking will need arbitration (e.g. "who will take the
> mutex next"?) from kernel side. But what if that locking is not
> contended? Even then we will make the kernel call for trying to catch
> the mutex.
>
> Linux has long ago implemented such thing as "fast user-space mutex",
> "futex" [1]. Generally it is simple memory region that could be
> incremented/decremented atomically. In case of contention futex, of
> course, will resort to kernel-side mutex.
>
> That mean we could save precious time using futexes instead of
> mutexes: we definitely will save on kernel call time.
>
> AFAIK, current implementation of porting layer has no support for
> futexes even on Linux side. And so we might try to implement them for
> Windows part and use the Linux-provided futex'es on Linux part. Then
> after the implementation of hyfutex_lock/unlock we might try to
> migrate performance-significant locks to futexes one-by-one. Profilers
> are good directions, maybe anywhere else too.
>
> What do you think?
>
> Thanks,
> Aleksey,
> ESSD, Intel
>
> [1] http://en.wikipedia.org/wiki/Futex
>



-- 
With best regards,
Alexei

Re: [drlvm][startup][performance] Implementing futex'es

Posted by Pavel Pervov <pm...@gmail.com>.

Aleksey,

As far as I understand OS internals, both Windows and Linux does not
make "syscall"s unless contension happens. So, pthread_mutex_lock (or
EnterCriticalSection) in case of single thread is just an atomic
increment.

WBR,
    Pavel.

On 2/13/08, Aleksey Shipilev <al...@gmail.com> wrote:
> Hi, Alexei, all,
>
> Just another idea for startup optimizations pops out of our talk with
> Egor Pasko. :)
>
> As you probably know there are many places in VM and JIT that use
> locking for safety reasons. Most of this locking is driven by mutexes,
> that is, the kernel calls. That's a good option in case of contention,
> because such locking will need arbitration (e.g. "who will take the
> mutex next"?) from kernel side. But what if that locking is not
> contended? Even then we will make the kernel call for trying to catch
> the mutex.
>
> Linux has long ago implemented such thing as "fast user-space mutex",
> "futex" [1]. Generally it is simple memory region that could be
> incremented/decremented atomically. In case of contention futex, of
> course, will resort to kernel-side mutex.
>
> That mean we could save precious time using futexes instead of
> mutexes: we definitely will save on kernel call time.
>
> AFAIK, current implementation of porting layer has no support for
> futexes even on Linux side. And so we might try to implement them for
> Windows part and use the Linux-provided futex'es on Linux part. Then
> after the implementation of hyfutex_lock/unlock we might try to
> migrate performance-significant locks to futexes one-by-one. Profilers
> are good directions, maybe anywhere else too.
>
> What do you think?
>
> Thanks,
> Aleksey,
> ESSD, Intel
>
> [1] http://en.wikipedia.org/wiki/Futex
>