Posted to commits@harmony.apache.org by "Sergey Kuksenko (JIRA)" <ji...@apache.org> on 2007/05/29 14:48:16 UTC

[jira] Created: (HARMONY-3995) [drlvm][threading][performance] Performance improvement for uncontended synchronization.

[drlvm][threading][performance] Performance improvement for uncontended synchronization.
----------------------------------------------------------------------------------------

                 Key: HARMONY-3995
                 URL: https://issues.apache.org/jira/browse/HARMONY-3995
             Project: Harmony
          Issue Type: Improvement
          Components: DRLVM
            Reporter: Sergey Kuksenko


It is a fact that even simple atomic instructions (lock cmpxchg, etc.) have a large impact on performance, especially on multiprocessor systems. DRLVM uses a lock reservation scheme for uncontended synchronization. With it, in the case of local (from a single thread) and uncontended synchronization, all monitor_enter and monitor_exit primitives are executed without atomic instructions. In the case of non-local (from several threads) but still uncontended synchronization, DRLVM uses a thin-lock scheme (with atomic instructions). Lock unreservation is a rather expensive operation because the owner thread has to be stopped. That is why DRLVM unreserves a lock only once, when converting it to a thin lock. On the other hand, there is a common situation that the current scheme does not cover: transferring locality, where after several synchronizations from one thread the data is transferred to another thread and locality (access from a single thread) continues in the new thread.
The attached patch improves the transferring-locality case. The following heuristic is used:
- If the owner thread is already stopped at the moment of unreservation, the lock is unreserved but is not switched to the thin-lock state. The lock stays in reservation mode and will be reserved for the next thread that tries to acquire it. In other words, if unreservation costs nothing (the thread is already stopped: in wait, sleep, terminated, ... state), DRLVM unreserves the lock but keeps it available for future reservations.
There are a number of applications where this gives a performance boost. I have also attached a microbenchmark that shows the performance boost from the patch. On the other hand, we need additional investigation into where the patch helps. That is why the patch does not change the current unreservation behavior. The patch introduces a new option, "-XX:thread.soft_unreservation", which is turned off by default. Turning it on enables the new (soft) unreservation scheme.
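
The transitions described above can be sketched as a toy state machine. This is only an illustrative model in Python, not DRLVM code: the names Mode, ToyLock and acquire are hypothetical, the real lock word layout is not modeled, and "owner_is_stopped" stands in for whatever check the patch uses to detect a thread in wait, sleep or terminated state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    RESERVABLE = "reservable"  # no owner yet; the next acquirer reserves the lock
    RESERVED = "reserved"      # owned by one thread; enter/exit need no atomics
    THIN = "thin"              # shared; enter/exit use atomic instructions


@dataclass
class ToyLock:
    mode: Mode = Mode.RESERVABLE
    owner: Optional[str] = None


def acquire(lock, thread, owner_is_stopped, soft_unreservation):
    """Return a label for the path a monitor enter by `thread` would take."""
    if lock.mode is Mode.RESERVABLE:
        lock.mode, lock.owner = Mode.RESERVED, thread
        return "reserve"
    if lock.mode is Mode.RESERVED and lock.owner == thread:
        return "fast path (no atomics)"
    if lock.mode is Mode.RESERVED:
        if soft_unreservation and owner_is_stopped:
            # Soft scheme: the owner is already stopped, so unreserving is
            # free; the lock stays in reservation mode for the new thread.
            lock.owner = thread
            return "soft unreserve + re-reserve"
        # Current scheme: unreserve once and fall back to a thin lock.
        lock.mode, lock.owner = Mode.THIN, None
        return "unreserve -> thin"
    return "thin path (atomics)"
```

For example, with soft_unreservation enabled, a lock first taken by "main" and later taken by "worker" while "main" is stopped ends up reserved for "worker"; with the option disabled the same sequence leaves the lock in thin mode for good.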

Some details about the attached microbenchmark. Here I emulate the following scenario:
- the main thread creates a bunch of data (objects with synchronized access)
- the main thread splits the data among 4 "processing" threads
- the main thread runs the 4 processing threads and waits for their results.
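
The access pattern of this scenario can be sketched as follows. This is not the code from synchTest.zip; it is a hypothetical Python illustration of the three steps (Counter and run_scenario are made-up names), and it assumes the main thread takes each object's lock at least once while preparing the data. It shows only who touches which lock and when; Python locks do not use reservation, so it says nothing about timing.

```python
import threading


class Counter:
    """An object whose only mutator is guarded by its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1


def run_scenario(num_objects=1000, num_workers=4, rounds=10):
    # Step 1: the main thread creates the data and touches every lock once.
    data = [Counter() for _ in range(num_objects)]
    for obj in data:
        obj.increment()

    # Step 2: split the data so each object belongs to exactly one worker.
    chunks = [data[i::num_workers] for i in range(num_workers)]

    def process(chunk):
        # Each lock is now taken repeatedly, but only by this one thread.
        for _ in range(rounds):
            for obj in chunk:
                obj.increment()

    # Step 3: run the workers and wait for all of them to finish.
    workers = [threading.Thread(target=process, args=(c,)) for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return data
```

Every lock here is uncontended throughout, yet its single user changes from the main thread to one worker, which is the transferring-locality case.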

The reported number is the number of synchronized operations per second divided by 10 (the higher the better).
For example:
synchronized OPS     = 7147     - here we have ~71470 synch ops per second.
non-synchronized OPS = 19891
The last number shows the speed of the same operations without any synchronization.
The ratio between synchronized and non-synchronized OPS shows the degradation caused by synchronization (even uncontended).
Here are some measurements for Sun 1.6 and DRLVM on the microbenchmark:
1. Sun1.6
CMDLINE:  java -server -jar synchTest.jar

Measure phase; threads(4); time(180)
synchronized OPS     = 7886
non-synchronized OPS = 59907

2. DRLVM 
2.1 java -XX:thread.soft_unreservation=false -Xem:server -jar synchTest.jar

synchronized OPS     = 7939
non-synchronized OPS = 50985

2.2 java -XX:thread.soft_unreservation=true -Xem:server -jar synchTest.jar

synchronized OPS     = 25735
non-synchronized OPS = 50998

Thus turning the option on gives DRLVM a speedup of about 3.2x on this microbenchmark.
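
The ratios and the speedup can be recomputed from the numbers above with a few lines of Python. The figures are copied from the runs in this message; the helper name sync_ratio is just for illustration.

```python
# Measurements from the runs above: (synchronized OPS, non-synchronized OPS).
runs = {
    "Sun 1.6 -server": (7886, 59907),
    "DRLVM soft_unreservation=false": (7939, 50985),
    "DRLVM soft_unreservation=true": (25735, 50998),
}


def sync_ratio(sync_ops, plain_ops):
    """Fraction of the unsynchronized throughput that remains with synchronization."""
    return sync_ops / plain_ops


for name, (sync_ops, plain_ops) in runs.items():
    print(f"{name}: ratio = {sync_ratio(sync_ops, plain_ops):.2f}")

# Speedup of the synchronized case when the option is turned on.
speedup = (runs["DRLVM soft_unreservation=true"][0]
           / runs["DRLVM soft_unreservation=false"][0])
print(f"speedup = {speedup:.2f}x")
```

The non-synchronized numbers for the two DRLVM runs are nearly identical, so the change in ratio comes from the synchronized path alone.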

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "weldon washburn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weldon washburn reassigned HARMONY-3995:
----------------------------------------

    Assignee: weldon washburn



[jira] Updated: (HARMONY-3995) [drlvm][threading][performance] Performance improvement for uncontended synchronization.

Posted by "Sergey Kuksenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Kuksenko updated HARMONY-3995:
-------------------------------------

    Attachment: synchTest.zip

microbenchmark.



[jira] Commented: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "Sergey Kuksenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506493 ] 

Sergey Kuksenko commented on HARMONY-3995:
------------------------------------------

Verified.



[jira] Updated: (HARMONY-3995) [drlvm][threading][performance] Performance improvement for uncontended synchronization.

Posted by "Sergey Kuksenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Kuksenko updated HARMONY-3995:
-------------------------------------

    Attachment: soft_unreserv_2.patch

The patch



[jira] Commented: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "weldon washburn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504810 ] 

weldon washburn commented on HARMONY-3995:
------------------------------------------

hmm... it seems a week has gone by since I posted rough_ideas.diff.  It seems I am the only one interested in cleaning up this bad code.

The only way I will commit soft_unreserv_2.patch is if we all agree not to make any additional performance patches until the underlying code is cleaned up.



[jira] Closed: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "weldon washburn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weldon washburn closed HARMONY-3995.
------------------------------------

    Resolution: Fixed

The deal on committing this patch is that no bugs will be reported in lock reservation code until after Thread Manager code cleanup.  If there are any problems with this code, lock reservation will be turned off.



[jira] Updated: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "weldon washburn (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weldon washburn updated HARMONY-3995:
-------------------------------------

    Attachment: rough_ideas.diff

> [drlvm][thread][performance] Performance improvement for uncontended synchronization.
> -------------------------------------------------------------------------------------
>
>                 Key: HARMONY-3995
>                 URL: https://issues.apache.org/jira/browse/HARMONY-3995
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Sergey Kuksenko
>            Assignee: weldon washburn
>         Attachments: rough_ideas.diff, soft_unreserv_2.patch, synchTest.zip
>
>
> Even simple atomic instructions (lock cmpxchg, etc.) have a large impact on performance, especially on multiprocessor systems. DRLVM uses a reservation-lock scheme for uncontended synchronization: when synchronization is local (from a single thread) and uncontended, all monitor_enter and monitor_exit primitives execute without atomic instructions. When synchronization is non-local (from several threads) but still uncontended, DRLVM uses a thin-lock scheme (with atomic instructions). Lock unreservation is rather expensive because the owner thread must be stopped. That is why DRLVM unreserves a lock only once, when switching it to a thin lock. On the other hand, a common situation is not covered by the current scheme: transferred locality, where after several synchronizations from one thread the data is handed to another thread and local access (from a single thread) continues in the new thread.
> The attached patch improves the transferred-locality case. The following heuristic is used:
> - If the owner thread is already stopped at the moment of unreservation, the lock is unreserved but is not switched to the thin-lock state. The lock stays in reservation mode and will be reserved for the next thread that tries to acquire it. In other words, if unreservation costs nothing (the thread is already stopped: in wait, sleep, terminated, ... state), DRLVM unreserves the lock but keeps it available for future reservations.
> There are a number of applications where this gives a performance boost, and I have attached a microbenchmark that demonstrates it. On the other hand, additional investigation is needed to determine where the patch helps. That is why the patch does not change the current unreservation behavior. It introduces a new option, "-XX:thread.soft_unreservation", which is turned off by default. Turning it on enables the new (soft) unreservation scheme.
> Some details about the attached microbenchmark. It emulates the following scenario:
> - the main thread creates a bunch of data (objects with synchronized access)
> - the main thread splits all the data among 4 "processing" threads
> - the main thread runs the 4 processing threads and waits for their results.
> The reported number is the amount of synchronized operations divided by 10 (higher is better).
> For example:
> synchronized OPS     = 7147     - here we have ~71470 synchronized ops per second.
> non-synchronized OPS = 19891
> The last number shows the speed of the same operations without any synchronization.
> The ratio between synchronized and non-synchronized OPS shows the degradation caused by synchronization (even uncontended).
> Here are some measurements for Sun 1.6 and DRLVM on the microbenchmark:
> 1. Sun 1.6
> CMDLINE:  java -server -jar synchTest.jar
> Measure phase; threads(4); time(180)
> synchronized OPS     = 7886
> non-synchronized OPS = 59907
> 2. DRLVM
> 2.1 java -XX:thread.soft_unreservation=false -Xem:server -jar synchTest.jar
> synchronized OPS     = 7939
> non-synchronized OPS = 50985
> 2.2 java -XX:thread.soft_unreservation=true -Xem:server -jar synchTest.jar
> synchronized OPS     = 25735
> non-synchronized OPS = 50998
> Thus turning the option on gives DRLVM a speedup of about 3.2x on synchronized OPS (25735 vs. 7939).
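
The soft-unreservation heuristic in the quoted description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DRLVM code: the class, the three-state enum, the method names and the counters are all hypothetical, the cost of the initial reservation is ignored, and "atomic operation" is simply counted once per thin-lock acquisition.

```java
// Toy model only: all names here are illustrative assumptions, not DRLVM code.
final class SoftUnreservationModel {
    enum State { UNRESERVED, RESERVED, THIN }

    static final class ModelLock {
        State state = State.UNRESERVED; // reservation mode, no owner yet
        String owner;                   // reserving thread, or null
        int atomicOps;                  // thin-lock acquisitions (these need atomics)
        int ownerStops;                 // times a running owner had to be stopped
    }

    // Models one uncontended monitor_enter/monitor_exit pair by thread `who`.
    // `ownerStopped` says whether the current owner is already stopped
    // (wait, sleep, terminated) when another thread arrives.
    static void enterExit(ModelLock lock, String who,
                          boolean ownerStopped, boolean soft) {
        switch (lock.state) {
            case UNRESERVED:
                lock.state = State.RESERVED; // first user reserves the lock
                lock.owner = who;
                return;
            case RESERVED:
                if (who.equals(lock.owner)) {
                    return;                  // local fast path: no atomics
                }
                if (soft && ownerStopped) {
                    lock.owner = who;        // soft: stay reserved, new owner
                    return;
                }
                if (!ownerStopped) {
                    lock.ownerStops++;       // expensive: owner must be stopped
                }
                lock.state = State.THIN;     // default: one-way switch to thin
                lock.owner = null;
                lock.atomicOps++;
                return;
            default:                         // THIN: every acquisition is atomic
                lock.atomicOps++;
        }
    }

    // Transferred locality: "main" uses the lock, stops, then "worker" takes over.
    static int atomicOpsAfterHandoff(boolean soft, int workerOps) {
        ModelLock lock = new ModelLock();
        for (int i = 0; i < 3; i++) {
            enterExit(lock, "main", false, soft);
        }
        for (int i = 0; i < workerOps; i++) {
            enterExit(lock, "worker", true, soft);
        }
        return lock.atomicOps;
    }

    public static void main(String[] args) {
        System.out.println("default scheme: " + atomicOpsAfterHandoff(false, 1000));
        System.out.println("soft scheme:    " + atomicOpsAfterHandoff(true, 1000));
    }
}
```

In this model every worker acquisition after the hand-off goes through the thin-lock path under the default scheme, while under the soft scheme the worker re-reserves the lock and stays on the path without atomics. The counts say nothing about real timings; they only show why the transferred-locality case can benefit.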

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "weldon washburn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502675 ] 

weldon washburn commented on HARMONY-3995:
------------------------------------------

I was surprised to see that #ifdef LOCK_RESERVATION was included in the debug build. I looked at the code the patch actually modifies, specifically unreserve_lock(). This routine looks like it has several race conditions that would be good to clean up first. Otherwise we run the risk that a new performance patch will disturb the existing race conditions. Attached is a rough cut at cleaning this up. It does not compile yet. The purpose is to give us a framework for discussing the race conditions as well as the proposed performance mods.




[jira] Updated: (HARMONY-3995) [drlvm][thread][performance] Performance improvement for uncontended synchronization.

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HARMONY-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Shimansky updated HARMONY-3995:
---------------------------------------

    Summary: [drlvm][thread][performance] Performance improvement for uncontended synchronization.  (was: [drlvm][threading][performance] Performance improvement for uncontended synchronization.)

> [drlvm][thread][performance] Performance improvement for uncontended synchronization.
> -------------------------------------------------------------------------------------
>
>                 Key: HARMONY-3995
>                 URL: https://issues.apache.org/jira/browse/HARMONY-3995
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>            Reporter: Sergey Kuksenko
>         Attachments: soft_unreserv_2.patch, synchTest.zip
