Posted to common-issues@hadoop.apache.org by "jinglong.liujl (JIRA)" <ji...@apache.org> on 2011/01/14 18:05:46 UTC

[jira] Created: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

[IPC] Improvement of lock mechanism in Listener and Reader thread
-----------------------------------------------------------------

                 Key: HADOOP-7105
                 URL: https://issues.apache.org/jira/browse/HADOOP-7105
             Project: Hadoop Common
          Issue Type: Improvement
          Components: ipc
    Affects Versions: 0.21.0
            Reporter: jinglong.liujl
         Attachments: improveListenerLock.patch

With many clients accessing concurrently, the single-threaded Listener becomes a bottleneck. Many clients cannot be served and get connection timeouts.
To improve Listener capacity, we made 2 modifications.
1. Tune ipc.server.listen.queue.size to a larger value to avoid client retries.
2. In the current implementation, the Listener calls registerChannel() and finishAdd() in the Reader, both of which request the Reader's synchronized lock. The Listener spends too much time waiting for this lock.
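
As a rough illustration of the first change: the listen queue size is the backlog value requested when the server socket is bound. The sketch below uses only java.nio and java.net; the class and method names are hypothetical and are not taken from the patch, and 10240 is simply the value used in the tests reported in this issue. The operating system may cap the effective backlog (on Linux, for example, through net.core.somaxconn), so a larger setting may also need a kernel-side change.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class: shows where a listen-queue (backlog) value is applied.
public class BacklogDemo {

  /** Binds a non-blocking server channel to an ephemeral loopback port with the given backlog. */
  static ServerSocketChannel bindWithBacklog(int backlog) throws IOException {
    ServerSocketChannel acceptChannel = ServerSocketChannel.open();
    acceptChannel.configureBlocking(false);
    // The second argument is the requested maximum number of pending connections.
    acceptChannel.socket().bind(
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), backlog);
    return acceptChannel;
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocketChannel ch = bindWithBacklog(10240)) {
      System.out.println("listening on port " + ch.socket().getLocalPort());
    }
  }
}
```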

We ran the following test:

./bin/hadoop org.apache.hadoop.hdfs.NNThroughputBenchmark  -op create -threads 10000 -files 10000

case 1 : current code
Does not pass, and reports
hadoop-rd101.jx.baidu.com/10.65.25.166:59310. Already tried 0 time(s).

case 2 : tuning backlog to 10240
average cost : 1285.72 ms

case 3 : tuning backlog to 10240, plus the improved lock mechanism in the patch
average cost : 941.32 ms

Average cost improves by about 26.8% (from 1285.72 ms to 941.32 ms).



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jinglong.liujl updated HADOOP-7105:
-----------------------------------

    Attachment: improveListenerLock.patch

[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "luoli (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473120#comment-13473120 ] 

luoli commented on HADOOP-7105:
-------------------------------

Hi longjing ge, I am a little confused.
What was the synchronized in Reader originally supposed to do?
If it is meant to protect the readSelector, then the patch no longer protects it.
If not, why not just remove the synchronized from registerChannel?

                
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992507#comment-12992507 ] 

jinglong.liujl commented on HADOOP-7105:
----------------------------------------

1. First, my test was run on Baidu's branch, and compares results with and without this patch.
We use NNThroughputBenchmark, but it differs slightly from community trunk: it adds nn.throughput.bench.rpcmode to NNThroughputBenchmark so that the test goes through RPC. I'll create a separate issue for it.

2. Our dfs.name.dir is a single SATA disk, with 7 reader threads and a backlog of 10240.
./bin/hadoop org.apache.hadoop.hdfs.NNThroughputBenchmark -op create -threads 10000 -files 10000
with nn.throughput.bench.rpcmode set to true (which means RPC is used to call NameNode functions)




[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991551#comment-12991551 ] 

Todd Lipcon commented on HADOOP-7105:
-------------------------------------

Actually, looking more closely at NNThroughputBenchmark, it doesn't even use IPC at all, but rather calls the NN directly.

So, I don't understand how this patch should affect the results of that benchmark.

[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985904#action_12985904 ] 

Todd Lipcon commented on HADOOP-7105:
-------------------------------------

I think this patch could cause a missed notify - the Reader could be slow in waking up from readSelector.select() and then not catch the notify() in finishAdd(), right?

Here's an idea - what if Reader had a non-blocking queue of channels waiting to be registered, and it processed those before entering select()? Then the Listener wouldn't ever have to wait on the Reader to process the accept-queue, but there wouldn't be a chance of missed notifies?
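
A minimal sketch of the queue idea described above, assuming a ConcurrentLinkedQueue and the sticky behavior of Selector.wakeup(); it is not code from Hadoop or from the attached patches, and the class and method names are hypothetical. The Listener only enqueues and wakes the selector, and the Reader registers queued channels itself before each select(). Because a wakeup() issued while no select() is in progress makes the next select() return immediately, a channel queued between the drain and the select() is still picked up on the next loop iteration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical Reader that registers channels itself instead of letting the Listener do it.
public class QueueingReader implements Runnable {
  private final Selector readSelector;
  private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
  private final ByteBuffer scratch = ByteBuffer.allocate(4096);
  private volatile boolean running = true;

  QueueingReader() throws IOException {
    readSelector = Selector.open();
  }

  /** Called by the Listener thread; never waits for the Reader. */
  void addChannel(SocketChannel channel) {
    pending.add(channel);
    // If the Reader is not inside select() yet, its next select() returns at once.
    readSelector.wakeup();
  }

  /** Registers everything queued so far; intended to run on the Reader thread. */
  int drainPending() throws IOException {
    int registered = 0;
    SocketChannel channel;
    while ((channel = pending.poll()) != null) {
      channel.configureBlocking(false);
      channel.register(readSelector, SelectionKey.OP_READ);
      registered++;
    }
    return registered;
  }

  @Override
  public void run() {
    try {
      while (running) {
        drainPending();
        readSelector.select();
        Iterator<SelectionKey> it = readSelector.selectedKeys().iterator();
        while (it.hasNext()) {
          SelectionKey key = it.next();
          it.remove();
          SocketChannel ch = (SocketChannel) key.channel();
          // Placeholder for request parsing: discard bytes, close on end of stream or error.
          try {
            if (ch.read(scratch) < 0) {
              ch.close();
            }
          } catch (IOException e) {
            ch.close();
          }
          scratch.clear();
        }
      }
    } catch (IOException e) {
      e.printStackTrace(); // sketch-level error handling only
    } finally {
      try {
        readSelector.close();
      } catch (IOException ignored) {
        // nothing useful to do while shutting down
      }
    }
  }

  /** Asks the Reader loop to exit. */
  void shutdown() {
    running = false;
    readSelector.wakeup();
  }
}
```

In this arrangement the Listener would call addChannel() with each accepted channel and return to accept() right away; whether that actually helps throughput would still need to be measured.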

[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991547#comment-12991547 ] 

Todd Lipcon commented on HADOOP-7105:
-------------------------------------

I tried to reproduce the benchmark results here and didn't see the same improvement. I'm using 8 reader threads, backlog 10240, and your patch. Also applied HDFS-1597 to fix a bug in edit log syncing throughput.

How have you set up the dfs.name.dir? Is it a single directory or multiple? Is it on a disk with an nvram-backed write cache? My test was on an SSD.

[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "luoli (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476805#comment-13476805 ] 

luoli commented on HADOOP-7105:
-------------------------------

jinglong, which issue? Could you post a link, please?
                
[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474994#comment-13474994 ] 

jinglong.liujl commented on HADOOP-7105:
----------------------------------------

There are duplicate synchronized locks around registerChannel(): one is on registerChannel() itself in Reader, and the other is inside SocketChannel.register() in the JDK. It is safe to remove one of them.
Following Todd's suggestion, we've added a non-blocking queue in the RPC layer so that accept is not blocked by the Reader. That patch will be released in another issue.
                
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990937#comment-12990937 ] 

dhruba borthakur commented on HADOOP-7105:
------------------------------------------

Looks like a very good fix. +1

[jira] Updated: (HADOOP-7105) [IPC] Improvement of lock mechanism in Listener and Reader thread

Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jinglong.liujl updated HADOOP-7105:
-----------------------------------

    Attachment: improveListenerLock2.patch

Thanks very much for Todd's suggestion.

To fix it, I moved the modification of "adding" in finishAdd() and the notify under a wait lock, so that they happen as one atomic operation.

A non-blocking queue is a great idea, but it would require refactoring the current RPC framework, which is not the purpose of this issue.
In fact, this patch reduces the cost of "Listener waits for Reader" (the Listener now has to wait for the Reader only in finishAdd()). In our tests, removing the synchronized lock from registerChannel() improves RPC response performance by about 20%.
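
A minimal sketch of what "flag change and notify as one atomic operation" could look like, assuming a dedicated lock object. This is not the code from improveListenerLock2.patch; the class, field, and method names are illustrative. The waiter re-checks the flag in a loop while holding the same lock that the notifier holds when it clears the flag, so a notification cannot slip in between the check and the wait.

```java
// Hypothetical helper: guards the "adding" flag with a single lock.
public class AddingGate {
  private final Object waitLock = new Object();
  private boolean adding = false; // guarded by waitLock

  /** Listener side: announce that a channel registration is about to happen. */
  void startAdd() {
    synchronized (waitLock) {
      adding = true;
    }
  }

  /** Listener side: clear the flag and notify inside the same critical section. */
  void finishAdd() {
    synchronized (waitLock) {
      adding = false;
      waitLock.notifyAll();
    }
  }

  /** Reader side: block while a registration is in progress. */
  void awaitNotAdding() throws InterruptedException {
    synchronized (waitLock) {
      // The flag is checked under the lock, so the notify in finishAdd() cannot be missed.
      while (adding) {
        waitLock.wait();
      }
    }
  }
}
```

The Reader would call awaitNotAdding() after waking from select(), and the Listener would wrap each registration in startAdd() and finishAdd().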
