Posted to common-issues@hadoop.apache.org by "jinglong.liujl (JIRA)" <ji...@apache.org> on 2011/01/14 18:05:46 UTC
[jira] Created: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
[IPC] Improvement of lock mechanism in Listener and Reader thread
-----------------------------------------------------------------
Key: HADOOP-7105
URL: https://issues.apache.org/jira/browse/HADOOP-7105
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Affects Versions: 0.21.0
Reporter: jinglong.liujl
Attachments: improveListenerLock.patch
Under heavy concurrent client access, the single-threaded Listener becomes a bottleneck: many clients cannot be served and their connections time out.
To improve Listener capacity, we made two modifications.
1. Tune ipc.server.listen.queue.size to a larger value so that clients do not have to retry.
2. In the current implementation, the Listener calls registerChannel() and finishAdd() on the Reader, and both require the Reader's synchronized lock. The Listener spends too much time waiting for this lock.
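A simplified model of item 2 may help show where the Listener's time goes. This is a hypothetical sketch, not the real org.apache.hadoop.ipc.Server code: it assumes the Reader keeps its own monitor while it processes a batch of ready connections, and that registerChannel() and finishAdd() are synchronized methods on the Reader, as the description says. The Reader class here, processBatch() and listenerWaitMillis() are illustrative names only.

```java
import java.util.concurrent.CountDownLatch;

// Simplified, hypothetical model of the handshake in item 2. The names echo
// the issue text (startAdd/registerChannel/finishAdd); the bodies are
// assumptions, not the actual org.apache.hadoop.ipc.Server code.
public class ReaderLockContention {

    static class Reader {
        private volatile boolean adding = false;

        // One pass of a Reader loop that keeps its own monitor while it
        // handles ready connections (the work is simulated with sleep()).
        void processBatch(CountDownLatch lockHeld, long workMillis) throws InterruptedException {
            synchronized (this) {
                lockHeld.countDown();
                Thread.sleep(workMillis); // sleep() does not release the monitor
            }
        }

        void startAdd() {
            adding = true; // real code would also wake up the read selector
        }

        // Both Listener-facing methods need the Reader's monitor, so the
        // Listener cannot continue until the Reader's batch is finished.
        synchronized void registerChannel() {
            // channel.register(readSelector, SelectionKey.OP_READ) would go here
        }

        synchronized void finishAdd() {
            adding = false;
            notify();
        }
    }

    /** Returns roughly how many ms the simulated Listener spent registering one channel. */
    static long listenerWaitMillis(long readerWorkMillis) {
        Reader reader = new Reader();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread readerThread = new Thread(() -> {
            try {
                reader.processBatch(lockHeld, readerWorkMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        readerThread.start();
        try {
            lockHeld.await(); // the Reader now owns its monitor
            long start = System.nanoTime();
            reader.startAdd();
            reader.registerChannel(); // blocks behind the Reader's batch
            reader.finishAdd();
            long waited = (System.nanoTime() - start) / 1_000_000;
            readerThread.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Listener waited ~" + listenerWaitMillis(200) + " ms");
    }
}
```

In this model every accepted connection costs the Listener up to one full Reader batch, which is the contention the patch tries to remove.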
We ran the following test:
./bin/hadoop org.apache.hadoop.hdfs.NNThroughputBenchmark -op create -threads 10000 -files 10000
case 1: current code
The test does not pass, and the client reports
hadoop-rd101.jx.baidu.com/10.65.25.166:59310. Already tried 0 time(s).
case 2: backlog tuned to 10240
average cost: 1285.72 ms
case 3: backlog tuned to 10240, plus the improved lock mechanism in the patch
average cost: 941.32 ms
Compared with case 2, the patch lowers the average cost by about 26.8% ((1285.72 - 941.32) / 1285.72).
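The backlog tuned in cases 2 and 3 is, assuming ipc.server.listen.queue.size is passed through as the backlog argument when the Listener binds its server socket, the length of the kernel queue of connections that have not been accepted yet. The JDK-only sketch below (ListenBacklog and boundPort() are hypothetical names) shows where such a value ends up. On Linux the kernel silently caps the listen backlog at net.core.somaxconn, so raising the Hadoop setting alone may not take full effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Minimal sketch: the backlog that a listen-queue setting such as
// ipc.server.listen.queue.size is assumed to control. Loopback and port 0
// (any free port) are illustrative choices only.
public class ListenBacklog {

    /** Binds a non-blocking accept channel with the given backlog and returns its port. */
    static int boundPort(int backlog) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            // Connections the Listener has not accept()ed yet wait in this
            // queue. When it is full, the OS may drop or refuse new attempts,
            // which clients see as connect timeouts and retries.
            server.bind(new InetSocketAddress("127.0.0.1", 0), backlog);
            return ((InetSocketAddress) server.getLocalAddress()).getPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound with backlog 10240 on port " + boundPort(10240));
    }
}
```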
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jinglong.liujl updated HADOOP-7105:
-----------------------------------
Attachment: improveListenerLock.patch
[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock
mechanism in Listener and Reader thread
Posted by "luoli (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473120#comment-13473120 ]
luoli commented on HADOOP-7105:
-------------------------------
Hi longjing ge, I am a little confused.
What was the synchronized in Reader originally supposed to protect?
If it is there to protect the readSelector, then the patch no longer protects it.
If not, why not simply remove the synchronized from registerChannel()?
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992507#comment-12992507 ]
jinglong.liujl commented on HADOOP-7105:
----------------------------------------
1. First, my test was run on Baidu's branch, comparing results with and without this patch.
We use NNThroughputBenchmark, but our version differs slightly from community trunk: it adds nn.throughput.bench.rpcmode to NNThroughputBenchmark to run an RPC test. I'll create a separate issue for it.
2. Our dfs.name.dir is a single SATA disk, with 7 reader threads and a backlog of 10240.
./bin/hadoop org.apache.hadoop.hdfs.NNThroughputBenchmark -op create -threads 10000 -files 10000
with nn.throughput.bench.rpcmode set to true (meaning the namenode functions are called over RPC)
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991551#comment-12991551 ]
Todd Lipcon commented on HADOOP-7105:
-------------------------------------
Actually, looking more closely at NNThroughputBenchmark, it doesn't use IPC at all, but rather calls the NN directly.
So I don't understand how this patch could affect the results of that benchmark.
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985904#action_12985904 ]
Todd Lipcon commented on HADOOP-7105:
-------------------------------------
I think this patch could cause a missed notify - the Reader could be slow in waking up from readSelector.select() and then not catch the notify() in finishAdd(), right?
Here's an idea - what if the Reader had a non-blocking queue of channels waiting to be registered, and processed those before entering select()? Then the Listener would never have to wait on the Reader to process the accept queue, and there would be no chance of missed notifies.
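The queue-based Reader suggested here can be sketched with plain java.nio. This is a hypothetical illustration, not Hadoop code: QueueingReader, addConnection() and demo() are made-up names, and a Pipe stands in for an accepted client socket so the example runs without networking. The Listener only enqueues the channel and calls Selector.wakeup(); the Reader is the only thread that registers channels with its selector, so no lock is shared between them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the Listener hands channels over through a lock-free
// queue; only the Reader thread touches readSelector, draining the queue
// before every select().
public class QueueingReader implements Runnable {
    private final Selector readSelector;
    private final Queue<SelectableChannel> pending = new ConcurrentLinkedQueue<>();
    final AtomicInteger readableEvents = new AtomicInteger();
    private volatile boolean running = true;

    QueueingReader() throws IOException {
        readSelector = Selector.open();
    }

    /** Listener side: non-blocking hand-off. The channel must already be non-blocking. */
    void addConnection(SelectableChannel channel) {
        pending.add(channel);
        readSelector.wakeup(); // interrupts select(), or makes the next one return at once
    }

    void shutdown() {
        running = false;
        readSelector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (running) {
                SelectableChannel ch;
                while ((ch = pending.poll()) != null) {
                    ch.register(readSelector, SelectionKey.OP_READ);
                }
                readSelector.select();
                for (SelectionKey key : readSelector.selectedKeys()) {
                    if (key.isValid() && key.isReadable()) {
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        if (((ReadableByteChannel) key.channel()).read(buf) < 0) {
                            key.cancel(); // peer closed the connection
                        } else {
                            readableEvents.incrementAndGet();
                        }
                    }
                }
                readSelector.selectedKeys().clear();
            }
            readSelector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Registers one pipe via the queue, writes a byte, returns the readable events seen. */
    static int demo() {
        try {
            QueueingReader reader = new QueueingReader();
            Thread readerThread = new Thread(reader, "reader");
            readerThread.start();
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false); // register() requires non-blocking mode
            reader.addConnection(pipe.source());     // what the Listener would do
            pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));
            long deadline = System.currentTimeMillis() + 5_000;
            while (reader.readableEvents.get() == 0 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            reader.shutdown();
            readerThread.join();
            pipe.sink().close();
            pipe.source().close();
            return reader.readableEvents.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("readable events: " + demo());
    }
}
```

Because wakeup() makes the next select() return immediately even when no select() is in progress, a channel queued just before the Reader blocks is still picked up on the next loop iteration, so there is no notify to miss.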
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991547#comment-12991547 ]
Todd Lipcon commented on HADOOP-7105:
-------------------------------------
I tried to reproduce the benchmark results here and didn't see the same improvement. I'm using 8 reader threads, backlog 10240, and your patch. Also applied HDFS-1597 to fix a bug in edit log syncing throughput.
How have you set up the dfs.name.dir? Is it a single directory or multiple? Is it on a disk with an nvram-backed write cache? My test was on an SSD.
[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock
mechanism in Listener and Reader thread
Posted by "luoli (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476805#comment-13476805 ]
luoli commented on HADOOP-7105:
-------------------------------
jinglong, which issue? Could you post a link?
[jira] [Commented] (HADOOP-7105) [IPC] Improvement of lock
mechanism in Listener and Reader thread
Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474994#comment-13474994 ]
jinglong.liujl commented on HADOOP-7105:
----------------------------------------
There are two redundant synchronized locks in registerChannel(): one is the synchronized on Reader.registerChannel(), the other is inside SocketChannel.register() in the JDK. It is safe to remove one of them.
Following Todd's suggestion, we've added a non-blocking queue to the RPC layer so that accept is not blocked by the Reader. That patch will be released in a separate issue.
[jira] Commented: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990937#comment-12990937 ]
dhruba borthakur commented on HADOOP-7105:
------------------------------------------
Looks like a very good fix. +1
[jira] Updated: (HADOOP-7105) [IPC] Improvement of lock mechanism
in Listener and Reader thread
Posted by "jinglong.liujl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jinglong.liujl updated HADOOP-7105:
-----------------------------------
Attachment: improveListenerLock2.patch
Thanks very much for Todd's suggestion.
To fix it, I moved the change to "adding" in finishAdd() and the notify() under a wait lock, so that they form one atomic operation.
A non-blocking queue is a great idea, but it would require refactoring the current RPC framework, which is beyond the purpose of this issue.
In fact, this patch reduces the time the Listener spends waiting for the Reader (the Listener only has to wait for the Reader in finishAdd()). In our tests, removing the synchronized lock from registerChannel() improved RPC response performance by about 20%.
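The atomic "adding + notify" fix described above can be sketched as a guarded wait on a dedicated lock. This is a hypothetical reconstruction from the comment, not the contents of improveListenerLock2.patch: the waitLock field, waitWhileAdding() and the two helper methods are assumed names. registerChannel() is left out on purpose, relying on the JDK's own locking in SocketChannel.register() as argued in this thread.

```java
// Hypothetical sketch: the "adding" flag and the notification share one
// dedicated wait lock, so registerChannel() no longer needs the Reader's
// monitor. Names follow the thread; the bodies are assumptions.
public class AddingHandshake {
    private final Object waitLock = new Object();
    private boolean adding = false; // guarded by waitLock

    /** Listener: announce a registration (real code would also wake the selector). */
    void startAdd() {
        synchronized (waitLock) {
            adding = true;
        }
    }

    /** Listener: clear the flag and notify as one atomic step. */
    void finishAdd() {
        synchronized (waitLock) {
            adding = false;
            waitLock.notifyAll();
        }
    }

    /** Reader: called after select() returns; waits while a registration is in progress. */
    void waitWhileAdding() throws InterruptedException {
        synchronized (waitLock) {
            // The flag is re-checked under the lock finishAdd() uses, so a
            // notify issued before the Reader arrives cannot be missed: the
            // Reader simply sees adding == false and does not wait.
            while (adding) {
                waitLock.wait();
            }
        }
    }

    /** Returns true if a Reader parked in waitWhileAdding() is released by finishAdd(). */
    static boolean blockedReaderIsReleased() {
        AddingHandshake h = new AddingHandshake();
        h.startAdd();
        Thread reader = new Thread(() -> {
            try {
                h.waitWhileAdding();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        try {
            while (reader.getState() != Thread.State.WAITING) {
                Thread.sleep(1); // wait until the Reader is inside waitLock.wait()
            }
            h.finishAdd();
            reader.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !reader.isAlive();
    }

    /** The slow-Reader case: finishAdd() runs before the Reader checks the flag. */
    static boolean lateReaderDoesNotBlock() {
        AddingHandshake h = new AddingHandshake();
        h.startAdd();
        h.finishAdd(); // the notify fires while nobody is waiting
        try {
            h.waitWhileAdding(); // returns at once because adding is already false
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(blockedReaderIsReleased() + " " + lateReaderDoesNotBlock());
    }
}
```

The second helper covers the slow-Reader case raised earlier: the Reader does not depend on having caught the notify(), because it reads adding under the same lock that finishAdd() holds.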