You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by lei liu <li...@gmail.com> on 2013/08/12 12:12:39 UTC

when Standby Namenode is doing checkpoint, the Active NameNode is slow.

When Standby Namenode is doing checkpoint,  upload the image file to Active
NameNode, the Active NameNode is very slow. What is reason result to the
Active NameNode is slow?


Thanks,

LiuLei

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
Hi Jitendra,

I don't use the compression parameter.

My network card is 100M/s, and  I set the "dfs.image.transfer.bandwidthPerSec."
to 50M, so I think Active NameNode  still has 50M bandwidth to be used to
handle RPC request, why the OPS dropped by 50%?


2013/8/15 Jitendra Yadav <je...@gmail.com>

> Hi,
> Looks like you got some pace, did you also tried with compression
> parameter? I think you will get more optimization with it. Also file
> transfer speed depends on our network bandwidth between PNN/SNN and network
> traffic b/w nodes.What's your network conf?
> Thanks
>
>
> On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:
>
>>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
>> performance is below:
>>
>> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
>> speed:1111
>> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
>> speed:1044
>> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
>> speed:1109
>> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
>> speed:539*
>> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
>> speed:495
>> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
>> speed:438
>> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
>> speed:447*
>> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
>> speed:665*
>> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
>> speed:1056
>> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
>> speed:1048
>> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
>> speed:1053
>> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
>> speed:1042
>> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
>> speed:1051
>> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
>> speed:1033
>> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
>> speed:1042
>> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
>> speed:1137
>> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
>> speed:1071
>> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
>> speed:1133
>> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
>> speed:1108
>> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
>> speed:1056
>> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
>> speed:1074
>> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
>> speed:1125
>> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
>> speed:1107
>> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
>> speed:1077
>> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
>> speed:1029
>> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
>> speed:1095
>> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
>> speed:1132
>> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
>> speed:1108
>> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
>> speed:980
>> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
>> speed:1097
>> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
>> speed:1086
>> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
>> speed:1099
>> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
>> speed:1024
>> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
>> speed:960
>> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
>> speed:1013
>> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
>> speed:1097
>> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
>> speed:1085
>> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
>> speed:1049
>> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
>> speed:839
>> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
>> speed:464
>> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
>> speed:483
>> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
>> speed:447*
>> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
>> speed:504*
>> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
>> speed:809
>>
>>
>>
>> The there are below info in Active NameNode:
>> 2013-08-14 12:44:47,301 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
>> to
>> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
>> 2013-08-14 12:48:57,529 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
>> 250.23s at 10280.59 KB/s*
>> 2013-08-14 12:48:57,530 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
>> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>>
>>
>> The there are below info in StadnbyNameNode:
>> 2013-08-14 12:43:57,924 INFO
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
>> checkpoint because there have been 2421083 txns since the last checkpoint,
>> which exceeds the configured threshold 1000000
>> 2013-08-14 12:43:57,925 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
>> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
>> using no compression
>> 2013-08-14 12:48:58,044 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
>> 250.75s at 0.00 KB/s
>> 2013-08-14 12:48:58,045 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
>> txid 655178418 to namenode at 10.232.98.77:20021
>>
>>
>>
>> *When Active NameNode download the fsimage, the OPS is less than 500, the
>> OPS dropped by 50%.*
>>
>> The "checkpoint.png" attachment is monitoring info in ganglia.
>>
>>
>> I have two questions:
>>
>>    - *first question:*  The Active NameNode's bandwidth is occupied by
>>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>>    RPC request, why the OPS dropped by 50%?
>>
>>
>>    - *second question:* The fsimage file is 2634222089 bytes, Active
>>    NameNode spend 240s to download the file. Yhe
>>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>>    time should is about 50s.
>>
>>
>> Thansk,
>>
>> LiuLei
>>
>>
>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
Hi Jitendra,

I don't use the compression parameter.

My network card is 100M/s, and  I set the "dfs.image.transfer.bandwidthPerSec."
to 50M, so I think Active NameNode  still has 50M bandwidth to be used to
handle RPC request, why the OPS dropped by 50%?


2013/8/15 Jitendra Yadav <je...@gmail.com>

> Hi,
> Looks like you got some pace, did you also tried with compression
> parameter? I think you will get more optimization with it. Also file
> transfer speed depends on our network bandwidth between PNN/SNN and network
> traffic b/w nodes.What's your network conf?
> Thanks
>
>
> On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:
>
>>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
>> performance is below:
>>
>> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
>> speed:1111
>> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
>> speed:1044
>> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
>> speed:1109
>> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
>> speed:539*
>> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
>> speed:495
>> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
>> speed:438
>> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
>> speed:447*
>> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
>> speed:665*
>> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
>> speed:1056
>> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
>> speed:1048
>> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
>> speed:1053
>> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
>> speed:1042
>> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
>> speed:1051
>> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
>> speed:1033
>> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
>> speed:1042
>> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
>> speed:1137
>> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
>> speed:1071
>> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
>> speed:1133
>> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
>> speed:1108
>> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
>> speed:1056
>> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
>> speed:1074
>> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
>> speed:1125
>> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
>> speed:1107
>> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
>> speed:1077
>> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
>> speed:1029
>> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
>> speed:1095
>> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
>> speed:1132
>> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
>> speed:1108
>> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
>> speed:980
>> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
>> speed:1097
>> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
>> speed:1086
>> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
>> speed:1099
>> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
>> speed:1024
>> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
>> speed:960
>> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
>> speed:1013
>> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
>> speed:1097
>> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
>> speed:1085
>> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
>> speed:1049
>> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
>> speed:839
>> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
>> speed:464
>> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
>> speed:483
>> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
>> speed:447*
>> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
>> speed:504*
>> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
>> speed:809
>>
>>
>>
>> The there are below info in Active NameNode:
>> 2013-08-14 12:44:47,301 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
>> to
>> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
>> 2013-08-14 12:48:57,529 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
>> 250.23s at 10280.59 KB/s*
>> 2013-08-14 12:48:57,530 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
>> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>>
>>
>> The there are below info in StadnbyNameNode:
>> 2013-08-14 12:43:57,924 INFO
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
>> checkpoint because there have been 2421083 txns since the last checkpoint,
>> which exceeds the configured threshold 1000000
>> 2013-08-14 12:43:57,925 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
>> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
>> using no compression
>> 2013-08-14 12:48:58,044 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
>> 250.75s at 0.00 KB/s
>> 2013-08-14 12:48:58,045 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
>> txid 655178418 to namenode at 10.232.98.77:20021
>>
>>
>>
>> *When Active NameNode download the fsimage, the OPS is less than 500, the
>> OPS dropped by 50%.*
>>
>> The "checkpoint.png" attachment is monitoring info in ganglia.
>>
>>
>> I have two questions:
>>
>>    - *first question:*  The Active NameNode's bandwidth is occupied by
>>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>>    RPC request, why the OPS dropped by 50%?
>>
>>
>>    - *second question:* The fsimage file is 2634222089 bytes, Active
>>    NameNode spend 240s to download the file. Yhe
>>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>>    time should is about 50s.
>>
>>
>> Thansk,
>>
>> LiuLei
>>
>>
>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
Hi Jitendra,

I don't use the compression parameter.

My network card is 100M/s, and  I set the "dfs.image.transfer.bandwidthPerSec."
to 50M, so I think Active NameNode  still has 50M bandwidth to be used to
handle RPC request, why the OPS dropped by 50%?


2013/8/15 Jitendra Yadav <je...@gmail.com>

> Hi,
> Looks like you got some pace, did you also tried with compression
> parameter? I think you will get more optimization with it. Also file
> transfer speed depends on our network bandwidth between PNN/SNN and network
> traffic b/w nodes.What's your network conf?
> Thanks
>
>
> On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:
>
>>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
>> performance is below:
>>
>> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
>> speed:1111
>> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
>> speed:1044
>> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
>> speed:1109
>> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
>> speed:539*
>> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
>> speed:495
>> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
>> speed:438
>> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
>> speed:447*
>> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
>> speed:665*
>> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
>> speed:1056
>> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
>> speed:1048
>> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
>> speed:1053
>> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
>> speed:1042
>> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
>> speed:1051
>> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
>> speed:1033
>> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
>> speed:1042
>> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
>> speed:1137
>> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
>> speed:1071
>> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
>> speed:1133
>> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
>> speed:1108
>> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
>> speed:1056
>> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
>> speed:1074
>> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
>> speed:1125
>> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
>> speed:1107
>> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
>> speed:1077
>> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
>> speed:1029
>> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
>> speed:1095
>> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
>> speed:1132
>> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
>> speed:1108
>> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
>> speed:980
>> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
>> speed:1097
>> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
>> speed:1086
>> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
>> speed:1099
>> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
>> speed:1024
>> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
>> speed:960
>> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
>> speed:1013
>> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
>> speed:1097
>> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
>> speed:1085
>> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
>> speed:1049
>> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
>> speed:839
>> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
>> speed:464
>> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
>> speed:483
>> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
>> speed:447*
>> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
>> speed:504*
>> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
>> speed:809
>>
>>
>>
>> The there are below info in Active NameNode:
>> 2013-08-14 12:44:47,301 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
>> to
>> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
>> 2013-08-14 12:48:57,529 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
>> 250.23s at 10280.59 KB/s*
>> 2013-08-14 12:48:57,530 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
>> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>>
>>
>> The there are below info in StadnbyNameNode:
>> 2013-08-14 12:43:57,924 INFO
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
>> checkpoint because there have been 2421083 txns since the last checkpoint,
>> which exceeds the configured threshold 1000000
>> 2013-08-14 12:43:57,925 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
>> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
>> using no compression
>> 2013-08-14 12:48:58,044 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
>> 250.75s at 0.00 KB/s
>> 2013-08-14 12:48:58,045 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
>> txid 655178418 to namenode at 10.232.98.77:20021
>>
>>
>>
>> *When Active NameNode download the fsimage, the OPS is less than 500, the
>> OPS dropped by 50%.*
>>
>> The "checkpoint.png" attachment is monitoring info in ganglia.
>>
>>
>> I have two questions:
>>
>>    - *first question:*  The Active NameNode's bandwidth is occupied by
>>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>>    RPC request, why the OPS dropped by 50%?
>>
>>
>>    - *second question:* The fsimage file is 2634222089 bytes, Active
>>    NameNode spend 240s to download the file. Yhe
>>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>>    time should is about 50s.
>>
>>
>> Thansk,
>>
>> LiuLei
>>
>>
>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
Hi Jitendra,

I don't use the compression parameter.

My network card is 100M/s, and  I set the "dfs.image.transfer.bandwidthPerSec."
to 50M, so I think Active NameNode  still has 50M bandwidth to be used to
handle RPC request, why the OPS dropped by 50%?


2013/8/15 Jitendra Yadav <je...@gmail.com>

> Hi,
> Looks like you got some pace, did you also tried with compression
> parameter? I think you will get more optimization with it. Also file
> transfer speed depends on our network bandwidth between PNN/SNN and network
> traffic b/w nodes.What's your network conf?
> Thanks
>
>
> On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:
>
>>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
>> performance is below:
>>
>> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
>> speed:1111
>> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
>> speed:1044
>> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
>> speed:1109
>> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
>> speed:539*
>> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
>> speed:495
>> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
>> speed:438
>> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
>> speed:447*
>> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
>> speed:665*
>> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
>> speed:1056
>> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
>> speed:1048
>> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
>> speed:1053
>> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
>> speed:1042
>> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
>> speed:1051
>> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
>> speed:1033
>> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
>> speed:1042
>> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
>> speed:1137
>> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
>> speed:1071
>> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
>> speed:1133
>> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
>> speed:1108
>> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
>> speed:1056
>> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
>> speed:1074
>> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
>> speed:1125
>> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
>> speed:1107
>> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
>> speed:1077
>> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
>> speed:1029
>> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
>> speed:1095
>> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
>> speed:1132
>> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
>> speed:1108
>> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
>> speed:980
>> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
>> speed:1097
>> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
>> speed:1086
>> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
>> speed:1099
>> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
>> speed:1024
>> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
>> speed:960
>> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
>> speed:1013
>> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
>> speed:1097
>> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
>> speed:1085
>> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
>> speed:1049
>> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
>> speed:839
>> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
>> speed:464
>> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
>> speed:483
>> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
>> speed:447*
>> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
>> speed:504*
>> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
>> speed:809
>>
>>
>>
>> The there are below info in Active NameNode:
>> 2013-08-14 12:44:47,301 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
>> to
>> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
>> 2013-08-14 12:48:57,529 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
>> 250.23s at 10280.59 KB/s*
>> 2013-08-14 12:48:57,530 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
>> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>>
>>
>> The there are below info in StadnbyNameNode:
>> 2013-08-14 12:43:57,924 INFO
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
>> checkpoint because there have been 2421083 txns since the last checkpoint,
>> which exceeds the configured threshold 1000000
>> 2013-08-14 12:43:57,925 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
>> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
>> using no compression
>> 2013-08-14 12:48:58,044 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
>> 250.75s at 0.00 KB/s
>> 2013-08-14 12:48:58,045 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
>> txid 655178418 to namenode at 10.232.98.77:20021
>>
>>
>>
>> *When Active NameNode download the fsimage, the OPS is less than 500, the
>> OPS dropped by 50%.*
>>
>> The "checkpoint.png" attachment is monitoring info in ganglia.
>>
>>
>> I have two questions:
>>
>>    - *first question:*  The Active NameNode's bandwidth is occupied by
>>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>>    RPC request, why the OPS dropped by 50%?
>>
>>
>>    - *second question:* The fsimage file is 2634222089 bytes, Active
>>    NameNode spend 240s to download the file. Yhe
>>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>>    time should is about 50s.
>>
>>
>> Thansk,
>>
>> LiuLei
>>
>>
>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,
Looks like you got some pace, did you also tried with compression
parameter? I think you will get more optimization with it. Also file
transfer speed depends on our network bandwidth between PNN/SNN and network
traffic b/w nodes.What's your network conf?
Thanks

On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:

>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
> performance is below:
>
> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
> speed:1111
> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
> speed:1044
> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
> speed:1109
> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
> speed:539*
> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
> speed:495
> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
> speed:438
> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
> speed:447*
> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
> speed:665*
> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
> speed:1056
> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
> speed:1048
> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
> speed:1053
> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
> speed:1042
> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
> speed:1051
> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
> speed:1033
> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
> speed:1042
> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
> speed:1137
> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
> speed:1071
> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
> speed:1133
> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
> speed:1108
> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
> speed:1056
> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
> speed:1074
> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
> speed:1125
> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
> speed:1107
> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
> speed:1077
> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
> speed:1029
> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
> speed:1095
> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
> speed:1132
> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
> speed:1108
> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
> speed:980
> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
> speed:1097
> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
> speed:1086
> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
> speed:1099
> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
> speed:1024
> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
> speed:960
> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
> speed:1013
> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
> speed:1097
> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
> speed:1085
> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
> speed:1049
> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
> speed:839
> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
> speed:464
> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
> speed:483
> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
> speed:447*
> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
> speed:504*
> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
> speed:809
>
>
>
> The there are below info in Active NameNode:
> 2013-08-14 12:44:47,301 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-14 12:48:57,529 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
> 250.23s at 10280.59 KB/s*
> 2013-08-14 12:48:57,530 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>
>
> The there are below info in StadnbyNameNode:
> 2013-08-14 12:43:57,924 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2421083 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-14 12:43:57,925 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
> using no compression
> 2013-08-14 12:48:58,044 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 250.75s at 0.00 KB/s
> 2013-08-14 12:48:58,045 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 655178418 to namenode at 10.232.98.77:20021
>
>
>
> *When Active NameNode download the fsimage, the OPS is less than 500, the
> OPS dropped by 50%.*
>
> The "checkpoint.png" attachment is monitoring info in ganglia.
>
>
> I have two questions:
>
>    - *first question:*  The Active NameNode's bandwidth is occupied by
>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>    RPC request, why the OPS dropped by 50%?
>
>
>    - *second question:* The fsimage file is 2634222089 bytes, Active
>    NameNode spend 240s to download the file. Yhe
>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>    time should is about 50s.
>
>
> Thansk,
>
> LiuLei
>
>
>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,
Looks like you got some pace, did you also tried with compression
parameter? I think you will get more optimization with it. Also file
transfer speed depends on our network bandwidth between PNN/SNN and network
traffic b/w nodes.What's your network conf?
Thanks

On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:

>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
> performance is below:
>
> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
> speed:1111
> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
> speed:1044
> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
> speed:1109
> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
> speed:539*
> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
> speed:495
> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
> speed:438
> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
> speed:447*
> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
> speed:665*
> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
> speed:1056
> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
> speed:1048
> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
> speed:1053
> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
> speed:1042
> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
> speed:1051
> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
> speed:1033
> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
> speed:1042
> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
> speed:1137
> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
> speed:1071
> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
> speed:1133
> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
> speed:1108
> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
> speed:1056
> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
> speed:1074
> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
> speed:1125
> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
> speed:1107
> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
> speed:1077
> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
> speed:1029
> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
> speed:1095
> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
> speed:1132
> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
> speed:1108
> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
> speed:980
> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
> speed:1097
> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
> speed:1086
> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
> speed:1099
> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
> speed:1024
> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
> speed:960
> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
> speed:1013
> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
> speed:1097
> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
> speed:1085
> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
> speed:1049
> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
> speed:839
> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
> speed:464
> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
> speed:483
> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
> speed:447*
> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
> speed:504*
> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
> speed:809
>
>
>
> The there are below info in Active NameNode:
> 2013-08-14 12:44:47,301 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-14 12:48:57,529 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
> 250.23s at 10280.59 KB/s*
> 2013-08-14 12:48:57,530 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>
>
> The there are below info in StadnbyNameNode:
> 2013-08-14 12:43:57,924 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2421083 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-14 12:43:57,925 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
> using no compression
> 2013-08-14 12:48:58,044 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 250.75s at 0.00 KB/s
> 2013-08-14 12:48:58,045 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 655178418 to namenode at 10.232.98.77:20021
>
>
>
> *When Active NameNode download the fsimage, the OPS is less than 500, the
> OPS dropped by 50%.*
>
> The "checkpoint.png" attachment is monitoring info in ganglia.
>
>
> I have two questions:
>
>    - *first question:*  The Active NameNode's bandwidth is occupied by
>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>    RPC request, why the OPS dropped by 50%?
>
>
>    - *second question:* The fsimage file is 2634222089 bytes, Active
>    NameNode spend 240s to download the file. Yhe
>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>    time should is about 50s.
>
>
> Thansk,
>
> LiuLei
>
>
>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,
Looks like you got some pace, did you also tried with compression
parameter? I think you will get more optimization with it. Also file
transfer speed depends on our network bandwidth between PNN/SNN and network
traffic b/w nodes.What's your network conf?
Thanks

On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:

>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
> performance is below:
>
> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
> speed:1111
> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
> speed:1044
> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
> speed:1109
> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
> speed:539*
> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
> speed:495
> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
> speed:438
> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
> speed:447*
> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
> speed:665*
> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
> speed:1056
> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
> speed:1048
> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
> speed:1053
> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
> speed:1042
> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
> speed:1051
> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
> speed:1033
> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
> speed:1042
> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
> speed:1137
> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
> speed:1071
> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
> speed:1133
> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
> speed:1108
> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
> speed:1056
> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
> speed:1074
> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
> speed:1125
> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
> speed:1107
> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
> speed:1077
> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
> speed:1029
> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
> speed:1095
> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
> speed:1132
> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
> speed:1108
> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
> speed:980
> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
> speed:1097
> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
> speed:1086
> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
> speed:1099
> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
> speed:1024
> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
> speed:960
> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
> speed:1013
> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
> speed:1097
> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
> speed:1085
> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
> speed:1049
> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
> speed:839
> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
> speed:464
> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
> speed:483
> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
> speed:447*
> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
> speed:504*
> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
> speed:809
>
>
>
> The there are below info in Active NameNode:
> 2013-08-14 12:44:47,301 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-14 12:48:57,529 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
> 250.23s at 10280.59 KB/s*
> 2013-08-14 12:48:57,530 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>
>
> The there are below info in StadnbyNameNode:
> 2013-08-14 12:43:57,924 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2421083 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-14 12:43:57,925 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
> using no compression
> 2013-08-14 12:48:58,044 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 250.75s at 0.00 KB/s
> 2013-08-14 12:48:58,045 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 655178418 to namenode at 10.232.98.77:20021
>
>
>
> *When Active NameNode download the fsimage, the OPS is less than 500, the
> OPS dropped by 50%.*
>
> The "checkpoint.png" attachment is monitoring info in ganglia.
>
>
> I have two questions:
>
>    - *first question:*  The Active NameNode's bandwidth is occupied by
>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>    RPC request, why the OPS dropped by 50%?
>
>
>    - *second question:* The fsimage file is 2634222089 bytes, Active
>    NameNode spend 240s to download the file. Yhe
>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>    time should is about 50s.
>
>
> Thansk,
>
> LiuLei
>
>
>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,
Looks like you got some pace, did you also tried with compression
parameter? I think you will get more optimization with it. Also file
transfer speed depends on our network bandwidth between PNN/SNN and network
traffic b/w nodes.What's your network conf?
Thanks

On Wed, Aug 14, 2013 at 11:39 AM, lei liu <li...@gmail.com> wrote:

>  I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the
> performance is below:
>
> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
> speed:1111
> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
> speed:1044
> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
> speed:1109
> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
> speed:539*
> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
> speed:495
> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
> speed:438
> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
> speed:447*
> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
> speed:665*
> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
> speed:1056
> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
> speed:1048
> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
> speed:1053
> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
> speed:1042
> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
> speed:1051
> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
> speed:1033
> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
> speed:1042
> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
> speed:1137
> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
> speed:1071
> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
> speed:1133
> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
> speed:1108
> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
> speed:1056
> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
> speed:1074
> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
> speed:1125
> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
> speed:1107
> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
> speed:1077
> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
> speed:1029
> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
> speed:1095
> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
> speed:1132
> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
> speed:1108
> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
> speed:980
> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
> speed:1097
> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
> speed:1086
> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
> speed:1099
> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
> speed:1024
> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
> speed:960
> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
> speed:1013
> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
> speed:1097
> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
> speed:1085
> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
> speed:1049
> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
> speed:839
> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
> speed:464
> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
> speed:483
> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
> speed:447*
> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
> speed:504*
> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
> speed:809
>
>
>
> The there are below info in Active NameNode:
> 2013-08-14 12:44:47,301 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-14 12:48:57,529 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
> 250.23s at 10280.59 KB/s*
> 2013-08-14 12:48:57,530 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>
>
> The there are below info in StadnbyNameNode:
> 2013-08-14 12:43:57,924 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2421083 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-14 12:43:57,925 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
> using no compression
> 2013-08-14 12:48:58,044 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 250.75s at 0.00 KB/s
> 2013-08-14 12:48:58,045 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 655178418 to namenode at 10.232.98.77:20021
>
>
>
> *When Active NameNode download the fsimage, the OPS is less than 500, the
> OPS dropped by 50%.*
>
> The "checkpoint.png" attachment is monitoring info in ganglia.
>
>
> I have two questions:
>
>    - *first question:*  The Active NameNode's bandwidth is occupied by
>    50%,  Active NameNode  still has 50M bandwidth to be used to handle
>    RPC request, why the OPS dropped by 50%?
>
>
>    - *second question:* The fsimage file is 2634222089 bytes, Active
>    NameNode spend 240s to download the file. Yhe
>    "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
>    time should is about 50s.
>
>
> Thansk,
>
> LiuLei
>
>
>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the performance
is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
speed:1111
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
speed:809



The there are below info in Active NameNode:
2013-08-14 12:44:47,301 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000655178418 size 2634222089 bytes


The there are below info in StadnbyNameNode:
2013-08-14 12:43:57,924 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2421083 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-14 12:43:57,925 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
using no compression
2013-08-14 12:48:58,044 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
250.75s at 0.00 KB/s
2013-08-14 12:48:58,045 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 655178418 to namenode at 10.232.98.77:20021



*When Active NameNode download the fsimage, the OPS is less than 500, the
OPS dropped by 50%.*

The "checkpoint.png" attachment is monitoring info in ganglia.


I have two questions:

   - *first question:*  The Active NameNode's bandwidth is occupied by
   50%,  Active NameNode  still has 50M bandwidth to be used to handle RPC
   request, why the OPS dropped by 50%?


   - *second question:* The fsimage file is 2634222089 bytes, Active
   NameNode spend 240s to download the file. Yhe
   "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
   time should is about 50s.


Thansk,

LiuLei

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the performance
is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
speed:1111
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
speed:809



The there are below info in Active NameNode:
2013-08-14 12:44:47,301 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000655178418 size 2634222089 bytes


The there are below info in StadnbyNameNode:
2013-08-14 12:43:57,924 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2421083 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-14 12:43:57,925 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
using no compression
2013-08-14 12:48:58,044 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
250.75s at 0.00 KB/s
2013-08-14 12:48:58,045 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 655178418 to namenode at 10.232.98.77:20021



*When Active NameNode download the fsimage, the OPS is less than 500, the
OPS dropped by 50%.*

The "checkpoint.png" attachment is monitoring info in ganglia.


I have two questions:

   - *first question:*  The Active NameNode's bandwidth is occupied by
   50%,  Active NameNode  still has 50M bandwidth to be used to handle RPC
   request, why the OPS dropped by 50%?


   - *second question:* The fsimage file is 2634222089 bytes, Active
   NameNode spend 240s to download the file. Yhe
   "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
   time should is about 50s.


Thansk,

LiuLei

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the performance
is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
speed:1111
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
speed:809



The there are below info in Active NameNode:
2013-08-14 12:44:47,301 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000655178418 size 2634222089 bytes


The there are below info in StadnbyNameNode:
2013-08-14 12:43:57,924 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2421083 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-14 12:43:57,925 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
using no compression
2013-08-14 12:48:58,044 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
250.75s at 0.00 KB/s
2013-08-14 12:48:58,045 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 655178418 to namenode at 10.232.98.77:20021



*When Active NameNode download the fsimage, the OPS is less than 500, the
OPS dropped by 50%.*

The "checkpoint.png" attachment is monitoring info in ganglia.


I have two questions:

   - *first question:*  The Active NameNode's bandwidth is occupied by
   50%,  Active NameNode  still has 50M bandwidth to be used to handle RPC
   request, why the OPS dropped by 50%?


   - *second question:* The fsimage file is 2634222089 bytes, Active
   NameNode spend 240s to download the file. Yhe
   "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
   time should is about 50s.


Thansk,

LiuLei

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I set the "dfs.image.transfer.bandwidthPerSec." to 50M, and the performance
is below:

2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
speed:1111
2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
speed:1044
2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
speed:1109
*2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
speed:539*
*2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
speed:495
2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
speed:438
2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
speed:447*
*2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
speed:665*
2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
speed:1056
2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
speed:1048
2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
speed:1053
2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
speed:1042
2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
speed:1051
2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
speed:1033
2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
speed:1042
2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
speed:1137
2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
speed:1071
2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
speed:1133
2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
speed:1108
2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
speed:1056
2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
speed:1074
2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
speed:1125
2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
speed:1107
2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
speed:1077
2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
speed:1029
2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
speed:1095
2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
speed:1132
2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
speed:1108
2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
speed:980
2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
speed:1097
2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
speed:1086
2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
speed:1099
2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
speed:1024
2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
speed:960
2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
speed:1013
2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
speed:1097
2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
speed:1085
2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
speed:1049
2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
speed:839
*2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
speed:464
2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
speed:483
2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
speed:447*
*2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
speed:504*
2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
speed:809



The there are below info in Active NameNode:
2013-08-14 12:44:47,301 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-14 12:48:57,529 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
250.23s at 10280.59 KB/s*
2013-08-14 12:48:57,530 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000655178418 size 2634222089 bytes


The there are below info in StadnbyNameNode:
2013-08-14 12:43:57,924 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2421083 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-14 12:43:57,925 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
using no compression
2013-08-14 12:48:58,044 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
250.75s at 0.00 KB/s
2013-08-14 12:48:58,045 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 655178418 to namenode at 10.232.98.77:20021



*When Active NameNode download the fsimage, the OPS is less than 500, the
OPS dropped by 50%.*

The "checkpoint.png" attachment is monitoring info in ganglia.


I have two questions:

   - *first question:*  The Active NameNode's bandwidth is occupied by
   50%,  Active NameNode  still has 50M bandwidth to be used to handle RPC
   request, why the OPS dropped by 50%?


   - *second question:* The fsimage file is 2634222089 bytes, Active
   NameNode spend 240s to download the file. Yhe
   "dfs.image.transfer.bandwidthPerSec" value is 50M,  I think the download
   time should is about 50s.


Thansk,

LiuLei

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

I'm agreed with Harsh comment on image file compression and transfer
bandwidth parameter for optimizing checkpoint process.

In addition I'm not able to correlate your performance program log
timings(less then 10) and file transfer logs timing on active/stand by
nodes.

Thanks
On Tue, Aug 13, 2013 at 4:06 PM, Harsh J <ha...@cloudera.com> wrote:

> Perhaps turning on fsimage compression may help. See
> documentation of dfs.image.compress at
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
> .
>
> You can also try to throttle the bandwidth it uses via
> dfs.image.transfer.bandwidthPerSec.
>
> On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> > I write one programm to test NameNode performance. Please see the
> > EditLogPerformance.java
> >
> > I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> > result is below content:
> > 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> > 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> > 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> > 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> > 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> > 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> > 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> > 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> > 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> > 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> > 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> > 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> > 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> > 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> > 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> > 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> > 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> > 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> > 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> > 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> > 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> > 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> > 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> > 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> > 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> > 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> > 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> > 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> > 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> > 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> > 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> > 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> > 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> > 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> > 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> > 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> > 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> > 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> > 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> > 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> > 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> > 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> > 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
> >
> >
> >
> > The speed is less than ten sometimes. I find when Active NameNode
> download
> > the fsimage file, the  speed is less than ten. So I think download
> fsimage
> > file that affects the performance  of Active NameNode.
> >
> >
> > There are below info in Standby NameNode:
> > 2013-08-13 17:48:12,412 INFO
> > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> > checkpoint because there have been 2558038 txns since the last
> checkpoint,
> > which exceeds the configured threshold 1000000
> > 2013-08-13 17:48:12,413 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Saving image file
> >
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> > using no compression
> > 2013-08-13 17:49:19,085 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Image file of size 3385425100 saved in 66 seconds.
> > 2013-08-13 17:49:19,655 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 241.45s at 0.00 KB/s
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image
> with
> > txid 521186406 to namenode at 10.232.98.77:20021
> >
> >
> > There are below info in Active NameNode:
> > 2013-08-13 17:49:19,659 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 240.95s at 13720.96 KB/s
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> > fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2013/8/13 Jitendra Yadav <je...@gmail.com>
> >>
> >> Hi,
> >>
> >> Can you please let me know that how you identified the slowness between
> >> primary and standby namnode?
> >>
> >> Also please share the network connection bandwidth between these two
> >> servers.
> >>
> >> Thanks
> >>
> >> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
> >>>
> >>> The fsimage file size is 1658934155
> >>>
> >>>
> >>> 2013/8/13 Harsh J <ha...@cloudera.com>
> >>>>
> >>>> How large are your checkpointed fsimage files?
> >>>>
> >>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> >>>> > When Standby Namenode is doing checkpoint,  upload the image file to
> >>>> > Active
> >>>> > NameNode, the Active NameNode is very slow. What is reason result to
> >>>> > the
> >>>> > Active NameNode is slow?
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > LiuLei
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Harsh J
> >>>
> >>>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

I'm agreed with Harsh comment on image file compression and transfer
bandwidth parameter for optimizing checkpoint process.

In addition I'm not able to correlate your performance program log
timings(less then 10) and file transfer logs timing on active/stand by
nodes.

Thanks
On Tue, Aug 13, 2013 at 4:06 PM, Harsh J <ha...@cloudera.com> wrote:

> Perhaps turning on fsimage compression may help. See
> documentation of dfs.image.compress at
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
> .
>
> You can also try to throttle the bandwidth it uses via
> dfs.image.transfer.bandwidthPerSec.
>
> On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> > I write one programm to test NameNode performance. Please see the
> > EditLogPerformance.java
> >
> > I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> > result is below content:
> > 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> > 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> > 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> > 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> > 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> > 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> > 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> > 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> > 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> > 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> > 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> > 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> > 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> > 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> > 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> > 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> > 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> > 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> > 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> > 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> > 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> > 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> > 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> > 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> > 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> > 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> > 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> > 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> > 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> > 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> > 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> > 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> > 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> > 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> > 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> > 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> > 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> > 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> > 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> > 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> > 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> > 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> > 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
> >
> >
> >
> > The speed is less than ten sometimes. I find when Active NameNode
> download
> > the fsimage file, the  speed is less than ten. So I think download
> fsimage
> > file that affects the performance  of Active NameNode.
> >
> >
> > There are below info in Standby NameNode:
> > 2013-08-13 17:48:12,412 INFO
> > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> > checkpoint because there have been 2558038 txns since the last
> checkpoint,
> > which exceeds the configured threshold 1000000
> > 2013-08-13 17:48:12,413 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Saving image file
> >
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> > using no compression
> > 2013-08-13 17:49:19,085 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Image file of size 3385425100 saved in 66 seconds.
> > 2013-08-13 17:49:19,655 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 241.45s at 0.00 KB/s
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image
> with
> > txid 521186406 to namenode at 10.232.98.77:20021
> >
> >
> > There are below info in Active NameNode:
> > 2013-08-13 17:49:19,659 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 240.95s at 13720.96 KB/s
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> > fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2013/8/13 Jitendra Yadav <je...@gmail.com>
> >>
> >> Hi,
> >>
> >> Can you please let me know that how you identified the slowness between
> >> primary and standby namnode?
> >>
> >> Also please share the network connection bandwidth between these two
> >> servers.
> >>
> >> Thanks
> >>
> >> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
> >>>
> >>> The fsimage file size is 1658934155
> >>>
> >>>
> >>> 2013/8/13 Harsh J <ha...@cloudera.com>
> >>>>
> >>>> How large are your checkpointed fsimage files?
> >>>>
> >>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> >>>> > When Standby Namenode is doing checkpoint,  upload the image file to
> >>>> > Active
> >>>> > NameNode, the Active NameNode is very slow. What is reason result to
> >>>> > the
> >>>> > Active NameNode is slow?
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > LiuLei
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Harsh J
> >>>
> >>>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

I'm agreed with Harsh comment on image file compression and transfer
bandwidth parameter for optimizing checkpoint process.

In addition I'm not able to correlate your performance program log
timings(less then 10) and file transfer logs timing on active/stand by
nodes.

Thanks
On Tue, Aug 13, 2013 at 4:06 PM, Harsh J <ha...@cloudera.com> wrote:

> Perhaps turning on fsimage compression may help. See
> documentation of dfs.image.compress at
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
> .
>
> You can also try to throttle the bandwidth it uses via
> dfs.image.transfer.bandwidthPerSec.
>
> On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> > I write one programm to test NameNode performance. Please see the
> > EditLogPerformance.java
> >
> > I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> > result is below content:
> > 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> > 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> > 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> > 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> > 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> > 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> > 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> > 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> > 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> > 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> > 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> > 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> > 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> > 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> > 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> > 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> > 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> > 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> > 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> > 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> > 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> > 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> > 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> > 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> > 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> > 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> > 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> > 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> > 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> > 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> > 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> > 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> > 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> > 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> > 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> > 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> > 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> > 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> > 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> > 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> > 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> > 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> > 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
> >
> >
> >
> > The speed is less than ten sometimes. I find when Active NameNode
> download
> > the fsimage file, the  speed is less than ten. So I think download
> fsimage
> > file that affects the performance  of Active NameNode.
> >
> >
> > There are below info in Standby NameNode:
> > 2013-08-13 17:48:12,412 INFO
> > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> > checkpoint because there have been 2558038 txns since the last
> checkpoint,
> > which exceeds the configured threshold 1000000
> > 2013-08-13 17:48:12,413 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Saving image file
> >
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> > using no compression
> > 2013-08-13 17:49:19,085 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Image file of size 3385425100 saved in 66 seconds.
> > 2013-08-13 17:49:19,655 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 241.45s at 0.00 KB/s
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image
> with
> > txid 521186406 to namenode at 10.232.98.77:20021
> >
> >
> > There are below info in Active NameNode:
> > 2013-08-13 17:49:19,659 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 240.95s at 13720.96 KB/s
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> > fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2013/8/13 Jitendra Yadav <je...@gmail.com>
> >>
> >> Hi,
> >>
> >> Can you please let me know that how you identified the slowness between
> >> primary and standby namnode?
> >>
> >> Also please share the network connection bandwidth between these two
> >> servers.
> >>
> >> Thanks
> >>
> >> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
> >>>
> >>> The fsimage file size is 1658934155
> >>>
> >>>
> >>> 2013/8/13 Harsh J <ha...@cloudera.com>
> >>>>
> >>>> How large are your checkpointed fsimage files?
> >>>>
> >>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> >>>> > When Standby Namenode is doing checkpoint,  upload the image file to
> >>>> > Active
> >>>> > NameNode, the Active NameNode is very slow. What is reason result to
> >>>> > the
> >>>> > Active NameNode is slow?
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > LiuLei
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Harsh J
> >>>
> >>>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

I'm agreed with Harsh comment on image file compression and transfer
bandwidth parameter for optimizing checkpoint process.

In addition I'm not able to correlate your performance program log
timings(less then 10) and file transfer logs timing on active/stand by
nodes.

Thanks
On Tue, Aug 13, 2013 at 4:06 PM, Harsh J <ha...@cloudera.com> wrote:

> Perhaps turning on fsimage compression may help. See
> documentation of dfs.image.compress at
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
> .
>
> You can also try to throttle the bandwidth it uses via
> dfs.image.transfer.bandwidthPerSec.
>
> On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> > I write one programm to test NameNode performance. Please see the
> > EditLogPerformance.java
> >
> > I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> > result is below content:
> > 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> > 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> > 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> > 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> > 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> > 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> > 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> > 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> > 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> > 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> > 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> > 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> > 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> > 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> > 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> > 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> > 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> > 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> > 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> > 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> > 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> > 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> > 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> > 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> > 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> > 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> > 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> > 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> > 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> > 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> > 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> > 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> > 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> > 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> > 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> > 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> > 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> > 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> > 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> > 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> > 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> > 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> > 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> > (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
> >
> >
> >
> > The speed is less than ten sometimes. I find when Active NameNode
> download
> > the fsimage file, the  speed is less than ten. So I think download
> fsimage
> > file that affects the performance  of Active NameNode.
> >
> >
> > There are below info in Standby NameNode:
> > 2013-08-13 17:48:12,412 INFO
> > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> > checkpoint because there have been 2558038 txns since the last
> checkpoint,
> > which exceeds the configured threshold 1000000
> > 2013-08-13 17:48:12,413 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Saving image file
> >
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> > using no compression
> > 2013-08-13 17:49:19,085 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImage:
> > Image file of size 3385425100 saved in 66 seconds.
> > 2013-08-13 17:49:19,655 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 241.45s at 0.00 KB/s
> > 2013-08-13 17:53:21,107 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image
> with
> > txid 521186406 to namenode at 10.232.98.77:20021
> >
> >
> > There are below info in Active NameNode:
> > 2013-08-13 17:49:19,659 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening
> connection
> > to
> >
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> > 240.95s at 13720.96 KB/s
> > 2013-08-13 17:53:20,610 INFO
> > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> > fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2013/8/13 Jitendra Yadav <je...@gmail.com>
> >>
> >> Hi,
> >>
> >> Can you please let me know that how you identified the slowness between
> >> primary and standby namnode?
> >>
> >> Also please share the network connection bandwidth between these two
> >> servers.
> >>
> >> Thanks
> >>
> >> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
> >>>
> >>> The fsimage file size is 1658934155
> >>>
> >>>
> >>> 2013/8/13 Harsh J <ha...@cloudera.com>
> >>>>
> >>>> How large are your checkpointed fsimage files?
> >>>>
> >>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> >>>> > When Standby Namenode is doing checkpoint,  upload the image file to
> >>>> > Active
> >>>> > NameNode, the Active NameNode is very slow. What is reason result to
> >>>> > the
> >>>> > Active NameNode is slow?
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > LiuLei
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Harsh J
> >>>
> >>>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
Perhaps turning on fsimage compression may help. See
documentation of dfs.image.compress at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.

You can also try to throttle the bandwidth it uses via
dfs.image.transfer.bandwidthPerSec.

On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> I write one programm to test NameNode performance. Please see the
> EditLogPerformance.java
>
> I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> result is below content:
> 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
>
>
>
> The speed is less than ten sometimes. I find when Active NameNode download
> the fsimage file, the  speed is less than ten. So I think download fsimage
> file that affects the performance  of Active NameNode.
>
>
> There are below info in Standby NameNode:
> 2013-08-13 17:48:12,412 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2558038 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> using no compression
> 2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 3385425100 saved in 66 seconds.
> 2013-08-13 17:49:19,655 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 241.45s at 0.00 KB/s
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 521186406 to namenode at 10.232.98.77:20021
>
>
> There are below info in Active NameNode:
> 2013-08-13 17:49:19,659 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 240.95s at 13720.96 KB/s
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
>
>
>
>
>
>
>
>
>
>
>
>
> 2013/8/13 Jitendra Yadav <je...@gmail.com>
>>
>> Hi,
>>
>> Can you please let me know that how you identified the slowness between
>> primary and standby namnode?
>>
>> Also please share the network connection bandwidth between these two
>> servers.
>>
>> Thanks
>>
>> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>>>
>>> The fsimage file size is 1658934155
>>>
>>>
>>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>>>
>>>> How large are your checkpointed fsimage files?
>>>>
>>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>>> > Active
>>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>>> > the
>>>> > Active NameNode is slow?
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > LiuLei
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
Perhaps turning on fsimage compression may help. See
documentation of dfs.image.compress at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.

You can also try to throttle the bandwidth it uses via
dfs.image.transfer.bandwidthPerSec.

On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> I write one programm to test NameNode performance. Please see the
> EditLogPerformance.java
>
> I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> result is below content:
> 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
>
>
>
> The speed is less than ten sometimes. I find when Active NameNode download
> the fsimage file, the  speed is less than ten. So I think download fsimage
> file that affects the performance  of Active NameNode.
>
>
> There are below info in Standby NameNode:
> 2013-08-13 17:48:12,412 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2558038 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> using no compression
> 2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 3385425100 saved in 66 seconds.
> 2013-08-13 17:49:19,655 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 241.45s at 0.00 KB/s
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 521186406 to namenode at 10.232.98.77:20021
>
>
> There are below info in Active NameNode:
> 2013-08-13 17:49:19,659 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 240.95s at 13720.96 KB/s
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
>
>
>
>
>
>
>
>
>
>
>
>
> 2013/8/13 Jitendra Yadav <je...@gmail.com>
>>
>> Hi,
>>
>> Can you please let me know that how you identified the slowness between
>> primary and standby namnode?
>>
>> Also please share the network connection bandwidth between these two
>> servers.
>>
>> Thanks
>>
>> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>>>
>>> The fsimage file size is 1658934155
>>>
>>>
>>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>>>
>>>> How large are your checkpointed fsimage files?
>>>>
>>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>>> > Active
>>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>>> > the
>>>> > Active NameNode is slow?
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > LiuLei
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
Perhaps turning on fsimage compression may help. See
documentation of dfs.image.compress at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.

You can also try to throttle the bandwidth it uses via
dfs.image.transfer.bandwidthPerSec.

On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> I write one programm to test NameNode performance. Please see the
> EditLogPerformance.java
>
> I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> result is below content:
> 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
>
>
>
> The speed is less than ten sometimes. I find when Active NameNode download
> the fsimage file, the  speed is less than ten. So I think download fsimage
> file that affects the performance  of Active NameNode.
>
>
> There are below info in Standby NameNode:
> 2013-08-13 17:48:12,412 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2558038 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> using no compression
> 2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 3385425100 saved in 66 seconds.
> 2013-08-13 17:49:19,655 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 241.45s at 0.00 KB/s
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 521186406 to namenode at 10.232.98.77:20021
>
>
> There are below info in Active NameNode:
> 2013-08-13 17:49:19,659 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 240.95s at 13720.96 KB/s
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
>
>
>
>
>
>
>
>
>
>
>
>
> 2013/8/13 Jitendra Yadav <je...@gmail.com>
>>
>> Hi,
>>
>> Can you please let me know that how you identified the slowness between
>> primary and standby namnode?
>>
>> Also please share the network connection bandwidth between these two
>> servers.
>>
>> Thanks
>>
>> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>>>
>>> The fsimage file size is 1658934155
>>>
>>>
>>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>>>
>>>> How large are your checkpointed fsimage files?
>>>>
>>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>>> > Active
>>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>>> > the
>>>> > Active NameNode is slow?
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > LiuLei
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
Perhaps turning on fsimage compression may help. See
documentation of dfs.image.compress at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.

You can also try to throttle the bandwidth it uses via
dfs.image.transfer.bandwidthPerSec.

On Tue, Aug 13, 2013 at 3:45 PM, lei liu <li...@gmail.com> wrote:
> I write one programm to test NameNode performance. Please see the
> EditLogPerformance.java
>
> I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> result is below content:
> 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
>
>
>
> The speed is less than ten sometimes. I find when Active NameNode download
> the fsimage file, the  speed is less than ten. So I think download fsimage
> file that affects the performance  of Active NameNode.
>
>
> There are below info in Standby NameNode:
> 2013-08-13 17:48:12,412 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2558038 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> using no compression
> 2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 3385425100 saved in 66 seconds.
> 2013-08-13 17:49:19,655 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 241.45s at 0.00 KB/s
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 521186406 to namenode at 10.232.98.77:20021
>
>
> There are below info in Active NameNode:
> 2013-08-13 17:49:19,659 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 240.95s at 13720.96 KB/s
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
>
>
>
>
>
>
>
>
>
>
>
>
> 2013/8/13 Jitendra Yadav <je...@gmail.com>
>>
>> Hi,
>>
>> Can you please let me know that how you identified the slowness between
>> primary and standby namnode?
>>
>> Also please share the network connection bandwidth between these two
>> servers.
>>
>> Thanks
>>
>> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>>>
>>> The fsimage file size is 1658934155
>>>
>>>
>>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>>>
>>>> How large are your checkpointed fsimage files?
>>>>
>>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>>> > Active
>>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>>> > the
>>>> > Active NameNode is slow?
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > LiuLei
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I write one programm to test NameNode performance. Please see the
EditLogPerformance.java

I use 60 threads to execute the EditLogPerformance.javacode,  the testing
result is below content:
2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194



The speed is less than ten sometimes. I find when Active NameNode download
the fsimage file, the  speed is less than ten. So I think download fsimage
file that affects the performance  of Active NameNode.


There are below info in Standby NameNode:
2013-08-13 17:48:12,412 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2558038 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-13 17:48:12,413 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
using no compression
2013-08-13 17:49:19,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 521186406 to namenode at 10.232.98.77:20021


There are below info in Active NameNode:
2013-08-13 17:49:19,659 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000521186406 size 3385425100 bytes.












2013/8/13 Jitendra Yadav <je...@gmail.com>

> Hi,
>
> Can you please let me know that how you identified the slowness between
> primary and standby namnode?
>
> Also please share the network connection bandwidth between these two
> servers.
>
> Thanks
>
> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>
>> The fsimage file size is 1658934155
>>
>>
>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>
>>> How large are your checkpointed fsimage files?
>>>
>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>> Active
>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>> the
>>> > Active NameNode is slow?
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > LiuLei
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I write one programm to test NameNode performance. Please see the
EditLogPerformance.java

I use 60 threads to execute the EditLogPerformance.javacode,  the testing
result is below content:
2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194



The speed is less than ten sometimes. I find when Active NameNode download
the fsimage file, the  speed is less than ten. So I think download fsimage
file that affects the performance  of Active NameNode.


There are below info in Standby NameNode:
2013-08-13 17:48:12,412 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2558038 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-13 17:48:12,413 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
using no compression
2013-08-13 17:49:19,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 521186406 to namenode at 10.232.98.77:20021


There are below info in Active NameNode:
2013-08-13 17:49:19,659 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000521186406 size 3385425100 bytes.












2013/8/13 Jitendra Yadav <je...@gmail.com>

> Hi,
>
> Can you please let me know that how you identified the slowness between
> primary and standby namnode?
>
> Also please share the network connection bandwidth between these two
> servers.
>
> Thanks
>
> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>
>> The fsimage file size is 1658934155
>>
>>
>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>
>>> How large are your checkpointed fsimage files?
>>>
>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>> Active
>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>> the
>>> > Active NameNode is slow?
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > LiuLei
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I write one programm to test NameNode performance. Please see the
EditLogPerformance.java

I use 60 threads to execute the EditLogPerformance.javacode,  the testing
result is below content:
2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194



The speed is less than ten sometimes. I find when Active NameNode download
the fsimage file, the  speed is less than ten. So I think download fsimage
file that affects the performance  of Active NameNode.


There are below info in Standby NameNode:
2013-08-13 17:48:12,412 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2558038 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-13 17:48:12,413 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
using no compression
2013-08-13 17:49:19,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 521186406 to namenode at 10.232.98.77:20021


There are below info in Active NameNode:
2013-08-13 17:49:19,659 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000521186406 size 3385425100 bytes.












2013/8/13 Jitendra Yadav <je...@gmail.com>

> Hi,
>
> Can you please let me know that how you identified the slowness between
> primary and standby namnode?
>
> Also please share the network connection bandwidth between these two
> servers.
>
> Thanks
>
> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>
>> The fsimage file size is 1658934155
>>
>>
>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>
>>> How large are your checkpointed fsimage files?
>>>
>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>> Active
>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>> the
>>> > Active NameNode is slow?
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > LiuLei
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
I write one programm to test NameNode performance. Please see the
EditLogPerformance.java

I use 60 threads to execute the EditLogPerformance.javacode,  the testing
result is below content:
2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194



The speed is less than ten sometimes. I find when Active NameNode download
the fsimage file, the  speed is less than ten. So I think download fsimage
file that affects the performance  of Active NameNode.


There are below info in Standby NameNode:
2013-08-13 17:48:12,412 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2558038 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-13 17:48:12,413 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
using no compression
2013-08-13 17:49:19,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 521186406 to namenode at 10.232.98.77:20021


There are below info in Active NameNode:
2013-08-13 17:49:19,659 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000521186406 size 3385425100 bytes.












2013/8/13 Jitendra Yadav <je...@gmail.com>

> Hi,
>
> Can you please let me know that how you identified the slowness between
> primary and standby namnode?
>
> Also please share the network connection bandwidth between these two
> servers.
>
> Thanks
>
> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:
>
>> The fsimage file size is 1658934155
>>
>>
>> 2013/8/13 Harsh J <ha...@cloudera.com>
>>
>>> How large are your checkpointed fsimage files?
>>>
>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>> Active
>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>> the
>>> > Active NameNode is slow?
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > LiuLei
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

Can you please let me know that how you identified the slowness between
primary and standby namnode?

Also please share the network connection bandwidth between these two
servers.

Thanks
On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:

> The fsimage file size is 1658934155
>
>
> 2013/8/13 Harsh J <ha...@cloudera.com>
>
>> How large are your checkpointed fsimage files?
>>
>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>> > When Standby Namenode is doing checkpoint,  upload the image file to
>> Active
>> > NameNode, the Active NameNode is very slow. What is reason result to the
>> > Active NameNode is slow?
>> >
>> >
>> > Thanks,
>> >
>> > LiuLei
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

Can you please let me know that how you identified the slowness between
primary and standby namnode?

Also please share the network connection bandwidth between these two
servers.

Thanks
On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:

> The fsimage file size is 1658934155
>
>
> 2013/8/13 Harsh J <ha...@cloudera.com>
>
>> How large are your checkpointed fsimage files?
>>
>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>> > When Standby Namenode is doing checkpoint,  upload the image file to
>> Active
>> > NameNode, the Active NameNode is very slow. What is reason result to the
>> > Active NameNode is slow?
>> >
>> >
>> > Thanks,
>> >
>> > LiuLei
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

Can you please let me know that how you identified the slowness between
primary and standby namnode?

Also please share the network connection bandwidth between these two
servers.

Thanks
On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:

> The fsimage file size is 1658934155
>
>
> 2013/8/13 Harsh J <ha...@cloudera.com>
>
>> How large are your checkpointed fsimage files?
>>
>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>> > When Standby Namenode is doing checkpoint,  upload the image file to
>> Active
>> > NameNode, the Active NameNode is very slow. What is reason result to the
>> > Active NameNode is slow?
>> >
>> >
>> > Thanks,
>> >
>> > LiuLei
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Jitendra Yadav <je...@gmail.com>.
Hi,

Can you please let me know that how you identified the slowness between
primary and standby namnode?

Also please share the network connection bandwidth between these two
servers.

Thanks
On Tue, Aug 13, 2013 at 11:52 AM, lei liu <li...@gmail.com> wrote:

> The fsimage file size is 1658934155
>
>
> 2013/8/13 Harsh J <ha...@cloudera.com>
>
>> How large are your checkpointed fsimage files?
>>
>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
>> > When Standby Namenode is doing checkpoint,  upload the image file to
>> Active
>> > NameNode, the Active NameNode is very slow. What is reason result to the
>> > Active NameNode is slow?
>> >
>> >
>> > Thanks,
>> >
>> > LiuLei
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
The fsimage file size is 1658934155


2013/8/13 Harsh J <ha...@cloudera.com>

> How large are your checkpointed fsimage files?
>
> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> > When Standby Namenode is doing checkpoint,  upload the image file to
> Active
> > NameNode, the Active NameNode is very slow. What is reason result to the
> > Active NameNode is slow?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
The fsimage file size is 1658934155


2013/8/13 Harsh J <ha...@cloudera.com>

> How large are your checkpointed fsimage files?
>
> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> > When Standby Namenode is doing checkpoint,  upload the image file to
> Active
> > NameNode, the Active NameNode is very slow. What is reason result to the
> > Active NameNode is slow?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
The fsimage file size is 1658934155


2013/8/13 Harsh J <ha...@cloudera.com>

> How large are your checkpointed fsimage files?
>
> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> > When Standby Namenode is doing checkpoint,  upload the image file to
> Active
> > NameNode, the Active NameNode is very slow. What is reason result to the
> > Active NameNode is slow?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by lei liu <li...@gmail.com>.
The fsimage file size is 1658934155


2013/8/13 Harsh J <ha...@cloudera.com>

> How large are your checkpointed fsimage files?
>
> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> > When Standby Namenode is doing checkpoint,  upload the image file to
> Active
> > NameNode, the Active NameNode is very slow. What is reason result to the
> > Active NameNode is slow?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Harsh J
>

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> When Standby Namenode is doing checkpoint,  upload the image file to Active
> NameNode, the Active NameNode is very slow. What is reason result to the
> Active NameNode is slow?
>
>
> Thanks,
>
> LiuLei
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> When Standby Namenode is doing checkpoint,  upload the image file to Active
> NameNode, the Active NameNode is very slow. What is reason result to the
> Active NameNode is slow?
>
>
> Thanks,
>
> LiuLei
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> When Standby Namenode is doing checkpoint,  upload the image file to Active
> NameNode, the Active NameNode is very slow. What is reason result to the
> Active NameNode is slow?
>
>
> Thanks,
>
> LiuLei
>



-- 
Harsh J

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

Posted by Harsh J <ha...@cloudera.com>.
How large are your checkpointed fsimage files?

On Mon, Aug 12, 2013 at 3:42 PM, lei liu <li...@gmail.com> wrote:
> When Standby Namenode is doing checkpoint,  upload the image file to Active
> NameNode, the Active NameNode is very slow. What is reason result to the
> Active NameNode is slow?
>
>
> Thanks,
>
> LiuLei
>



-- 
Harsh J