Posted to common-user@hadoop.apache.org by Christian Saar <sa...@adacor.com> on 2008/07/01 17:19:48 UTC

Repeated Exceptions in SecondaryNamenode Log

Hello all,

We are seeing this exception in our logs:

> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Exception in doCheckpoint:
> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.Socket.connect(Socket.java:519)
>         at java.net.Socket.connect(Socket.java:469)
>         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>         at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
>         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:149)
>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:188)
>         at org.apache.hadoop.dfs.SecondaryNameNode.getFSImage(SecondaryNameNode.java:245)
>         at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:310)
>         at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:223)
>         at java.lang.Thread.run(Thread.java:619)

but the primary NameNode log looks fine:

> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 172.20.11.102

Does anybody know how I can track down the problem?
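Since the secondary fetches the checkpoint image over HTTP (the stack trace goes through HttpURLConnection), "Connection refused" here means nothing was listening at the address it contacted. A quick reachability probe from the secondary's host, as a sketch (the IP is the namenode from this thread; port 50070 is only the usual default for dfs.http.address, so treat both as placeholders):

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the namenode's HTTP server; check dfs.http.address in your
# configuration for the real host:port.
can_connect("172.20.11.101", 50070)
```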

-- 


mit freundlichen Grüßen

Christian Saar

................................................
Adacor Hosting GmbH
Kaiserleistrasse 51
D-63067 Offenbach am Main

Telefon  +49 (0)69 905089 2110
Telefax  +49 (0)69 905089 29
Email    saar@adacor.com

Zentrale:
Telefon +49 (0)69 905089 0
Telefax +49 (0)69 905089 29
Web     http://www.adacor.com

Amtsgericht Frankfurt am Main HRB 56690
Geschäftsführer: Thomas Wittbecker, Andreas Bachmann, Patrick Fend

-------------------------------------------------------------------
This e-mail may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this e-mail
in error) please notify the sender immediately and destroy this
e-mail. Any unauthorised copying, disclosure or distribution of the
contents of this e-mail is strictly prohibited.
-------------------------------------------------------------------

Re: Repeated Exceptions in SecondaryNamenode Log

Posted by Christian Saar <sa...@adacor.com>.
Thank you!

That was the problem.



박병선 [3팀] wrote:
> Could you check 'dfs.http.address' in your configuration file?
> In my case (0.16.4), that was the problem.
>
> It should be set to the namenode address if your secondary namenode runs on a separate host.
> 
> 
> 
> -----Original Message-----
> From: Christian Saar [mailto:saar@adacor.com] 
> Sent: Wednesday, July 02, 2008 3:14 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Repeated Exceptions in SecondaryNamenode Log
> 
> Thanks for the answer,
>
> I am using version 0.17, with no upgrade from older versions.
> 
> current:
> 
> 527  1. Jul 18:24 fsimage
> 
> image:
> 
> 157  1. Jul 18:24 fsimage
> 
> fs.checkpoint.dir is always empty.
>
> I think I temporarily had a permissions problem on that path.
>
> A fresh setup is an option.
>
> Greetings, Christian
> 
> Konstantin Shvachko wrote:
>> Which version of hadoop are you on?
>>
>> Could you please take a look at your main name-node storage directory 
>> and check whether the size of file current/fsimage is as expected 
>> compared to previous images?
>>
>> There was a bug (fixed in 0.16.3) that could create a bad image file
>> if there was a transfer error, so be careful.
>> http://issues.apache.org/jira/browse/HADOOP-3069
>>
>> Thanks,
>> --Konstantin
>>
>> Christian Saar wrote:
>>> Hello again,
>>>
>>> I have logged the network traffic with tcpdump, and the secondary
>>> namenode does connect to the namenode:
>>>
>>>> [tcpdump output snipped]
>>>
>>>
>>>
>>> Christian Saar wrote:
>>>> Hello all,
>>>>
>>>> We are seeing this exception in our logs:
>>>>
>>>>> [stack trace snipped]
>>>> but the primary NameNode log looks fine:
>>>>
>>>>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: 
>>>>> Roll Edit Log from 172.20.11.102
>>>> Does anybody know how I can track down the problem?
>>>>
> 


RE: Repeated Exceptions in SecondaryNamenode Log

Posted by 박병선 (3팀) <pa...@nhncorp.com>.
Could you check 'dfs.http.address' in your configuration file?
In my case (0.16.4), that was the problem.

It should be set to the namenode address if your secondary namenode runs on a separate host.
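For reference, a minimal fragment along these lines for hadoop-site.xml (the host is the namenode IP from this thread; port 50070 is only the usual default for the namenode HTTP server, so treat both as placeholders):

```xml
<property>
  <name>dfs.http.address</name>
  <!-- Point this at the primary namenode's HTTP server when the
       secondary namenode runs on a separate host. -->
  <value>172.20.11.101:50070</value>
</property>
```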



-----Original Message-----
From: Christian Saar [mailto:saar@adacor.com] 
Sent: Wednesday, July 02, 2008 3:14 AM
To: core-user@hadoop.apache.org
Subject: Re: Repeated Exceptions in SecondaryNamenode Log

Thanks for the answer,

I am using version 0.17, with no upgrade from older versions.

current:

527  1. Jul 18:24 fsimage

image:

157  1. Jul 18:24 fsimage

fs.checkpoint.dir is always empty.

I think I temporarily had a permissions problem on that path.

A fresh setup is an option.

Greetings, Christian

Konstantin Shvachko wrote:
> Which version of hadoop are you on?
> 
> Could you please take a look at your main name-node storage directory 
> and check whether the size of file current/fsimage is as expected 
> compared to previous images?
> 
> There was a bug (fixed in 0.16.3) that could create a bad image file
> if there was a transfer error, so be careful.
> http://issues.apache.org/jira/browse/HADOOP-3069
> 
> Thanks,
> --Konstantin
> 
> Christian Saar wrote:
>> Hello again,
>>
>> I have logged the network traffic with tcpdump, and the secondary
>> namenode does connect to the namenode:
>>
>>> [tcpdump output snipped]
>>
>>
>>
>>
>> Christian Saar wrote:
>>> Hello all,
>>>
>>> We are seeing this exception in our logs:
>>>
>>>> [stack trace snipped]
>>> but the primary NameNode log looks fine:
>>>
>>>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: 
>>>> Roll Edit Log from 172.20.11.102
>>> Does anybody know how I can track down the problem?
>>>
>>


Re: Repeated Exceptions in SecondaryNamenode Log

Posted by Christian Saar <sa...@adacor.com>.
Thanks for the answer,

I am using version 0.17, with no upgrade from older versions.

current:

527  1. Jul 18:24 fsimage

image:

157  1. Jul 18:24 fsimage

fs.checkpoint.dir is always empty.

I think I temporarily had a permissions problem on that path.

A fresh setup is an option.
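The permissions theory above is easy to verify with a small probe, as a sketch (the directory here is a throwaway stand-in for your fs.checkpoint.dir):

```python
import os
import tempfile

def dir_writable(path):
    """Return True if the current user can create a file under path."""
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False

checkpoint_dir = tempfile.mkdtemp()   # stand-in for fs.checkpoint.dir
dir_writable(checkpoint_dir)
```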

Greetings, Christian

Konstantin Shvachko wrote:
> Which version of hadoop are you on?
> 
> Could you please take a look at your main name-node storage directory
> and check whether the size of file current/fsimage is as expected compared
> to previous images?
> 
> There was a bug (fixed in 0.16.3) that could create a bad image file
> if there was a transfer error, so be careful.
> http://issues.apache.org/jira/browse/HADOOP-3069
> 
> Thanks,
> --Konstantin
> 
> Christian Saar wrote:
>> Hello again,
>>
>> I have logged the network traffic with tcpdump, and the secondary
>> namenode does connect to the namenode:
>>
>>> [tcpdump output snipped]
>>
>>
>>
>>
>> Christian Saar wrote:
>>> Hello all,
>>>
>>> We are seeing this exception in our logs:
>>>
>>>> [stack trace snipped]
>>> but the primary NameNode log looks fine:
>>>
>>>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: Roll
>>>> Edit Log from 172.20.11.102
>>> Does anybody know how I can track down the problem?
>>>
>>


Re: Repeated Exceptions in SecondaryNamenode Log

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Which version of hadoop are you on?

Could you please take a look at your main name-node storage directory
and check whether the size of file current/fsimage is as expected compared
to previous images?

There was a bug (fixed in 0.16.3) that could create a bad image file
if there was a transfer error, so be careful.
http://issues.apache.org/jira/browse/HADOOP-3069
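That size check can be scripted; a minimal sketch (the directory and byte counts are invented for the demo, the real image lives under dfs.name.dir):

```python
import os
import tempfile

def fsimage_size(name_dir):
    """Size in bytes of current/fsimage under the given storage directory."""
    return os.path.getsize(os.path.join(name_dir, "current", "fsimage"))

# Throwaway directory standing in for dfs.name.dir.
name_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(name_dir, "current"))
with open(os.path.join(name_dir, "current", "fsimage"), "wb") as f:
    f.write(b"\0" * 527)              # pretend image of 527 bytes

# An image much smaller than the previous checkpoint is a red flag.
previous_size = 157
suspicious = fsimage_size(name_dir) < previous_size
```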

Thanks,
--Konstantin

Christian Saar wrote:
> Hello again,
> 
> I have logged the network traffic with tcpdump, and the secondary
> namenode does connect to the namenode:
> 
>> 18:40:39.323920 IP 172.20.11.102.53230 > 172.20.11.101.9000: S 2827286710:2827286710(0) win 5840 <mss 1460,sackOK,timestamp 1045811384 0,nop,wscale 7>
>> 18:40:39.324021 IP 172.20.11.101.9000 > 172.20.11.102.53230: S 174737400:174737400(0) ack 2827286711 win 5792 <mss 1460,sackOK,timestamp 123707 1045811384,nop,wscale 7>
>> 18:40:39.324030 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
>> 18:40:39.324647 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 1:168(167) ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
>> 18:40:39.324747 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.324756 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.324769 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.531873 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531880 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
>> 18:40:39.531901 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
>> 18:40:39.531910 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531914 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
>> 18:40:39.532245 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 168:193(25) ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
>> 18:40:39.532311 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592>
>> 18:40:39.532350 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592,nop,nop,sack 1 {168:193}>
>> 18:40:39.533398 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 20:39(19) ack 193 win 54 <nop,nop,timestamp 123916 1045811592>
>> 18:40:39.573609 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 39 win 46 <nop,nop,timestamp 1045811633 123916>
>> 18:40:41.485795 IP 172.20.11.102.53230 > 172.20.11.101.9000: F 193:193(0) ack 39 win 46 <nop,nop,timestamp 1045813546 123916>
>> 18:40:41.485898 IP 172.20.11.101.9000 > 172.20.11.102.53230: F 39:39(0) ack 194 win 54 <nop,nop,timestamp 125869 1045813546>
>> 18:40:41.485905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 40 win 46 <nop,nop,timestamp 1045813546 125869>
> 
> 
> 
> 
> Christian Saar wrote:
>> Hello all,
>>
>> We are seeing this exception in our logs:
>>
>>> [stack trace snipped]
>> but the primary NameNode log looks fine:
>>
>>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 172.20.11.102
>> Does anybody know how I can track down the problem?
>>
> 

Re: Repeated Exceptions in SecondaryNamenode Log

Posted by Christian Saar <sa...@adacor.com>.
Hello again,

I have logged the network traffic with tcpdump, and the secondary
namenode does connect to the namenode:

> 18:40:39.323920 IP 172.20.11.102.53230 > 172.20.11.101.9000: S 2827286710:2827286710(0) win 5840 <mss 1460,sackOK,timestamp 1045811384 0,nop,wscale 7>
> 18:40:39.324021 IP 172.20.11.101.9000 > 172.20.11.102.53230: S 174737400:174737400(0) ack 2827286711 win 5792 <mss 1460,sackOK,timestamp 123707 1045811384,nop,wscale 7>
> 18:40:39.324030 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
> 18:40:39.324647 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 1:168(167) ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
> 18:40:39.324747 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
> 18:40:39.324756 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
> 18:40:39.324769 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
> 18:40:39.531873 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
> 18:40:39.531880 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
> 18:40:39.531901 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
> 18:40:39.531905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
> 18:40:39.531910 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
> 18:40:39.531914 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
> 18:40:39.532245 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 168:193(25) ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
> 18:40:39.532311 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592>
> 18:40:39.532350 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592,nop,nop,sack 1 {168:193}>
> 18:40:39.533398 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 20:39(19) ack 193 win 54 <nop,nop,timestamp 123916 1045811592>
> 18:40:39.573609 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 39 win 46 <nop,nop,timestamp 1045811633 123916>
> 18:40:41.485795 IP 172.20.11.102.53230 > 172.20.11.101.9000: F 193:193(0) ack 39 win 46 <nop,nop,timestamp 1045813546 123916>
> 18:40:41.485898 IP 172.20.11.101.9000 > 172.20.11.102.53230: F 39:39(0) ack 194 win 54 <nop,nop,timestamp 125869 1045813546>
> 18:40:41.485905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 40 win 46 <nop,nop,timestamp 1045813546 125869>
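One thing worth noting about the trace above: it only shows traffic on the RPC port 9000, while the stack trace in the first message fails inside an HttpURLConnection — the checkpoint image is fetched from the NameNode's embedded HTTP server (dfs.http.address, default port 50070), not over port 9000. A minimal Python sketch to probe both ports from the secondary node might look like this; the addresses are taken from the logs above, and the 50070 default is an assumption about the configuration:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # NameNode address from the log excerpts above; 50070 is the
    # assumed dfs.http.address default, adjust if it was changed.
    namenode = "172.20.11.101"
    for port, role in [(9000, "RPC (fs.default.name)"),
                       (50070, "HTTP (dfs.http.address)")]:
        state = "open" if port_open(namenode, port) else "refused/unreachable"
        print(f"{role} port {port}: {state}")
```

If the RPC port is open but the HTTP port is refused, that would match the symptoms: the edit-log roll (an RPC call) succeeds on the NameNode while the image transfer over HTTP fails.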




Christian Saar wrote:
> Hello all,
> 
> We have this Exception in our Logs:
> 
>> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Exception in doCheckpoint:
>> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.net.ConnectException: Connection refused
>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>         at java.net.Socket.connect(Socket.java:519)
>>         at java.net.Socket.connect(Socket.java:469)
>>         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
>>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>>         at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>>         at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>>         at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>>         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
>>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
>>         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
>>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
>>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:149)
>>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:188)
>>         at org.apache.hadoop.dfs.SecondaryNameNode.getFSImage(SecondaryNameNode.java:245)
>>         at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:310)
>>         at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:223)
>>         at java.lang.Thread.run(Thread.java:619)
> 
> but the primary NameNode looks fine:
> 
>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 172.20.11.102
> 
> Does anybody know how I can find the problem here?
> 

-- 


With kind regards

Christian Saar

................................................
Adacor Hosting GmbH
Kaiserleistrasse 51
D-63067 Offenbach am Main

Telefon  +49 (0)69 905089 2110
Telefax  +49 (0)69 905089 29
Email    saar@adacor.com

Zentrale:
Telefon +49 (0)69 905089 0
Telefax +49 (0)69 905089 29
Web     http://www.adacor.com

Amtsgericht Frankfurt am Main HRB 56690
Managing Directors: Thomas Wittbecker, Andreas Bachmann, Patrick Fend
