Posted to common-user@hadoop.apache.org by Michael Harris <Mi...@Telespree.com> on 2007/12/03 19:06:17 UTC

DFS Datanodes are suddenly "not formatted"

I have a problem with the datanodes. I shut down DFS and Mapred on Friday
for my cluster, and when I started them up on Monday the cluster remained in
safe mode, listing two of the datanodes with no blocks. When I
checked the logs on the datanodes, they said that the data directory
was not formatted. The datanodes proceeded to format it, and I suppose erased all
blocks stored there. My replication factor was not high enough to survive
both of these going down, so my DFS was ruined. Is this because the
datanodes are storing data in the tmp directory? If so, how can I
change that directory?

 

2007-12-03 09:24:08,299 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null

2007-12-03 09:24:08,395 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:09,453 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 2
time(s).

2007-12-03 09:24:10,476 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 3
time(s).

2007-12-03 09:24:11,478 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 4
time(s).

2007-12-03 09:24:12,683 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 5
time(s).

2007-12-03 09:24:13,716 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 6
time(s).

2007-12-03 09:24:14,806 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 7
time(s).

2007-12-03 09:24:15,855 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 8
time(s).

2007-12-03 09:24:16,916 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 9
time(s).

2007-12-03 09:24:18,295 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 10
time(s).

2007-12-03 09:24:19,298 INFO org.apache.hadoop.ipc.RPC: Server at
mh0.telespree.com/172.18.1.80:54310 not available yet, Zzzzz...

2007-12-03 09:24:20,391 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:21,403 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 2
time(s).

2007-12-03 09:24:22,431 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 3
time(s).

2007-12-03 09:24:23,515 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 4
time(s).

2007-12-03 09:24:24,544 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 5
time(s).

2007-12-03 09:24:26,065 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 6
time(s).

2007-12-03 09:24:27,068 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 7
time(s).

2007-12-03 09:24:28,230 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 8
time(s).

2007-12-03 09:24:29,411 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 9
time(s).

2007-12-03 09:24:30,431 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 10
time(s).

2007-12-03 09:24:31,504 INFO org.apache.hadoop.ipc.RPC: Server at
mh0.telespree.com/172.18.1.80:54310 not available yet, Zzzzz...

2007-12-03 09:24:32,508 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:24:49,604 INFO org.apache.hadoop.dfs.Storage: Storage
directory /tmp/hadoop-hadoop/dfs/data is not formatted.

2007-12-03 09:24:49,604 INFO org.apache.hadoop.dfs.Storage: Formatting
...

2007-12-03 09:24:52,741 INFO org.apache.hadoop.dfs.DataNode: Opened
server at 50010

2007-12-03 09:24:52,794 INFO org.mortbay.util.Credential: Checking
Resource aliases

2007-12-03 09:24:52,827 INFO org.mortbay.http.HttpServer: Version
Jetty/5.1.4

2007-12-03 09:24:53,086 INFO org.mortbay.util.Container: Started
org.mortbay.jetty.servlet.WebApplicationHandler@1f3ce5c

2007-12-03 09:24:53,116 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]

2007-12-03 09:24:53,117 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]

2007-12-03 09:24:53,117 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]

2007-12-03 09:24:53,118 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50075

2007-12-03 09:24:53,118 INFO org.mortbay.util.Container: Started
org.mortbay.jetty.Server@ee22f7

2007-12-03 09:24:53,148 INFO org.apache.hadoop.dfs.DataNode: New storage
id DS-1588572895-172.18.2.23-50010-1196702693143 is assigned to
data-node 172.18.2.23:50010

2007-12-03 09:24:53,149 INFO org.apache.hadoop.dfs.DataNode: In
DataNode.run, data =
FSDataset{dirpath='/tmp/hadoop-hadoop/dfs/data/current'}

2007-12-03 09:24:53,149 INFO org.apache.hadoop.dfs.DataNode: using
BLOCKREPORT_INTERVAL of 3463518msec

2007-12-03 09:31:23,420 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: mh0.telespree.com/172.18.1.80:54310. Already tried 1
time(s).

2007-12-03 09:31:23,447 INFO org.apache.hadoop.dfs.DataNode:
SHUTDOWN_MSG:

 

Thanks,

Michael


Re: DFS Datanodes are suddenly "not formatted"

Posted by Michael Bieniosek <mi...@powerset.com>.
In your hadoop-site.xml, you can set

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop</value>
  </property>

This will put all of Hadoop's data under /hadoop.  By default, this directory is /tmp/hadoop-$USER, which is probably worth a bug report, since many systems periodically clean out /tmp.
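
As a rough sketch, the DFS directories can also be pinned explicitly instead of relying on hadoop.tmp.dir alone. This assumes the dfs.name.dir and dfs.data.dir properties found in Hadoop releases of this period, both of which default to paths under ${hadoop.tmp.dir}. The /hadoop/dfs/... paths below are placeholders for illustration, not values taken from this cluster.

```xml
  <!-- Placeholder path: where the namenode keeps its image and edit log -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name</value>
  </property>

  <!-- Placeholder path: where each datanode stores its blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/dfs/data</value>
  </property>
```

Blocks already written under /tmp are not moved by a change like this; they would have to be copied to the new location while the daemons are stopped.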

-Michael

On 12/6/07 10:31 AM, "Michael Harris" <Mi...@Telespree.com> wrote:

No one has responded to my question... this incident has really eroded
my trust in the stability of Hadoop and the safety of the data stored in
the DFS. Am I just doing something really obvious/stupid, or do people
need more information before they can look into the problem? Is this
related to the fact that the Namenode took a very long time to respond
to the Datanode's requests (the Namenode is temporarily running in a VM,
so it's quite slow)?

-Michael


RE: DFS Datanodes are suddenly "not formatted"

Posted by Michael Harris <Mi...@Telespree.com>.
No one has responded to my question... this incident has really eroded
my trust in the stability of Hadoop and the safety of the data stored in
the DFS. Am I just doing something really obvious/stupid, or do people
need more information before they can look into the problem? Is this
related to the fact that the Namenode took a very long time to respond
to the Datanode's requests (the Namenode is temporarily running in a VM,
so it's quite slow)?

-Michael
