Posted to common-user@hadoop.apache.org by Peter Sheridan <ps...@millennialmedia.com> on 2013/02/01 17:40:22 UTC

Advice on post mortem of data loss (v 1.0.3)

Yesterday, I bounced my DFS cluster.  We realized that the "ulimit -u" (max user processes) limit was, in extreme cases, preventing the name node from creating threads.  This had only started occurring within the last day or so.  When I brought the name node back up, it had essentially been rolled back by one week, and I lost all changes made since then.
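
For anyone who hits the same thing, here is a minimal sketch of how we checked and raised that limit.  The "hdfs" account name and the values are assumptions; substitute whatever user your name node actually runs as.

    # Check the per-user process/thread limit for the name node's account.
    su - hdfs -c 'ulimit -u'

    # Raise it persistently via /etc/security/limits.conf (illustrative values):
    #   hdfs  soft  nproc  16384
    #   hdfs  hard  nproc  32768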

There are a few other factors to consider.

  1.  I had 3 locations for dfs.name.dir: one local and two NFS; the configuration is sketched just after this list.  (I originally thought this was 2 local and one NFS when I set it up.)  On 1/24, the day we effectively rolled back to, the second NFS mount started showing as FAILED on dfshealth.jsp.  We had seen this before without issue, so I didn't consider it critical.
  2.  When I brought the name node back up, because of the above discovery, I changed dfs.name.dir to 2 local drives and one NFS, excluding the one which had failed.
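
For concreteness, the original dfs.name.dir entry in hdfs-site.xml looked roughly like this; the paths below are placeholders, not our real mount points:

    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/dfs/name,/mnt/nfs1/dfs/name,/mnt/nfs2/dfs/name</value>
    </property>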

Reviewing the name node log from the day with the NFS outage, I see:

2013-01-24 16:33:11,794 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit log.
java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.force0(Native Method)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.flushAndSync(FSEditLog.java:215)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:89)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1015)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1666)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
2013-01-24 16:33:11,794 WARN org.apache.hadoop.hdfs.server.common.Storage: Removing storage dir /rdisks/xxxxxxxxxxxxxx


Unfortunately, since I wasn't expecting anything terrible to happen, I didn't look too closely at the file system while the name node was down.  When I brought it up, the time stamp on the previous checkpoint directory in the dfs.name.dir was right around the time of the above error message.  The current directory basically had just an fsimage and an empty edits log, both with current time stamps.
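
For reference, this is the layout of a single dfs.name.dir location as I understand it (the path is a placeholder, and I am going from memory, so details may be off):

    <name.dir>/current/fsimage           namespace image from the last checkpoint
    <name.dir>/current/edits             edit log appended since that checkpoint
    <name.dir>/current/fstime            time stamp recorded at the last checkpoint
    <name.dir>/previous.checkpoint/      image and edits retained from the prior checkpoint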

So: what happened?  Should this failure have led to my data loss?  I would have thought the local directory would be fine in this scenario.  Did I have any other options for data recovery?

Thanks.


--Pete


Re: Advice on post mortem of data loss (v 1.0.3)

Posted by Suresh Srinivas <su...@hortonworks.com>.
Sorry to hear you are having issues. A few questions and comments inline.


On Fri, Feb 1, 2013 at 8:40 AM, Peter Sheridan <
psheridan@millennialmedia.com> wrote:

>  Yesterday, I bounced my DFS cluster.  We realized that "ulimit –u" was,
> in extreme cases, preventing the name node from creating threads.  This had
> only started occurring within the last day or so.  When I brought the name
> node back up, it had essentially been rolled back by one week, and I lost
> all changes which had been made since then.
>
>  There are a few other factors to consider.
>
>    1. I had 3 locations for dfs.name.dir — one local and two NFS.  (I
>    originally thought this was 2 local and one NFS when I set it up.)  On
>    1/24, the day which we effectively rolled back to, the second NFS mount
>    started showing as FAILED on dfshealth.jsp.  We had seen this before
>    without issue, so I didn't consider it critical.
>
What do you mean by "rolled back to"?
My understanding so far is that you have three dirs: l1, nfs1 and nfs2 (l
for local disk and nfs for NFS). nfs2 was shown as failed.

>
>    1. When I brought the name node back up, because of discovering the
>    above, I had changed dfs.name.dir to 2 local drives and one NFS, excluding
>    the one which had failed.
>
When you brought the namenode back up, with the changed configuration you
have l1, l2 and nfs1. Given that you had not seen any other failures, l1
and nfs1 should have the latest edits so far. Correct? How did you add l2?
Can you describe that procedure in detail?


> Reviewing the name node log from the day with the NFS outage, I see:
>

When you say NFS outage here, this is the failure corresponding to nfs2
from above. Is that correct?


>
>  2013-01-24 16:33:11,794 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit
> log.
> java.io.IOException: Input/output error
>         at sun.nio.ch.FileChannelImpl.force0(Native Method)
>         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.flushAndSync(FSEditLog.java:215)
>         at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:89)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1015)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1666)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> 2013-01-24 16:33:11,794 WARN org.apache.hadoop.hdfs.server.common.Storage:
> Removing storage dir /rdisks/xxxxxxxxxxxxxx
>
>
>  Unfortunately, since I wasn't expecting anything terrible to happen, I
> didn't look too closely at the file system while the name node was down.
>  When I brought it up, the time stamp on the previous checkpoint directory
> in the dfs.name.dir was right around the above error message.  The current
> directory basically had an fsimage and an empty edits log with the current
> time stamps.
>

Which storage directory are you talking about here?

>
>  So: what happened?  Should this failure have led to my data loss?  I
> would have thought the local directory would be fine in this scenario.  Did
> I have any other options for data recovery?
>

I am not sure how you concluded that you lost a week's worth of data and
that the namenode rolled back by one week. Please share the namenode logs
corresponding to the restart.

This is how it should have worked; a simplified sketch of the startup
selection follows the list.
- When nfs2 was removed, a new timestamp is recorded on both l1 and nfs1,
corresponding to the removal of a storage directory.
- Any checkpoint that happened afterwards would also have advanced the
timestamp.
- When the namenode starts up, it chooses l1 and nfs1, because the recorded
timestamp is the latest on those directories, and loads the fsimage and
edits from them. The namenode then performs a checkpoint, writes the new
consolidated image to l1, l2 and nfs1, and creates an empty edit log on l1,
l2 and nfs1.
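
To make the selection step concrete, here is a simplified sketch. It is not
the actual Hadoop code; it only illustrates the idea that the directories
carrying the newest recorded checkpoint time win and anything older is
ignored (the paths and times are made up):

    import java.util.*;

    class NameDir {
        final String path;
        final long checkpointTime;  // e.g. the value recorded in <dir>/current/fstime
        NameDir(String path, long checkpointTime) {
            this.path = path;
            this.checkpointTime = checkpointTime;
        }
    }

    class StartupSelection {
        // Keep only the directories whose recorded checkpoint time is the newest.
        // The image and edits are loaded from one of these, and a fresh checkpoint
        // is then written back to every configured directory.
        static List<NameDir> selectLatest(List<NameDir> configured) {
            long newest = Long.MIN_VALUE;
            for (NameDir d : configured) newest = Math.max(newest, d.checkpointTime);
            List<NameDir> chosen = new ArrayList<>();
            for (NameDir d : configured) {
                if (d.checkpointTime == newest) chosen.add(d);  // stale directories are skipped
            }
            return chosen;
        }

        public static void main(String[] args) {
            List<NameDir> dirs = Arrays.asList(
                new NameDir("/data/1/dfs/name",   1359066791L),   // l1
                new NameDir("/mnt/nfs1/dfs/name", 1359066791L),   // nfs1
                new NameDir("/mnt/nfs2/dfs/name", 1358436791L));  // nfs2, stale since its failure
            for (NameDir d : selectLatest(dirs)) {
                System.out.println("loading image/edits from " + d.path);
            }
        }
    }

The point is that a directory with stale or freshly initialized state should
normally lose out to l1 and nfs1, which is why the details of how l2 was
prepared matter here.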

If you provide more details on how l2 was added, we may be able to
understand what happened.

Regards,
Suresh


-- 
http://hortonworks.com/download/
