Posted to user@hadoop.apache.org by Bertrand Dechoux <de...@gmail.com> on 2012/09/28 11:21:08 UTC

dfs.name.dir replication and disk not available

Hi,

I was wondering about the safety of multiple dfs.name.dir directories.
On one hand, we want all copies to be synchronised, but on the other hand,
if a hard drive fails we would like the NameNode to remain operational.
How does that work? I know there is the source, but I was hoping for a
higher-level description.

Regards

Bertrand

PS: I am interested in the behaviour of the latest stable version, i.e.
1.0.3, not in old issues that have since been solved.

Re: dfs.name.dir replication and disk not available

Posted by Bertrand Dechoux <de...@gmail.com>.
That's definitely clearer (and it makes sense).

Thanks a lot.

Bertrand

On Fri, Sep 28, 2012 at 11:56 AM, Harsh J <ha...@cloudera.com> wrote:

> I don't know how much of this is 1.x compatible:
>
> - When a transaction is logged and synced and a single edits storage
> location fails during the write, only that storage location is
> ejected from the regular write list and skipped over (its state is
> updated immediately in the UI, etc.). The rest remain active and the
> NN lives on. The transaction is marked successful and a success is
> returned to the client. Subsequent transactions also skip the
> ejected storage.
> - If no edit streams remain after the removal (i.e. the last remaining
> disk is removed due to a write error), the transaction is failed and
> the NN shuts down, to prevent data loss due to lack of persistence.
> - Hence, a transaction at the NN is marked complete and returned as a
> success if and only if at least one location successfully wrote the
> edit log record for it.
> - If dfs.name.dir.restore is enabled, the NN checks whether its
> ejected storages are healthy again and re-adds them. The check, IIRC,
> is currently done during checkpoint or checkpoint-like operations.
>
> I believe all of this is in 1.x too, but I haven't validated it
> recently. It is certainly in the versions I've been using and
> supporting at my org for quite a while now. The restore feature is
> especially handy for customers/users with fairly common NFS-mount
> issues, since they no longer need to restart the NN each time it
> ejects the NFS location after a bad write. For that to work, though,
> a soft mount is necessary and recommended rather than a hard mount,
> which would hang the NameNode and defeat its whole "still available
> despite a few volumes failing" behaviour.
>
> Does this help, Bertrand? Happy to answer any further questions.
>
> On Fri, Sep 28, 2012 at 2:51 PM, Bertrand Dechoux <de...@gmail.com>
> wrote:
> > Hi,
> >
> > I was wondering about the safety of multiple dfs.name.dir directories.
> > On one hand, we want all copies to be synchronised, but on the other
> > hand, if a hard drive fails we would like the NameNode to remain
> > operational. How does that work? I know there is the source, but I was
> > hoping for a higher-level description.
> >
> > Regards
> >
> > Bertrand
> >
> > PS: I am interested in the behaviour of the latest stable version, i.e.
> > 1.0.3, not in old issues that have since been solved.
>
>
>
> --
> Harsh J
>



-- 
Bertrand Dechoux

Re: dfs.name.dir replication and disk not available

Posted by Harsh J <ha...@cloudera.com>.
I don't know how much of this is 1.x compatible:

- When a transaction is logged and synced and a single edits storage
location fails during the write, only that storage location is
ejected from the regular write list and skipped over (its state is
updated immediately in the UI, etc.). The rest remain active and the
NN lives on. The transaction is marked successful and a success is
returned to the client. Subsequent transactions also skip the
ejected storage.
- If no edit streams remain after the removal (i.e. the last remaining
disk is removed due to a write error), the transaction is failed and
the NN shuts down, to prevent data loss due to lack of persistence.
- Hence, a transaction at the NN is marked complete and returned as a
success if and only if at least one location successfully wrote the
edit log record for it (see the sketch after this list).
- If dfs.name.dir.restore is enabled, the NN checks whether its
ejected storages are healthy again and re-adds them. The check, IIRC,
is currently done during checkpoint or checkpoint-like operations.
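
To make the flow above concrete, here is a minimal sketch of the
per-transaction logic in plain Java. It is not the actual FSEditLog code;
EditsStream and SimplifiedEditLog are hypothetical names invented purely
for illustration:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one dfs.name.dir edits location;
// this is NOT the real FSEditLog/storage API.
interface EditsStream {
    void write(byte[] record) throws IOException;
    void sync() throws IOException;
    String location();
}

class SimplifiedEditLog {
    private final List<EditsStream> streams = new ArrayList<EditsStream>();

    SimplifiedEditLog(List<EditsStream> initialStreams) {
        streams.addAll(initialStreams);
    }

    // Logs one transaction to every remaining storage location. A location
    // that fails is ejected and skipped from then on; the call only
    // succeeds if at least one location persisted the record.
    void logAndSync(byte[] record) {
        Iterator<EditsStream> it = streams.iterator();
        while (it.hasNext()) {
            EditsStream s = it.next();
            try {
                s.write(record);
                s.sync();
            } catch (IOException e) {
                // Eject the failed storage location; later transactions
                // will simply skip it.
                System.err.println("Ejecting edits location: " + s.location());
                it.remove();
            }
        }
        if (streams.isEmpty()) {
            // No location persisted the transaction: fail hard rather than
            // acknowledge a write that was never made durable (this is
            // where the NN shuts itself down).
            throw new IllegalStateException(
                "No usable edits directories left; aborting to avoid data loss");
        }
        // At least one location wrote and synced the record, so the
        // transaction can be acknowledged to the client as successful.
    }
}

The real NameNode spreads this across its edit log and storage classes, but
the acknowledgement rule is the one described above: at least one surviving
location must have persisted the record.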

I believe all of this is in 1.x too, but I haven't validated it
recently. It is certainly in the versions I've been using and
supporting at my org for quite a while now. The restore feature is
especially handy for customers/users with fairly common NFS-mount
issues, since they no longer need to restart the NN each time it
ejects the NFS location after a bad write. For that to work, though,
a soft mount is necessary and recommended rather than a hard mount,
which would hang the NameNode and defeat its whole "still available
despite a few volumes failing" behaviour.
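
For reference, this is roughly what the corresponding hdfs-site.xml could
look like on 1.x; the paths below are purely illustrative:

<property>
  <name>dfs.name.dir</name>
  <!-- One local disk plus one soft-mounted NFS directory; every image/edits
       write goes to all listed locations. Paths are illustrative only. -->
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
<property>
  <name>dfs.name.dir.restore</name>
  <!-- Let the NN try to re-add a previously ejected directory once it is
       healthy again (checked around checkpoint time, as noted above). -->
  <value>true</value>
</property>

With a layout like this, losing either the local disk or the NFS mount
leaves the NN running on the remaining location, and with the restore
option the ejected one is picked up again once it comes back.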

Does this help, Bertrand? Happy to answer any further questions.

On Fri, Sep 28, 2012 at 2:51 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> Hi,
>
> I was wondering about the safety of multiple dfs.name.dir directories.
> On one hand, we want all copies to be synchronised, but on the other hand,
> if a hard drive fails we would like the NameNode to remain operational.
> How does that work? I know there is the source, but I was hoping for a
> higher-level description.
>
> Regards
>
> Bertrand
>
> PS: I am interested in the behaviour of the latest stable version, i.e.
> 1.0.3, not in old issues that have since been solved.



-- 
Harsh J
