Posted to user@hadoop.apache.org by miriyala srinivas <sr...@gmail.com> on 2015/09/07 07:08:58 UTC

Who Is Responsible for Handling DFS Write Pipeline Failure

Hi All,

I have just started learning the fundamentals of HDFS and its internal
mechanism. The concepts used here are very impressive and look simple,
but they still confuse me. My question is: *who is responsible for
handling a DFS write failure in the pipeline (assume the replication
factor is 3 and the 2nd DN in the pipeline fails)*? If any DataNode
fails during the pipeline write, does the entire pipeline stop, or is a
new DataNode added to the existing pipeline? How does this entire
mechanism work? I would really appreciate it if someone with good
knowledge of HDFS could explain it to me.

Note: I have read a bunch of documents, but none seems to explain what
I am looking for.

thanks
srinivas

Re: Who Is Responsible for Handling DFS Write Pipeline Failure

Posted by Daniel Schulz <da...@hotmail.com>.
Hi Srinivas,

In Hadoop, most DFS accesses are two-staged: first query the NameNode (NN), then go down to the DataNodes (DNs). So most of the time you first access the master node for metadata, then the worker nodes for the payload data.

(1) In your scenario, you want to write a file named "HomerQuotes.txt" with a replication factor of 3 (RF=3). First, you query the NN for the DNs that should store your text file. The NN will respond with, let's assume, DN_1. Fine, let's go down to the worker nodes.

(2) Now HomerQuotes.txt is sent to the IP addresses or host names the NN just gave you in (1). You transmit your file to DN_1. When it arrives there, DN_1 reports back to the NN. As the RF of this file is supposed to be 3 but only one replica exists so far, DN_1 re-distributes your file across the cluster twice. If, and only if, all three replicas are written successfully, DN_1 reports success back to the client. Otherwise a failure is reported.

In real-world Hadoop, a client like "hdfs dfs" or WebHDFS performs these stages for you. But this is what is going on under the bonnet.
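
For illustration, here is a minimal sketch of that client-side flow using the Java FileSystem API. The class name, the path, and the sample content are just placeholders, not anything from your cluster:

    // Minimal sketch: the client asks the NN where to write (metadata),
    // then streams the bytes to the DNs the NN selected (payload).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteQuotes {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);      // talks to the NN for metadata
        Path file = new Path("/user/srinivas/HomerQuotes.txt");  // placeholder path
        try (FSDataOutputStream out = fs.create(file, (short) 3)) {  // RF=3
          out.writeUTF("D'oh!");  // payload goes to the DNs, never through the NN
        }
        fs.close();
      }
    }

Any write failure the client cannot recover from surfaces as an IOException from write() or close().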

I hope this helps. Otherwise, feel free to follow up with more questions.

Best regards, Daniel.

> On 07 Sep 2015, at 07:09, miriyala srinivas <sr...@gmail.com> wrote:
> 
> Hi All,
> 
> I have just started learning the fundamentals of HDFS and its internal mechanism. The concepts used here are very impressive and look simple, but they still confuse me. My question is: who is responsible for handling a DFS write failure in the pipeline (assume the replication factor is 3 and the 2nd DN in the pipeline fails)? If any DataNode fails during the pipeline write, does the entire pipeline stop, or is a new DataNode added to the existing pipeline? How does this entire mechanism work? I would really appreciate it if someone with good knowledge of HDFS could explain it to me.
> 
> Note: I have read a bunch of documents, but none seems to explain what I am looking for.
> 
> thanks
> srinivas

Re: Who Is Responsible for Handling DFS Write Pipeline Failure

Posted by Harsh J <ha...@cloudera.com>.
This two-part blog post series from Yongjun should help you understand
the HDFS file write recovery process better:
http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/
 and
http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
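
In case it helps while reading those, the client side of pipeline recovery is also configurable. Below is a rough sketch of the relevant settings; treat the defaults as an assumption to verify against your Hadoop version's hdfs-default.xml:

    // Sketch only: client-side settings that govern whether a DataNode that
    // fails in the write pipeline gets replaced with a new one.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class PipelineRecoveryConf {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Allow the client to ask the NN for a replacement DN on pipeline failure.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        // DEFAULT / ALWAYS / NEVER: when a replacement DN should be requested.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        // Any stream opened through this FileSystem uses the policy above.
        FileSystem fs = FileSystem.get(conf);
        fs.close();
      }
    }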

On Mon, Sep 7, 2015 at 10:39 AM miriyala srinivas <sr...@gmail.com>
wrote:

> Hi All,
>
> I have just started learning the fundamentals of HDFS and its internal
> mechanism. The concepts used here are very impressive and look simple,
> but they still confuse me. My question is: *who is responsible for
> handling a DFS write failure in the pipeline (assume the replication
> factor is 3 and the 2nd DN in the pipeline fails)*? If any DataNode
> fails during the pipeline write, does the entire pipeline stop, or is a
> new DataNode added to the existing pipeline? How does this entire
> mechanism work? I would really appreciate it if someone with good
> knowledge of HDFS could explain it to me.
>
> Note: I have read a bunch of documents, but none seems to explain what
> I am looking for.
>
> thanks
> srinivas
>

Re: Who Is Responsible for Handling DFS Write Pipeline Failure

Posted by miriyala srinivas <sr...@gmail.com>.
@Harsh
Thanks for sharing the links.

On Tue, Sep 8, 2015 at 6:56 AM, Harsh J <ha...@cloudera.com> wrote:

>
> This two-part blog post series from Yongjun should help you understand
> the HDFS file write recovery process better:
> http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/
>  and
> http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
>
> On Mon, Sep 7, 2015 at 10:39 AM miriyala srinivas <sr...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have just started learning the fundamentals of HDFS and its internal
>> mechanism. The concepts used here are very impressive and look simple,
>> but they still confuse me. My question is: *who is responsible for
>> handling a DFS write failure in the pipeline (assume the replication
>> factor is 3 and the 2nd DN in the pipeline fails)*? If any DataNode
>> fails during the pipeline write, does the entire pipeline stop, or is a
>> new DataNode added to the existing pipeline? How does this entire
>> mechanism work? I would really appreciate it if someone with good
>> knowledge of HDFS could explain it to me.
>>
>> Note: I have read a bunch of documents, but none seems to explain what
>> I am looking for.
>>
>> thanks
>> srinivas
>>
>
>
