Posted to user@hadoop.apache.org by "Kartashov, Andy" <An...@mpac.ca> on 2012/11/19 15:27:10 UTC

a question on NameNode

Guys,

I am learning that the NN doesn't persistently store block locations, only file names, their permissions, and the file-to-block mapping. It is said that block locations come from the DataNodes when the NN starts.

So, how does it work?

Say we only have one file A.txt in our HDFS that is split into 4 blocks 1,2,3,4 (no replication), with blocks 1 and 2 residing on DN1 and blocks 3 and 4 on DN2.

When we start the NN, does it read its metadata store and then try to locate and map the locations of the 4 blocks of file A.txt?

Andy Kartashov
MPAC
IT Architecture, Co-op
1340 Pickering Parkway, Pickering, L1V 0C4
Phone: (905) 837 6269
Mobile: (416) 722 1787
andy.kartashov@mpac.ca<ma...@mpac.ca>


RE: a question on NameNode

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Thank you Kai and Tariq.

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Monday, November 19, 2012 10:20 AM
To: user@hadoop.apache.org
Subject: Re: a question on NameNode

Hello Andy,

    If you have not disabled speculative execution, then your second assumption is correct.

Regards,
    Mohammad Tariq


On Mon, Nov 19, 2012 at 8:44 PM, Kartashov, Andy <An...@mpac.ca>> wrote:
Thank you, Kai. One more question, please.

Does MapReduce run tasks on redundant blocks?

Say you have only 1 block of data replicated 3 times, one copy on each of three DataNodes: block 1 - DN1 / block 1 (replica #1) - DN2 / block 1 (replica #2) - DN3

Will MR attempt:


a.       to start 3 Map tasks (one per replicated block) and execute them all

b.      to start 3 Map tasks (one per replicated block) and drop the other two as soon as one of the three executes successfully

c.       to start only 1 Map task (for just one block, avoiding all replicated ones) and attempt to start another one (on one of the replicated blocks) if and only if the initially running task (say on DN1) failed

Thanks,

From: Kai Voigt [mailto:k@123.org<ma...@123.org>]
Sent: Monday, November 19, 2012 10:01 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: a question on NameNode


On 19.11.2012, at 15:43, "Kartashov, Andy" <An...@mpac.ca> wrote:

So, what if DN2 is down, i.e. it is not sending any block reports?  Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must spit out an error then.

One major feature of HDFS is its redundancy. Blocks are stored more than once (three times by default), so chances are good that another DataNode will have that block and report it during the safe mode phase. So the file will be accessible.

Kai

--
Kai Voigt
k@123.org<ma...@123.org>



Re: a question on NameNode

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Andy,

    If you have not disabled speculative execution, then your second
assumption is correct.
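
To make that setting concrete, here is a minimal sketch of the per-job toggle using the MR1-era JobConf API; the property names in the comment are the Hadoop 1.x ones, quoted as an assumption rather than taken from this thread:

import org.apache.hadoop.mapred.JobConf;

public class SpeculativeExecutionConfig {
    public static JobConf configure(JobConf conf) {
        // Speculative execution is enabled by default; these calls just make the choice explicit.
        conf.setMapSpeculativeExecution(true);
        conf.setReduceSpeculativeExecution(true);
        // Equivalent MR1 property names:
        //   mapred.map.tasks.speculative.execution
        //   mapred.reduce.tasks.speculative.execution
        return conf;
    }
}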

Regards,
    Mohammad Tariq



On Mon, Nov 19, 2012 at 8:44 PM, Kartashov, Andy <An...@mpac.ca> wrote:

>  Thank you Kai.. One more question please.
>
>
>
> Does MapReduce run tasks of redundant blocks ?
>
>
>
> Say you have only 1 block of data replicated 3 times, one block over each
> of three DNodes, block 1 – DN1 / block 1(replica #1) – DN2 / block1
> (replica #2) – DN3
>
>
>
> Will MR attempt:
>
>
>
> a.       to start 3 Map tasks (one per replicated block) and execute them
> all
>
> b.      to start 3 Map tasks (one per replicated block) and drop the
> other two as soon as one of the three executes successfully
>
> c.       to start only 1 Map task (for just one block, avoiding all
> replicated ones) and attempt to start another one (on one of the replicated
> blocks) if and only if the initially running task (say on DN1) failed
>
>
>
> Thanks,
>
>
>
> *From:* Kai Voigt [mailto:k@123.org]
> *Sent:* Monday, November 19, 2012 10:01 AM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: a question on NameNode
>
>
>
>
>
> On 19.11.2012, at 15:43, "Kartashov, Andy" <An...@mpac.ca> wrote:
>
>
>
>   So, what if DN2 is down, i.e. it is not sending any block reports?
> Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have
> no home and that (without replication) it has no way of reconstructing the
> file A.txt. It must spit out an error then.
>
>
>
> One major feature of HDFS is its redundancy. Blocks are stored more than
> once (three times by default), so chances are good that another DataNode
> will have that block and report it during the safe mode phase. So the file
> will be accessible.
>
>
>
> Kai
>
>
>
> --
>
> Kai Voigt
>
> k@123.org
>
>
>
>
>
>

Re: a question on NameNode

Posted by Kai Voigt <k...@123.org>.
Hi,

On 19.11.2012, at 16:14, "Kartashov, Andy" <An...@mpac.ca> wrote:

> Does MapReduce run tasks of redundant blocks ?
>  
> Say you have only 1 block of data replicated 3 times, one block over each of three DNodes, block 1 – DN1 / block 1(replica #1) – DN2 / block1 (replica #2) – DN3
>  
> Will MR attempt:
>  
> a.       to start 3 Map tasks (one per replicated block) and execute them all
> b.      to start 3 Map tasks (one per replicated block) and drop the other two as soon as one of the three executes successfully
> c.       to start only 1 Map task (for just one block, avoiding all replicated ones) and attempt to start another one (on one of the replicated blocks) if and only if the initially running task (say on DN1) failed

The JobTracker will schedule the map task on one node only initially. There's no need to launch the task on all nodes that have a local copy of the block.

If a task fails during its execution (e.g., because of a node failure), the JobTracker will launch the task again on another node that has the block.

There's another advanced feature called Speculative Execution. If a task is progressing slowly through a phase (maybe due to flaky hardware), the JobTracker will launch the task in parallel on another node. The node finishing first will be used to get the task's output. The slow task will be killed.
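
To see that one-task-per-block behaviour from the client side, here is a minimal sketch against the old org.apache.hadoop.mapred API of that era; the input path is a made-up placeholder:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCount.class);
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/andy/A.txt")); // placeholder path
        // One map task is scheduled per input split (roughly one per HDFS block),
        // no matter how many replicas each block has.
        InputSplit[] splits = conf.getInputFormat().getSplits(conf, 1);
        System.out.println("map tasks to schedule: " + splits.length);
        // A failed attempt is retried on another node holding a replica,
        // up to mapred.map.max.attempts times (4 by default).
        conf.setMaxMapAttempts(4);
    }
}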

Kai

-- 
Kai Voigt
k@123.org





Re: a question on NameNode

Posted by Ted Dunning <td...@maprtech.com>.
It sounds like you could benefit from reading the basic papers on
map-reduce in general.  Hadoop is a reasonable facsimile of the original
Google systems.

Try looking at this: http://research.google.com/archive/mapreduce.html

On Mon, Nov 19, 2012 at 7:14 AM, Kartashov, Andy <An...@mpac.ca> wrote:

>  Thank you Kai.. One more question please.
>
>
>
> Does MapReduce run tasks of redundant blocks ?
>
>
>
> Say you have only 1 block of data replicated 3 times, one block over each
> of three DNodes, block 1 – DN1 / block 1(replica #1) – DN2 / block1
> (replica #2) – DN3
>
>
>
> Will MR attempt:
>
>
>
> a.       to start 3 Map tasks (one per replicated block) and execute them
> all
>
> b.      to start 3 Map tasks (one per replicated block) and drop the
> other two as soon as one of the three executes successfully
>
> c.       to start only 1 Map task (for just one block, avoiding all
> replicated ones) and attempt to start another one (on one of the replicated
> blocks) if and only if the initially running task (say on DN1) failed
>
>
>
> Thanks,
>
>
>
> *From:* Kai Voigt [mailto:k@123.org]
> *Sent:* Monday, November 19, 2012 10:01 AM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: a question on NameNode
>
>
>
>
>
> On 19.11.2012, at 15:43, "Kartashov, Andy" <An...@mpac.ca> wrote:
>
>
>
>   So, what if DN2 is down, i.e. it is not sending any block reports?
> Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have
> no home and that (without replication) it has no way of reconstructing the
> file A.txt. It must spit out an error then.
>
>
>
> One major feature of HDFS is its redundancy. Blocks are stored more than
> once (three times by default), so chances are good that another DataNode
> will have that block and report it during the safe mode phase. So the file
> will be accessible.
>
>
>
> Kai
>
>
>
> --
>
> Kai Voigt
>
> k@123.org
>
>
>
>
>
>

RE: a question on NameNode

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Thank you, Kai. One more question, please.

Does MapReduce run tasks on redundant blocks?

Say you have only 1 block of data replicated 3 times, one copy on each of three DataNodes: block 1 - DN1 / block 1 (replica #1) - DN2 / block 1 (replica #2) - DN3

Will MR attempt:


a.       to start 3 Map tasks (one per replicated block) and execute them all

b.      to start 3 Map tasks (one per replicated block) and drop the other two as soon as one of the three executes successfully

c.       to start only 1 Map task (for just one block, avoiding all replicated ones) and attempt to start another one (on one of the replicated blocks) if and only if the initially running task (say on DN1) failed

Thanks,

From: Kai Voigt [mailto:k@123.org]
Sent: Monday, November 19, 2012 10:01 AM
To: user@hadoop.apache.org
Subject: Re: a question on NameNode


On 19.11.2012, at 15:43, "Kartashov, Andy" <An...@mpac.ca> wrote:


So, what if DN2 is down, i.e. it is not sending any block reports?  Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must spit out an error then.

One major feature of HDFS is its redundancy. Blocks are stored more than once (three times by default), so chances are good that another DataNode will have that block and report it during the safe mode phase. So the file will be accessible.

Kai

--
Kai Voigt
k@123.org<ma...@123.org>





Re: a question on NameNode

Posted by Kai Voigt <k...@123.org>.
On 19.11.2012, at 15:43, "Kartashov, Andy" <An...@mpac.ca> wrote:

> So, what if DN2 is down, i.e. it is not sending any block reports?  Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must spit out an error then.

One major feature of HDFS is its redundancy. Blocks are stored more than once (three times by default), so chances are good that another DataNode will have that block and report it during the safe mode phase. So the file will be accessible.
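
As a small illustration, a client can ask the NameNode for a file's current replication factor (and request a different one) through the standard FileSystem API; a minimal sketch, with a made-up path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/andy/A.txt");   // placeholder path
        FileStatus status = fs.getFileStatus(file);
        System.out.println("replication factor: " + status.getReplication());
        // Raising the factor asks the NameNode to schedule additional copies:
        fs.setReplication(file, (short) 3);
    }
}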

Kai

-- 
Kai Voigt
k@123.org





RE: a question on NameNode

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Awesome, thanks.

So, what if DN2 is down, i.e. it is not sending any block reports?  Then the NN (I guess) will figure out that it has 2 blocks (3, 4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must spit out an error then.

Thanks,
AK

From: Kai Voigt [mailto:k@123.org]
Sent: Monday, November 19, 2012 9:31 AM
To: user@hadoop.apache.org
Subject: Re: a question on NameNode

Hi,

On 19.11.2012, at 15:27, "Kartashov, Andy" <An...@mpac.ca> wrote:


I am learning that the NN doesn't persistently store block locations, only file names, their permissions, and the file-to-block mapping. It is said that block locations come from the DataNodes when the NN starts.

So, how does it work?

Say we only have one file A.txt in our HDFS that is split into 4 blocks 1,2,3,4 (no replication), with blocks 1 and 2 residing on DN1 and blocks 3 and 4 on DN2.

When we start the NN, does it read its metadata store and then try to locate and map the locations of the 4 blocks of file A.txt?

When a NameNode starts, it does that in safe mode. Like you said, it doesn't know where the blocks are. The DataNodes send a list of all of their local block IDs (so-called block reports). Once the NameNode knows the locations of most blocks (99.9%, a configurable threshold), it will leave safe mode and HDFS is back.
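
Once the NameNode has those reports, any client can ask it where a file's blocks live; a minimal sketch with the standard FileSystem API (the path below is a made-up placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/andy/A.txt")); // placeholder path
        // The NameNode answers from the in-memory map it built from block reports.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + java.util.Arrays.toString(block.getHosts()));
        }
    }
}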

Kai

--
Kai Voigt
k@123.org<ma...@123.org>




NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail. AVIS : le pr?sent courriel et toute pi?ce jointe qui l'accompagne sont confidentiels, prot?g?s par le droit d'auteur et peuvent ?tre couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autoris?e est interdite. Si vous n'?tes pas le destinataire pr?vu de ce courriel, supprimez-le et contactez imm?diatement l'exp?diteur. Veuillez penser ? l'environnement avant d'imprimer le pr?sent courriel

Re: a question on NameNode

Posted by Kai Voigt <k...@123.org>.
Hi,

On 19.11.2012 at 15:27, "Kartashov, Andy" <An...@mpac.ca> wrote:

> I am learning that the NN doesn't persistently store block locations, only file names and their permissions, as well as the list of blocks for each file. It is said that the locations come from the DataNodes when the NN starts.
>  
> So, how does it work?
>  
> Say we only have one file A.txt in our HDFS that is split into 4 blocks 1, 2, 3, 4 (no replication), with blocks 1 and 2 residing on DN1 and blocks 3 and 4 on DN2.
>  
> When we start the NN, does it read its metadata store and try to locate and map the locations of the 4 blocks of file A.txt?

When a NameNode starts, it does so in safe mode. Like you said, it doesn't know where the blocks are. The DataNodes send a list of all of their local block IDs (so-called block reports). Once the NameNode knows the locations of most blocks (99.9% by default, a configurable threshold), it leaves safe mode and HDFS is available again.
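
One way to watch that phase from a client is to query the safe-mode flag. The sketch below assumes the Hadoop 2.x class names (DistributedFileSystem plus HdfsConstants.SafeModeAction; older releases exposed the same enum through FSConstants):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class CheckSafeMode {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // SAFEMODE_GET only queries the current state; it does not change it.
      boolean inSafeMode = dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET);
      System.out.println("NameNode in safe mode: " + inSafeMode);
    }
    fs.close();
  }
}

The equivalent shell check is hdfs dfsadmin -safemode get (hadoop dfsadmin -safemode get on older releases).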

Kai

-- 
Kai Voigt
k@123.org




