Posted to hdfs-user@hadoop.apache.org by "Kartashov, Andy" <An...@mpac.ca> on 2012/11/19 15:37:07 UTC

RE: a question on MapReduce

Guys,

Sometimes when I run my MR job I see that Reduce tasks kick in as early as when the Map tasks have reached only about 20%. How can MR possibly be so sure and start running Reduce at this point? What if a Mapper produces more records for keys that the Reduce function has already finished with?

Andy Kartashov
MPAC
IT Architecture, Co-op
1340 Pickering Parkway, Pickering, L1V 0C4
* Phone : (905) 837 6269
* Mobile: (416) 722 1787
andy.kartashov@mpac.ca<ma...@mpac.ca>

NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail. AVIS : le présent courriel et toute pièce jointe qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite. Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel

RE: a question on MapReduce

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Hehe,... good to know. Thanks.


From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Monday, November 19, 2012 9:50 AM
To: user@hadoop.apache.org
Subject: Re: a question on MapReduce

Hello Andy,

  The reduce function itself starts only once the Map phase is 100% complete. The reduce progress you see before that actually reflects the intermediate steps, shuffle (copying map output) and sort. Don't let it confuse you like it did me initially :)

Regards,
    Mohammad Tariq


On Mon, Nov 19, 2012 at 8:07 PM, Kartashov, Andy <An...@mpac.ca>> wrote:
Guys,

Sometimes when I run my MR job I see that Reduce tasks kick in as early as when the Map tasks have reached only about 20%. How can MR possibly be so sure and start running Reduce at this point? What if a Mapper produces more records for keys that the Reduce function has already finished with?

Andy Kartashov
MPAC
IT Architecture, Co-op
1340 Pickering Parkway, Pickering, L1V 0C4
* Phone : (905) 837 6269
* Mobile: (416) 722 1787
andy.kartashov@mpac.ca<ma...@mpac.ca>

Re: a question on MapReduce

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Andy,

  The reduce function itself starts only once the Map phase is 100% complete.
The reduce progress you see before that actually reflects the intermediate
steps, shuffle (copying map output) and sort. Don't let it confuse you like
it did me initially :)
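
As a rough illustration of why the reduce percentage can move before the maps finish: Hadoop reports a reduce task's progress across three phases (copy, sort, reduce), and reducers may be scheduled early so they can start copying the output of map tasks that have already finished (in Hadoop 1.x this is governed by the mapred.reduce.slowstart.completed.maps property). The Python sketch below is a toy model, not Hadoop code; the function names and the equal one-third weighting of the phases are assumptions made for illustration.

```python
def copy_progress(maps_finished, maps_total):
    """Upper bound on the share of map outputs a reducer could have fetched.

    A reducer can only copy output from map tasks that have completed.
    """
    if maps_total <= 0:
        raise ValueError("maps_total must be positive")
    return maps_finished / maps_total


def reported_reduce_progress(copy_done, sort_done, reduce_done):
    """Toy model: average of the three reduce-task phases, each in [0, 1].

    Assumes each phase contributes one third of the displayed percentage.
    """
    for value in (copy_done, sort_done, reduce_done):
        if not 0.0 <= value <= 1.0:
            raise ValueError("phase progress must be between 0 and 1")
    return (copy_done + sort_done + reduce_done) / 3.0


# 20 of 100 maps finished: the reducer may have copied up to 20% of its
# input, but sorting and the user's reduce() function have not started.
early = reported_reduce_progress(copy_progress(20, 100), 0.0, 0.0)

# All maps finished, copy and sort complete, reduce() about to begin.
before_reduce = reported_reduce_progress(1.0, 1.0, 0.0)

print(f"maps 20% done: reduce shows about {early:.0%}")
print(f"copy and sort done: reduce shows about {before_reduce:.0%}")
```

Under this model the displayed reduce percentage can climb to roughly two thirds without a single call to the user's reduce function, which is why no key is ever "finished" before every mapper has emitted all of its output.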

Regards,
    Mohammad Tariq



On Mon, Nov 19, 2012 at 8:07 PM, Kartashov, Andy <An...@mpac.ca> wrote:

>  Guys,
>
>
>
> Sometimes when I run my MR job I see that Reduce tasks kick in as early as
> when the Map tasks have reached only about 20%. How can MR possibly be so
> sure and start running Reduce at this point? What if a Mapper produces more
> records for keys that the Reduce function has already finished with?
>
>
>
> Andy Kartashov
>
> *MPAC*
>
> IT Architecture, Co-op
>
> 1340 Pickering Parkway, Pickering, L1V 0C4
>
> ( Phone : (905) 837 6269
>
> ( Mobile: (416) 722 1787
>
> *andy.kartashov@mpac.ca*
