Posted to user@hadoop.apache.org by "Kartashov, Andy" <An...@mpac.ca> on 2012/11/09 17:13:35 UTC

a few questions

Guys,

A few questions please.

1. When I tried to run the Oozie examples I was told to copy the /examples folder into HDFS. However, when I tried to run the oozie job I was told that the source file was not found, until I cd'ed into the local directory on Linux and re-ran the job successfully.
What was the point of copying the examples into HDFS if they are started from the local Linux FS?
This opens up a similar question. When I create my own jobs I run the .jar file from the local FS and not from HDFS; I had thought that was the way. Can you actually tell hadoop where to run the MR job from: local or HDFS?

2. I noticed that the installation creates a Linux group "hadoop" and adds the hdfs/mapreduce users to it. However, inside HDFS the group is called "supergroup". Can someone elaborate on that?

Thanks,
AK47

Re: a few questions

Posted by Matt Goeke <go...@gmail.com>.
Inline: HTH


On Fri, Nov 9, 2012 at 10:13 AM, Kartashov, Andy <An...@mpac.ca> wrote:

> Guys,
>
> A few questions please.
>
> 1. When I tried to run the Oozie examples I was told to copy the /examples
> folder into HDFS. However, when I tried to run the oozie job I was told
> that the source file was not found, until I cd'ed into the local directory
> on Linux and re-ran the job successfully.
> What was the point of copying the examples into HDFS if they are started
> from the local Linux FS?
> This opens up a similar question. When I create my own jobs I run the .jar
> file from the local FS and not from HDFS; I had thought that was the way.
> Can you actually tell hadoop where to run the MR job from: local or HDFS?
>
>
You can actually configure your HDFS instance to run off of file:///, or
have your MR jobs use the local runner, and that may be what you are seeing
in this case. Normally with Oozie the workflow.xml needs to live in HDFS,
and the job config you submit from the local machine over the CLI/HTTP
request needs to point at the HDFS folder it is in.
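
If it helps, here is a minimal sketch of what that split looks like from the
Oozie Java client: the workflow application (workflow.xml plus its lib/ jars)
lives in HDFS, while the handful of properties you submit from the local
machine only point at that HDFS path. Treat the host names, ports and paths
below as made-up placeholders for your own cluster.

  import java.util.Properties;
  import org.apache.oozie.client.OozieClient;

  public class SubmitWorkflow {
      public static void main(String[] args) throws Exception {
          // Oozie server URL - placeholder host/port.
          OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

          // This Properties object is built on the local machine...
          Properties conf = oozie.createConfiguration();

          // ...but the application path must be an HDFS directory that
          // already contains workflow.xml (and a lib/ dir with the jars).
          conf.setProperty(OozieClient.APP_PATH,
                  "hdfs://namenode:8020/user/andy/examples/apps/map-reduce");
          conf.setProperty("nameNode", "hdfs://namenode:8020");
          conf.setProperty("jobTracker", "jobtracker:8021");

          // Submit and start the workflow; Oozie reads workflow.xml from HDFS.
          String jobId = oozie.run(conf);
          System.out.println("Submitted workflow " + jobId);
      }
  }

The oozie CLI behaves the same way: the -config job.properties file is read
from the local filesystem (which is why cd'ing into the local examples
directory mattered), but the oozie.wf.application.path inside it has to
point at the copy of the workflow in HDFS.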


> 2. I noticed that the installation creates a Linux group "hadoop" and adds
> the hdfs/mapreduce users to it. However, inside HDFS the group is called
> "supergroup". Can someone elaborate on that?
>
>
"supergroup" in HDFS is the group tied to whoever the superuser currently is
in HDFS* (normally the hdfs user, because that is the user that starts the
daemons in certain distros). You can always go in and override this using
chown and a local group that adds more clarity to your permissions
structure. If you are using CDH (which it sounds like from your
explanation), then the startup scripts use the hdfs user to start the
namenode/datanodes and the mapreduce user to start the
jobtracker/tasktrackers.

*I can't remember if mapreduce is included in this group by default but I
don't think so
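
For what it's worth, the name "supergroup" itself comes out of the HDFS
configuration (dfs.permissions.supergroup, or dfs.permissions.superusergroup
in newer releases, default value "supergroup") rather than from /etc/group,
which is why it does not line up with the Linux "hadoop" group. If you would
rather see your paths owned by a real local group, a chown/chgrp issued as
the superuser does it. Here is a rough sketch using the FileSystem API; the
user, group and path are made-up examples, and it has to run as the HDFS
superuser since only the superuser may change ownership.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ChangeGroup {
      public static void main(String[] args) throws Exception {
          // Picks up fs.default.name / fs.defaultFS from core-site.xml.
          FileSystem fs = FileSystem.get(new Configuration());

          // Equivalent of: hadoop fs -chown andy:hadoop /user/andy/data
          // i.e. stamp the path with a local Linux group instead of "supergroup".
          fs.setOwner(new Path("/user/andy/data"), "andy", "hadoop");

          fs.close();
      }
  }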

> Thanks,
> AK47