Posted to user@oozie.apache.org by Panshul Whisper <ou...@gmail.com> on 2013/03/28 13:35:48 UTC

perform copyfromlocal

Hello,

Sorry for a novice question, but I have the following questions:

1. How do I give a Pig script file to a workflow if the file is stored on
the local filesystem?
2. If I need to perform a copyFromLocal before I execute the Pig script,
what action type should I use? Please give an example if possible.
3. I am using the CDH4 Hue interface for creating the workflow. Any pointers
from that perspective will also help.

Thank you,
-- 
Regards,
Ouch Whisper
010101010101

Re: perform copyfromlocal

Posted by Robert Kanter <rk...@cloudera.com>.
Hi Panshul,

1. The Pig script has to be in HDFS.  Neither the Oozie server nor the
MapReduce job it launches will have access to your local machine to pick up
the script.

2. I'm guessing you want a way for your Oozie workflow to automatically
upload the script from your local machine to HDFS before the Pig action?
The simplest way to do something like this is probably a shell script (not
an Oozie action) that uploads the Pig script to HDFS using the copyFromLocal
command and then submits the Oozie job (rough sketch below).

3. If you're using Hue, I believe it will ask you for the Pig script (in
HDFS) when you create a Pig action in your workflow; so you'd have to have
it already in HDFS anyway.  You may want to re-ask this question in the Hue
user mailing list to get a better answer.
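
As a very rough sketch of that wrapper script (the script name, HDFS
directory, Oozie URL, and job.properties file are all made-up examples, not
anything Oozie itself requires):

    #!/bin/bash
    # Hypothetical wrapper: push the Pig script into HDFS, then submit the job.
    set -e

    LOCAL_SCRIPT=./myscript.pig                   # Pig script on your local machine
    HDFS_DIR=/user/$(whoami)/workflows/pig-demo   # where the workflow expects it

    # copyFromLocal fails if the target already exists, so clear the old copy;
    # "|| true" keeps set -e from aborting when there is nothing to delete
    hadoop fs -mkdir "$HDFS_DIR" 2>/dev/null || true
    hadoop fs -rm "$HDFS_DIR/myscript.pig" 2>/dev/null || true
    hadoop fs -copyFromLocal "$LOCAL_SCRIPT" "$HDFS_DIR/myscript.pig"

    # submit the workflow that references the script by its HDFS path
    oozie job -oozie http://localhost:11000/oozie -config job.properties -run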


- Robert





Re: perform copyfromlocal

Posted by Robert Kanter <rk...@cloudera.com>.
I believe that Hadoop supports S3, so it may work, but I've never tried
using S3 for that; the easiest way to find out is to try it.  You'll likely
have to set oozie.service.HadoopAccessorService.supported.filesystems to
"hdfs,s3" in oozie-site.xml.
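
If you do try it, a quick sanity check from a cluster node is to see whether
the Hadoop client can read the script on S3 at all; the bucket, key, and the
s3:// scheme below are only examples (depending on how S3 is configured you
may be using s3n:// instead):

    # Hypothetical check that the cluster can see the script on S3; the S3
    # credentials must already be configured for Hadoop (e.g. in core-site.xml)
    hadoop fs -ls s3://my-bucket/scripts/
    hadoop fs -cat s3://my-bucket/scripts/myscript.pig | head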

- Robert



Re: perform copyfromlocal

Posted by Panshul Whisper <ou...@gmail.com>.
Hello,

Thank you for the responses.
I get the point that it is not possible to load a Pig script file from the
local file system from within an Oozie workflow, not even while using the
Hue/Oozie interface.

Is it possible to load a Pig script stored on S3 into a workflow, if the
cluster is an EC2 cluster running CDH4?

Can I specify the Pig script file the same way I would refer to any other
file on S3 from within HDFS?

Thank you for the help,

Regards,





-- 
Regards,
Ouch Whisper
010101010101

Re: perform copyfromlocal

Posted by Robert Kanter <rk...@cloudera.com>.
DistCp (and thus the DistCp action) is meant for copying large amounts of
data and files between two Hadoop clusters or within a single cluster.  As
far as I know, it won't accept a local filesystem path or an ftp/sftp path.
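
For contrast, the kind of thing DistCp is built for looks like this (host
names and paths are invented):

    # bulk copy between two HDFS clusters
    hadoop distcp hdfs://nn-a.example.com:8020/data/logs \
                  hdfs://nn-b.example.com:8020/backup/logs

    # or a large copy within a single cluster
    hadoop distcp /data/logs /backup/logs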

- Robert



Re: perform copyfromlocal

Posted by Harish Krishnan <ha...@gmail.com>.
Can we use the DistCp action to copy from the local file system to HDFS?
Use sftp:// for files on the local file system and hdfs:// for the
destination dir.

Thanks & Regards,
Harish.T.K



Re: perform copyfromlocal

Posted by Ryota Egashira <eg...@yahoo-inc.com>.
Hi, Panshul

>1)
You might need to upload the Pig script to HDFS (e.g., using the hadoop dfs
command; one-liner example below) before running the workflow.
>2)
AFAIK, it is not common to do a copyFromLocal as part of the workflow, since
a workflow action runs on a tasktracker node as an M/R job.
Once the Pig script is uploaded to HDFS, Oozie takes care of copying it from
HDFS to the tasktracker node using the Hadoop distributed cache mechanism
before running the Pig action, so we don't have to worry about it.
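
For 1), the upload itself is a one-liner (the paths below are examples only),
and after that 2) needs nothing extra from you in the workflow:

    # push the local Pig script into HDFS before submitting the workflow
    hadoop dfs -copyFromLocal ./myscript.pig /user/me/scripts/myscript.pig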

I guess the Cloudera folks have an answer for 3).

Hope it helps.
Ryota
