Posted to user@pig.apache.org by Baraa Mohamad <ba...@gmail.com> on 2012/06/14 18:47:20 UTC

Cannot determine the output FILE name?

Hello,

I'm wondering why I cannot specify the output file name.

for example:

C = store user_results into 'tables/user.txt';

This command creates a *folder* named *user.txt*, and inside it I
find a file like part-m-00000 that stores the needed results.
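
Listing the output path shows what I mean (listing abbreviated; the exact
part-file names may differ):

hadoop fs -ls tables/user.txt
# the directory holds part-m-00000 (and a _SUCCESS marker), not a single user.txt file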

My question is: can I choose the name of my output *file*, so that I
can store multiple files in the same directory (folder)? For example:


C = store user_results into 'tables/user.txt';
D = store client_results into 'tables/clients.txt';

Sorry if I'm asking a stupid question, but I really need your help to
understand why this isn't working for me.

thanks

Baraa

Re: Cannot determine the output FILE name?

Posted by Baraa Mohamad <ba...@gmail.com>.
Thank you very much for your helpful answers,

Baraa

Re: Cannot determine the output FILE name?

Posted by Norbert Burger <no...@gmail.com>.
No -- there's no concept of "part" files with HBaseStorage, since your data
is written directly into an HBase table.
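
For example, a store through HBaseStorage looks roughly like this (the 'users'
table name and 'info' column family are just placeholders; the first field of
the relation becomes the HBase row key):

STORE user_results INTO 'hbase://users'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:email');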

Norbert

Re: Cannot determine the output FILE name?

Posted by Baraa Mohamad <ba...@gmail.com>.
aha!!!!
Thank you very much, it's nice to know that :)

So would it be the same even when using HBaseStorage?

Thanks again for your explanations.

regards

Baraa

Re: Cannot determine the output FILE name?

Posted by Norbert Burger <no...@gmail.com>.
Baraa -- this is standard Hadoop job behavior: the destination you give is
only a target directory, and the actual file names are generated
automatically. Pig inherits this behavior. You will have one "part" file per
task of your final Hadoop job in the target directory (part-m-* files from a
map-only job, part-r-* when there are reducers).

Probably the easiest way to combine these files is to run
"hadoop fs -getmerge" as you pull them out of HDFS.

Norbert
