Posted to user@pig.apache.org by Harrison Cavallero <ha...@cavallero.me> on 2014/08/18 19:03:17 UTC

Filename in load

Hey all,

I'm loading a group of CSV files with PigStorage, and I would like to
include the filename in each tuple loaded from that file, so that I can
tell which file each tuple came from (each file is for a particular
user).

So for example:
csv_all = LOAD 'sample1.csv, sample2.csv' USING PigStorage('|')
AS (upc:chararray, store_id:int, date:chararray,
product_description:chararray);

Is there a way to load the tuples from each CSV so that they include another
field containing the filename or part of it (like filename:chararray)?

Thanks in advance!

-- 
Harrison Cavallero

*cavallero.me <http://cavallero.me>*

Re: Filename in load

Posted by Harrison Cavallero <ha...@cavallero.me>.
Awesome, thanks Prashant! Can't believe I missed that :)


On Mon, Aug 18, 2014 at 11:01 AM, Prashant Kommireddi <pr...@gmail.com>
wrote:

> Take a look at tagFile/tagPath options
>
>
> http://pig.apache.org/docs/r0.13.0/api/org/apache/pig/builtin/PigStorage.html

Re: Filename in load

Posted by Prashant Kommireddi <pr...@gmail.com>.
Take a look at tagFile/tagPath options

http://pig.apache.org/docs/r0.13.0/api/org/apache/pig/builtin/PigStorage.html
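
For reference, here is a minimal sketch of how those options could be applied
to the load from the original question. It assumes a Pig version whose
PigStorage supports '-tagFile' and '-tagPath' (the docs linked above are for
0.13.0), and it assumes the tagged value arrives as the first field, so the
AS clause names it first. The alias names filename, user, and by_user are
only illustrative.

```pig
-- '-tagFile' prepends the source file name to every tuple;
-- '-tagPath' would prepend the full input path instead.
csv_all = LOAD 'sample1.csv, sample2.csv'
          USING PigStorage('|', '-tagFile')
          AS (filename:chararray, upc:chararray, store_id:int,
              date:chararray, product_description:chararray);

-- Assumption: only part of the name is wanted, here everything before ".csv".
by_user = FOREACH csv_all GENERATE
          REGEX_EXTRACT(filename, '^(.*)\\.csv$', 1) AS user,
          upc, store_id, date, product_description;
```

Running DESCRIBE csv_all; after the LOAD is a quick way to confirm that the
extra field sits where the AS clause expects it.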