Posted to user@pig.apache.org by Ronald Green <gr...@gmail.com> on 2015/03/05 15:18:52 UTC

Loading multiple files, each file as a record

Hi,

I'm looking for a loader function that will let me read each file as a
record on its own so I'll be able to treat each as a single record/field.
For example:

a = load '/files' USING TheLoader() as (file:chararray);
b = foreach a GENERATE REGEX_EXTRACT(file,'...');

PigStorage and TextLoader return each line in the file as a record/tuple.

Do you know of any other loader that can return an entire file as a single record?

Thanks,
Ron

Re: Loading multiple files, each file as a record

Posted by Ronald Green <gr...@gmail.com>.
Thanks for your suggestion.

I wouldn't want to run a MapReduce job just to get the file into a
single tuple. Also, I can't be sure the lines within each group come back
in the same order they appear in the file.

Thanks

On 10 March 2015 at 06:39, Arvind S <ar...@gmail.com> wrote:

> When loading the files, you can try
> PigStorage(',','-tagFile'),
> then apply a regex to each line of the file, then group by file name.
>
>
> https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/builtin/PigStorage.html
>
> *Cheers !!*
> Arvind

Re: Loading multiple files, each file as a record

Posted by Arvind S <ar...@gmail.com>.
When loading the files, you can try
PigStorage(',','-tagFile'),
then apply a regex to each line of the file, then group by file name.

https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/builtin/PigStorage.html

*Cheers !!*
Arvind
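
A minimal sketch of the approach suggested above, not a tested script. It
assumes no line contains a tab, so that PigStorage with a tab delimiter
returns each line as one field; the relation names and the capture-group
index are illustrative, and '...' is the pattern elided in the original
question.

```pig
-- '-tagFile' prepends the source file name as the first field of every tuple.
-- Assumption: lines contain no tabs, so each line arrives as a single field.
lines = LOAD '/files' USING PigStorage('\t', '-tagFile')
        AS (filename:chararray, line:chararray);

-- '...' stands for the pattern elided in the original post;
-- the group index 1 is an assumption.
matched = FOREACH lines GENERATE filename, REGEX_EXTRACT(line, '...', 1) AS hit;

-- Collect the per-line results of each file into one bag per file name.
by_file = GROUP matched BY filename;
DUMP by_file;
```

Note that the GROUP step adds a reduce phase, and the order of tuples inside
each bag is not guaranteed to match the line order of the file. A regex that
has to span several lines would still need a loader that returns the whole
file as one record.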

On Fri, Mar 6, 2015 at 2:26 AM, Daniel Dai <da...@hortonworks.com> wrote:

> I'm not aware of any, but it should be pretty easy to write a custom
> Loader/InputFormat for that.
>
> Daniel

Re: Loading multiple files, each file as a record

Posted by Daniel Dai <da...@hortonworks.com>.
I'm not aware of any, but it should be pretty easy to write a custom
Loader/InputFormat for that.

Daniel

On 3/5/15, 6:18 AM, "Ronald Green" <gr...@gmail.com> wrote:

>Hi,
>
>I'm looking for a loader function that will let me read each file as a
>record on its own so I'll be able to treat each as a single record/field.
>For example:
>
>a = load '/files' USING TheLoader() as (file:chararray);
>b = foreach a GENERATE REGEX_EXTRACT(file,'...');
>
>PigStorage and TextLoader return each line in the file as a record/tuple.
>
>Do you know of any other loader that can return an entire file as a
>single record?
>
>Thanks,
>Ron