Posted to user@pig.apache.org by L N <nk...@gmail.com> on 2012/12/09 17:51:42 UTC

Re: PIG script - PIGStorage

Hi,

> I have an unstructured file format. Assume the data below is in a file:
>
> <x1,value1 ><y <x2,values> <x3,value3> > <x4, value 4> <x5, value5>
>
> abxcd xyxc
>
> <x6, value6>
> <x7,value7>
>
> I need to process only the data between < and > and ignore all other
> characters.
>
> How do I write a Pig script like the one below to load the data in the
> above file?
> log = LOAD 'input' USING PigStorage(' ');
>
> What delimiter should be used with PigStorage here, and how should I
> specify the field names? What else do I need to take care of?
>
>
> Thanks
>

Re: PIG script - PIGStorage

Posted by Jonathan Coveney <jc...@gmail.com>.
The default loader (PigStorage) can't handle this. You would need a custom
InputFormat, which isn't too bad to write.
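A minimal sketch of the parsing step such a custom loader would perform, assuming every record is a <key,value> pair that sits on a single line and that keys never contain commas. The class AngleBracketPairs and its extractPairs method are hypothetical names for illustration, not part of Pig or Hadoop; only the extraction logic is shown, not the InputFormat/RecordReader wiring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Pig or Hadoop API): pulls every "<key,value>"
 * pair out of a line of text and ignores everything else.
 */
public class AngleBracketPairs {

    // '<', a key with no '<', '>' or ',', a comma, a value with no '<' or '>', then '>'.
    // Fragments such as "<y " or a stray ">" never complete this pattern, so they are skipped.
    private static final Pattern PAIR = Pattern.compile("<([^<>,]+),([^<>]*)>");

    /** Returns each pair found in the line as {key, value}, with surrounding whitespace trimmed. */
    static List<String[]> extractPairs(String line) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.add(new String[] {m.group(1).trim(), m.group(2).trim()});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample lines taken from the question.
        String[] lines = {
            "<x1,value1 ><y <x2,values> <x3,value3> > <x4, value 4> <x5, value5>",
            "abxcd xyxc",
            "<x6, value6>",
            "<x7,value7>"
        };
        for (String line : lines) {
            for (String[] pair : extractPairs(line)) {
                // One tab-separated (key, value) record per pair.
                System.out.println(pair[0] + "\t" + pair[1]);
            }
        }
    }
}
```

On the sample data this prints seven key/value lines (x1 through x7) and skips "<y", the stray ">" and the "abxcd xyxc" line. In a real job the same loop would presumably live in the custom RecordReader (or in a Pig LoadFunc's getNext()), emitting one tuple per pair, so the script could then name the fields with a schema such as AS (key:chararray, value:chararray).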

