Posted to user@pig.apache.org by Craig Hamilton <cr...@spintopgames.com> on 2009/09/19 01:35:00 UTC

parse custom delimited file

Hi,

I have a file which uses non-default delimiters.

Here is an example:

type::click;;date::Sun Apr 26 23:57:20 CDT 2009;;

It uses '::' to delimit between the field name and the value, and
';;' to delimit between the key-value pairs.

Does anyone have a suggestion on how I could pull this into Pig?

thanks,
craig

Re: parse custom delimited file

Posted by Jeff Zhang <zj...@gmail.com>.
Or you can still use PigStorage(), but write a UDF to extract the values.

e.g.

A = LOAD '/data/' USING PigStorage(';;') AS (type,date);
B = FOREACH A GENERATE extractValue(type), extractValue(date);
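
As a rough sketch of what the extraction step inside such a UDF could look like, here is the string logic in plain Java. The class name FieldValueExtractor is made up for illustration and is not part of Pig; in a real UDF this logic would presumably sit inside the exec() method of a class extending Pig's EvalFunc. It assumes each field reaches the function as a string like "type::click". Whether PigStorage accepts a two-character delimiter such as ';;' should be checked against your Pig version.

```java
// Hypothetical helper for illustration only; not part of the Pig API.
final class FieldValueExtractor {
    // Separator between the field name and its value, as in "type::click".
    private static final String KEY_VALUE_SEPARATOR = "::";

    private FieldValueExtractor() {}

    /**
     * Returns the text after the first "::" in the field.
     * A null input yields null; a field without "::" is returned unchanged.
     */
    static String extractValue(String field) {
        if (field == null) {
            return null;
        }
        int idx = field.indexOf(KEY_VALUE_SEPARATOR);
        if (idx < 0) {
            return field;
        }
        return field.substring(idx + KEY_VALUE_SEPARATOR.length());
    }
}
```

Single colons inside the value (for example in the time 23:57:20) are left alone, because only the two-character '::' is treated as the separator.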



On Fri, Sep 18, 2009 at 6:40 PM, Jeff Zhang <zj...@gmail.com> wrote:

> I think you have to write a new loader; you can use PigStorage as a
> reference.
>
> Maybe you can extend PigStorage and override the getNext() method,
> extracting the value of each field in that method.

Re: parse custom delimited file

Posted by Jeff Zhang <zj...@gmail.com>.
I think you have to write a new loader; you can use PigStorage as a
reference.

Maybe you can extend PigStorage and override the getNext() method,
extracting the value of each field in that method.
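
A minimal sketch of the per-line parsing that such a loader might do is below, in plain Java without any Pig classes. The class name KeyValueLineParser and the method parseLine are invented for illustration; a real loader would presumably call something like this from getNext() and copy the values into a Tuple. It assumes every record is a single line of ';;'-separated pairs, each pair being name::value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Pig API.
final class KeyValueLineParser {
    private static final String PAIR_SEPARATOR = ";;";
    private static final String KEY_VALUE_SEPARATOR = "::";

    private KeyValueLineParser() {}

    /**
     * Splits one line into name/value pairs, preserving field order.
     * Empty segments and segments without "::" are skipped.
     */
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Pattern.quote makes split() treat ";;" as literal text, not a regex.
        for (String pair : line.split(Pattern.quote(PAIR_SEPARATOR))) {
            int idx = pair.indexOf(KEY_VALUE_SEPARATOR);
            if (pair.isEmpty() || idx < 0) {
                continue;
            }
            String name = pair.substring(0, idx);
            String value = pair.substring(idx + KEY_VALUE_SEPARATOR.length());
            fields.put(name, value);
        }
        return fields;
    }
}
```

For the sample line from the original post, this would give a map with type mapped to click and date mapped to Sun Apr 26 23:57:20 CDT 2009. Keeping the field names has the side benefit that records with fields in a different order can still be handled.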


