Posted to user@pig.apache.org by Gheorghe Muresan <gh...@gmail.com> on 2011/10/18 07:17:18 UTC

Re: Optionally Enclosed By in PIG

If some columns may contain the separator, you can escape their
content before writing them into the table, and unescape them after
you split the row, before you use the content.
You can use URL escape characters (e.g.
http://www.werockyourweb.com/url-escape-characters) or something more
reader-friendly (e.g. "|" -> "<pipe>").
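
As a rough illustration of the escape/unescape idea, here is a minimal
Python sketch. It assumes the data is written and read by your own code
outside Pig; the helper names join_row and split_row are made up for
this example, and percent-encoding is only one possible scheme.

```python
from urllib.parse import quote, unquote

SEP = "|"  # the column separator used in the data file


def join_row(fields):
    """Percent-encode each field, then join the fields with the separator.

    quote(..., safe="") encodes '|' as '%7C' and '%' as '%25', so an
    escaped field can never contain the raw separator.
    """
    return SEP.join(quote(field, safe="") for field in fields)


def split_row(line):
    """Split on the separator first, then decode each field."""
    return [unquote(field) for field in line.split(SEP)]


row = ["xyz", "1234", "98798", "xyz|abc"]
line = join_row(row)          # the embedded '|' is stored as '%7C'
restored = split_row(line)    # round-trips back to the original fields
```

The order matters on the read side: split the row first and unescape
afterwards, otherwise the decoded "|" would be treated as a separator again.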

Cheers,
Gheorghe

On Mon, Oct 17, 2011 at 9:37 PM, kiranprasad
<ki...@imimobile.com> wrote:
> Hi
>
> How can I ignore the separator character in the middle of a column value?
>
> e.g. the separator character is '|'.
>
> The record values are |-separated:
>
> xyz|1234|98798|"xyz|abc"|
>
>
> Regards
> Kiran.G

Re: Optionally Enclosed By in PIG

Posted by Thejas Nair <th...@hortonworks.com>.
The default load function of Pig (PigStorage) does not support escaping
of the delimiter. If there is any character that will never appear in
your data, you can use that as the delimiter (control characters, for
example; I believe they rarely appear in ordinary text data).
Otherwise, you can extend the PigStorage class in Pig to create a new load
func that supports escaping (and contribute it to piggybank if you like).
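
One way to get to a delimiter that does not appear in the data is to
rewrite the file before loading it. The sketch below is a Python
preprocessing step, not a Pig load func: it assumes the enclosing
character is a plain double quote, uses Ctrl-A (0x01) as the new
delimiter, and the function name redelimit is invented for this example.

```python
import csv

CTRL_A = "\x01"  # assumed not to occur anywhere in the data


def redelimit(lines, sep="|", quotechar='"', out_sep=CTRL_A):
    """Parse optionally quoted, sep-delimited lines and re-join the
    fields with a control character as the delimiter."""
    reader = csv.reader(lines, delimiter=sep, quotechar=quotechar)
    for fields in reader:
        # Fail loudly if the "safe" delimiter does occur in a field.
        if any(out_sep in field for field in fields):
            raise ValueError("output delimiter found in data: %r" % (fields,))
        yield out_sep.join(fields)


sample = ['xyz|1234|98798|"xyz|abc"|']
converted = list(redelimit(sample))
# The quoted field keeps its embedded '|'; the trailing '|' in the input
# produces an empty last field.
```

The converted file could then be loaded with PigStorage using that
control character as the delimiter.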

Thanks,
Thejas



On 10/18/11 12:15 AM, kiranprasad wrote:
> Can it be done using PIG Latin Script?
>
> Regards
> Kiran


Re: Optionally Enclosed By in PIG

Posted by kiranprasad <ki...@imimobile.com>.
Can it be done using PIG Latin Script?

Regards
Kiran
