Posted to user@pig.apache.org by Gheorghe Muresan <gh...@gmail.com> on 2011/10/18 07:17:18 UTC
Re: Optionally Enclosed By in PIG
If some columns may contain the separator, you can escape their
content before writing them into the table, and unescape them after
you split the row, before you use the content.
You can use URL escape characters (e.g.
http://www.werockyourweb.com/url-escape-characters) or something more
reader-friendly (e.g. "|" -> "<pipe>").
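
A minimal sketch of the escape/unescape round trip described above, written in Python only for illustration (it is not Pig code). The helper names encode_row and decode_row are hypothetical, and URL percent-encoding is just one of the escaping schemes mentioned:

```python
from urllib.parse import quote, unquote

SEP = "|"

def encode_row(fields):
    # Percent-encode every field so no literal separator survives inside a value.
    return SEP.join(quote(f, safe="") for f in fields)

def decode_row(line):
    # Split on the separator first, then restore each field's original content.
    return [unquote(f) for f in line.split(SEP)]

row = ["xyz", "1234", "98798", "xyz|abc"]
line = encode_row(row)
print(line)              # xyz|1234|98798|xyz%7Cabc
print(decode_row(line))  # ['xyz', '1234', '98798', 'xyz|abc']
```

A "<pipe>"-style replacement works the same way, provided the replacement token itself can never occur in the data.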
Cheers,
Gheorghe
On Mon, Oct 17, 2011 at 9:37 PM, kiranprasad
<ki...@imimobile.com> wrote:
> Hi
>
> How can I ignore the separator character in the middle of a column value?
>
> e.g.: the separator character is '|'.
>
> The record values are '|'-separated:
>
> xyz|1234|98798|"xyz|abc"|
>
>
> Regards
> Kiran.G
Re: Optionally Enclosed By in PIG
Posted by Thejas Nair <th...@hortonworks.com>.
The default load function of Pig (PigStorage) does not support escaping
of the delimiter. If you have any characters that will not appear in
your data, you can use one of them as the delimiter (control characters, for
example; I believe they don't normally appear in UTF-8 text).
Otherwise, you can extend the PigStorage class in Pig to create a new load
function that supports escaping (and contribute it to piggybank if you like).
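
To illustrate the kind of parsing such a load function would have to perform, here is a rough sketch in Python (not the Pig loader API; the function name split_quoted is hypothetical). It relies on the standard csv module, which treats a delimiter inside a quoted field as ordinary data:

```python
import csv

def split_quoted(line, delimiter="|", quotechar='"'):
    # csv.reader keeps a delimiter that appears inside quotes as part of the field.
    return next(csv.reader([line], delimiter=delimiter, quotechar=quotechar))

print(split_quoted('xyz|1234|98798|"xyz|abc"|'))
# ['xyz', '1234', '98798', 'xyz|abc', '']
```

The trailing delimiter in the sample record yields an empty final field, which a real loader would need to decide whether to keep or drop.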
Thanks,
Thejas
On 10/18/11 12:15 AM, kiranprasad wrote:
> Can it be done using PIG Latin Script?
>
> Regards
> Kiran
Re: Optionally Enclosed By in PIG
Posted by kiranprasad <ki...@imimobile.com>.
Can it be done using PIG Latin Script?
Regards
Kiran