Posted to user@pig.apache.org by James Kebinger <jk...@gmail.com> on 2009/12/09 00:12:22 UTC

CSV format loader

Hi all, I realized a week or two ago that PigStorage(',') wasn't adequate to
parse files that had commas embedded in properly CSV quoted fields.

I went ahead and built a CSV parser for Pig 0.3 that deals with embedded
quotes (but not embedded newlines). It's up on GitHub:
http://github.com/jkebinger/pig-user-defined-functions/tree/master/src/com/kebinger/pig/storage/
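
To illustrate the problem, here is a minimal sketch in Python. It is not the
Java loader from the repository above, and the function names split_naive and
split_csv_line are hypothetical, chosen only for this example. It contrasts a
plain split on commas (roughly what a bare ',' delimiter gives you) with a
quote-aware split of a single line, assuming double-quote as the quote
character, doubled quotes ("") as the escape, and no embedded newlines.

```python
def split_naive(line):
    """Split on every comma, ignoring CSV quoting entirely."""
    return line.split(",")


def split_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line, honoring quoted fields.

    Assumptions for this sketch: fields may be wrapped in double quotes,
    a doubled quote inside a quoted field stands for one literal quote,
    and a record never spans more than one line.
    """
    fields = []
    current = []          # characters of the field being built
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    # Doubled quote: emit one literal quote, skip the second.
                    current.append(quote)
                    i += 1
                else:
                    # Closing quote of the field.
                    in_quotes = False
            else:
                current.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            # Delimiter outside quotes ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    sample = 'Kebinger,"Boston, MA","said ""hi"""'
    print(split_naive(sample))
    print(split_csv_line(sample))
```

On the sample line, the naive split breaks "Boston, MA" into two pieces, while
the quote-aware version keeps it as one field and unescapes the doubled quotes.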

What I want to know is - is there interest in having this in the PiggyBank?
I'm happy to upgrade it to be compatible w/ the current pig version and
write some tests if there's interest.

thanks

-james

Re: CSV format loader

Posted by Matteo Nasi <ma...@gmail.com>.
yes  please, very useful ;-)


On Wed, Dec 9, 2009 at 2:03 AM, Amr Awadallah <aa...@cloudera.com> wrote:

> +1
>
>
> On 12/8/2009 5:02 PM, Alan Gates wrote:
>
>> Definitely.
>>
>> Alan.
>>

Re: CSV format loader

Posted by Amr Awadallah <aa...@cloudera.com>.
+1

On 12/8/2009 5:02 PM, Alan Gates wrote:
> Definitely.
>
> Alan.
>
> On Dec 8, 2009, at 3:12 PM, James Kebinger wrote:
>
>> Hi all, I realized a week or two ago that PigStorage(',') wasn't 
>> adequate to
>> parse files that had commas embedded in properly CSV quoted fields.
>>
>> I went ahead and built a CSV parser for pig 0.3 that deals with embedded
>> quotes (but not embedded newlines). Its up on github:
>> http://github.com/jkebinger/pig-user-defined-functions/tree/master/src/com/kebinger/pig/storage/ 
>>
>>
>> What I want to know is - is there interest in having this in the 
>> PiggyBank?
>> I'm happy to upgrade it to be compatible w/ the current pig version and
>> write some tests if there's interest.
>>
>> thanks
>>
>> -james
>

Re: CSV format loader

Posted by Alan Gates <ga...@yahoo-inc.com>.
Definitely.

Alan.

On Dec 8, 2009, at 3:12 PM, James Kebinger wrote:

> Hi all, I realized a week or two ago that PigStorage(',') wasn't  
> adequate to
> parse files that had commas embedded in properly CSV quoted fields.
>
> I went ahead and built a CSV parser for pig 0.3 that deals with  
> embedded
> quotes (but not embedded newlines). Its up on github:
> http://github.com/jkebinger/pig-user-defined-functions/tree/master/src/com/kebinger/pig/storage/
>
> What I want to know is - is there interest in having this in the  
> PiggyBank?
> I'm happy to upgrade it to be compatible w/ the current pig version  
> and
> write some tests if there's interest.
>
> thanks
>
> -james