Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2010/08/22 10:36:15 UTC

[jira] Created: (PIG-1555) [piggybank] add CSV Loader

[piggybank] add CSV Loader
--------------------------

                 Key: PIG-1555
                 URL: https://issues.apache.org/jira/browse/PIG-1555
             Project: Pig
          Issue Type: New Feature
            Reporter: Dmitriy V. Ryaboy
            Assignee: Dmitriy V. Ryaboy
            Priority: Minor
             Fix For: 0.8.0


Users often ask for a CSV loader that can handle quoted commas. Let's get 'er done.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1555) [piggybank] add CSV Loader

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901556#action_12901556 ] 

Alan Gates commented on PIG-1555:
---------------------------------

+1

If you have a chance sometime I'd be curious to learn the performance characteristics of this versus PigStorage.  I'm curious if there is substantial cost to dealing with escaping.


[jira] Updated: (PIG-1555) [piggybank] add CSV Loader

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1555) [piggybank] add CSV Loader

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------

    Attachment: PIG_1555.patch

This is loosely based on the loader by James Kebinger that he open-sourced at http://github.com/jkebinger/pig-user-defined-functions 

I ported it to the new API and fixed a few bugs.

It still doesn't support multi-line records, but the basic stuff works, including escaping quotes by doubling them, Excel-style.


[jira] Updated: (PIG-1555) [piggybank] add CSV Loader

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------

          Status: Resolved  (was: Patch Available)
    Release Note: 
CSVLoader can be used to load comma-separated value files.
It properly handles commas included inside quoted fields, and quotes escaped by preceding them with another quote character (Excel-style).
CSVLoader only handles single-line entries; quoting a multi-line value will *not* work.
      Resolution: Fixed
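
For illustration only, the quoting rules in the release note can be sketched in plain Java. This is not the code from PIG_1555.patch; the class name CsvLineSketch and the helper parseLine are hypothetical, and the sketch assumes a single-line record with the double quote as the only quote character.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {
    // Hypothetical helper: splits one single-line CSV record into fields.
    // A comma inside a quoted field is data; a doubled quote inside a
    // quoted field is one literal quote (Excel-style).
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas land here and stay in the field
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Raw record: 1,"Smith, John","said ""hi"""
        for (String field : parseLine("1,\"Smith, John\",\"said \"\"hi\"\"\"")) {
            System.out.println(field);
        }
    }
}
```

Because the sketch reads one line at a time, a quoted value that spans a line break would be cut at the newline, which is the limitation the release note calls out.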


[jira] Commented: (PIG-1555) [piggybank] add CSV Loader

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901697#action_12901697 ] 

Dmitriy V. Ryaboy commented on PIG-1555:
----------------------------------------

Alan,
The differences I observe when running on actual CSV files are within the margin of error -- sometimes CSVLoader comes out on top. Then again, I am reading actual CSVs with quoted commas, so it's possible that the similarity in runtimes is due to the fact that PigStorage sees the commas and allocates extra tuple fields.

-D
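
To make the "extra tuple fields" point concrete, here is a rough sketch of what a delimiter-only split does to a record with a quoted comma. It is not PigStorage's implementation; NaiveSplitSketch and naiveFieldCount are hypothetical names, and String.split merely stands in for any splitter that ignores quotes.

```java
public class NaiveSplitSketch {
    // Hypothetical helper: counts fields the way a quote-unaware
    // splitter would, treating every comma as a delimiter.
    static int naiveFieldCount(String line) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // Raw record: 1,"Smith, John",Boston (three logical fields)
        String line = "1,\"Smith, John\",Boston";
        // Prints 4: the comma inside the quotes is counted as a delimiter.
        System.out.println(naiveFieldCount(line));
    }
}
```

A record with three logical fields comes back as four pieces, so a quote-unaware loader does extra per-record work on such data, which may be part of why the two runtimes look similar.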
