Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2010/08/22 10:36:15 UTC
[jira] Created: (PIG-1555) [piggybank] add CSV Loader
[piggybank] add CSV Loader
--------------------------
Key: PIG-1555
URL: https://issues.apache.org/jira/browse/PIG-1555
Project: Pig
Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
Fix For: 0.8.0
Users often ask for a CSV loader that can handle quoted commas. Let's get 'er done.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1555) [piggybank] add CSV Loader
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901556#action_12901556 ]
Alan Gates commented on PIG-1555:
---------------------------------
+1
If you get a chance sometime, I'd be curious to learn the performance characteristics of this versus PigStorage, in particular whether dealing with escaping carries a substantial cost.
[jira] Updated: (PIG-1555) [piggybank] add CSV Loader
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1555) [piggybank] add CSV Loader
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------
Attachment: PIG_1555.patch
This is loosely based on the loader by James Kebinger that he open-sourced at http://github.com/jkebinger/pig-user-defined-functions
I ported it to the new API and fixed a few bugs.
It still doesn't support multi-line records, but the basics work, including escaping a quote by doubling it, Excel-style.
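For readers unfamiliar with the Excel-style rules, here is a minimal sketch of single-line, quote-aware field splitting. It is an illustration written for this thread, not the code in PIG_1555.patch; the class name `CsvLineSplitter` is hypothetical. It keeps commas inside double-quoted fields, turns a doubled quote ("") inside a quoted field into one literal quote, and, like the loader described above, assumes each record fits on one line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Excel-style CSV splitting for a single line.
public class CsvLineSplitter {

    // Splits one line into fields; multi-line quoted values are not handled.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // Doubled quote inside a quoted field: emit one literal quote.
                        current.append('"');
                        i += 2;
                        continue;
                    }
                    // Single quote: end of the quoted section.
                    inQuotes = false;
                } else {
                    // Any other character, including a comma, is part of the field.
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                // Unquoted comma: field boundary.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
            i++;
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Three fields: 1 | Smith, John | He said "hi"
        List<String> fields = split("1,\"Smith, John\",\"He said \"\"hi\"\"\"");
        for (String f : fields) {
            System.out.println(f);
        }
    }
}
```

A quoted value that spans a line break would reach this method as two separate lines with unbalanced quotes, which is why multi-line records need support at the record-reader level rather than in the field splitter.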
[jira] Updated: (PIG-1555) [piggybank] add CSV Loader
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-1555:
-----------------------------------
Status: Resolved (was: Patch Available)
Release Note:
CSVLoader can be used to load comma-separated value files.
It properly handles commas included inside quoted fields, and quotes escaped by preceding them with another quote character (Excel-style).
CSVLoader only handles single-line entries; quoting a multi-line value will *not* work.
Resolution: Fixed
[jira] Commented: (PIG-1555) [piggybank] add CSV Loader
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901697#action_12901697 ]
Dmitriy V. Ryaboy commented on PIG-1555:
----------------------------------------
Alan,
The differences I observe when running on actual CSV files are within the margin of error; sometimes CSVLoader comes out on top. Then again, I am reading actual CSVs with quoted commas, so the similar runtimes may be because PigStorage splits on those quoted commas as well and allocates extra tuple fields, which could offset the cost of CSVLoader's quote handling.
-D
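To make the "extra tuple fields" point concrete, the sketch below compares how many fields one line yields when it is split on every comma versus when commas inside quotes are ignored. Both methods are hypothetical illustrations, not PigStorage or CSVLoader code; the naive split only mimics a plain delimiter-based loader that has no notion of quoting.

```java
// Hypothetical comparison of field counts for one CSV line.
public class FieldCountComparison {

    // Splits on every comma, ignoring quotes (a plain delimiter split).
    public static int naiveFieldCount(String line) {
        // Limit -1 keeps trailing empty fields.
        return line.split(",", -1).length;
    }

    // Counts fields while ignoring commas inside double quotes.
    // A doubled quote ("") toggles the state twice, leaving it unchanged.
    public static int quoteAwareFieldCount(String line) {
        int count = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "42,\"Portland, OR\",\"a, b, c\"";
        // Prints "6 vs 3": the naive split creates three extra fields.
        System.out.println(naiveFieldCount(line) + " vs " + quoteAwareFieldCount(line));
    }
}
```

On data with many quoted commas, the naive approach builds more, smaller fields per record, so a benchmark on such files measures that extra allocation alongside the parsing cost rather than escaping overhead alone.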