Posted to dev@pig.apache.org by "George Mavromatis (JIRA)" <ji...@apache.org> on 2009/06/05 07:09:07 UTC
[jira] Created: (PIG-836) Allow setting of end-of-record delimiter in PigStorage
Allow setting of end-of-record delimiter in PigStorage
------------------------------------------------------
Key: PIG-836
URL: https://issues.apache.org/jira/browse/PIG-836
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: George Mavromatis
Fix For: 0.2.0
PigStorage allows overriding the default field delimiter ('\t'), but does not allow overriding the record delimiter ('\n').
Fields that contain newlines are a valid use case, e.g. when they hold the contents of a document or web page. A user can write a custom load/store UDF to handle this, but that is extra work for the user, many users would have to repeat it, and the UDF would duplicate the PigStorage code exactly except for the delimiter.
Thus, PigStorage() should allow configuring both the field and record separators.
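To illustrate the request, here is a minimal Python sketch of parsing with both delimiters configurable. It is not PigStorage code: the function name parse_records and the sample data are hypothetical, and the ASCII record separator ('\x1e') is an arbitrary choice of record delimiter made for this example.

```python
def parse_records(text, field_delim="\t", record_delim="\n"):
    """Split text into records, then split each record into fields."""
    # Drop one trailing record delimiter so it does not yield an empty record.
    if text.endswith(record_delim):
        text = text[: -len(record_delim)]
    if not text:
        return []
    return [record.split(field_delim) for record in text.split(record_delim)]


# Hypothetical data: the second field of the first record contains a newline.
data = "1\tline one\nline two\x1e2\tsingle line\x1e"

# With a custom record delimiter, the embedded newline stays inside its field.
records = parse_records(data, field_delim="\t", record_delim="\x1e")

# With the default '\n' record delimiter, the first record is cut in two.
broken = parse_records(data)
```

With the custom record delimiter the multi-line field survives intact; with the default it is split across two records, which is the problem described above.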
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-836) Allow setting of end-of-record delimiter in PigStorage
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates resolved PIG-836.
----------------------------
Resolution: Won't Fix
PigStorage now depends on TextInputFormat to parse lines, and TextInputFormat does not allow the user to specify the end-of-line indicator. If it does at some point in the future, then Pig can make use of that. We are not going to rewrite TextInputFormat ourselves just to get this feature.
[jira] Assigned: (PIG-836) Allow setting of end-of-record delimiter in PigStorage
Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Reed reassigned PIG-836:
---------------------------------
Assignee: Benjamin Reed