Posted to dev@pig.apache.org by "George Mavromatis (JIRA)" <ji...@apache.org> on 2009/06/05 07:09:07 UTC

[jira] Created: (PIG-836) Allow setting of end-of-record delimiter in PigStorage

Allow setting of end-of-record delimiter in PigStorage
------------------------------------------------------

                 Key: PIG-836
                 URL: https://issues.apache.org/jira/browse/PIG-836
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: George Mavromatis
             Fix For: 0.2.0


PigStorage allows overriding the default field delimiter ('\t'), but does not allow overriding the record delimiter ('\n').

It is a valid use case for fields to contain newlines, e.g. because they hold the contents of a document or web page. A user can write a custom load/store UDF to handle this, but that is extra work for the user, many users will have to do it, and the UDF would be an exact duplicate of the PigStorage code except for the delimiter.

Thus, PigStorage() should allow configuring both the field and record separators.
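
To make the request concrete, here is a minimal, self-contained sketch of the parsing behaviour being asked for. It is not PigStorage code and does not use any Pig or Hadoop API; the class name DelimitedRecords, the parse method, and the choice of '\u0001' as a record delimiter are all hypothetical, picked only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: splits text into records and fields
// using caller-supplied delimiters instead of a hard-coded '\n'.
public class DelimitedRecords {

    public static List<List<String>> parse(String text, String recordDelim, String fieldDelim) {
        if (recordDelim.isEmpty() || fieldDelim.isEmpty()) {
            throw new IllegalArgumentException("delimiters must be non-empty");
        }
        List<List<String>> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records;
        }
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty strings.
        String[] rawRecords = text.split(Pattern.quote(recordDelim), -1);
        int count = rawRecords.length;
        // A trailing record delimiter terminates the last record; it does not start a new one.
        if (text.endsWith(recordDelim)) {
            count--;
        }
        for (int i = 0; i < count; i++) {
            records.add(Arrays.asList(rawRecords[i].split(Pattern.quote(fieldDelim), -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records; the second field of the first record contains a newline.
        String data = "doc1\tline one\nline two\u0001doc2\tsingle line\u0001";

        // With a custom record delimiter the embedded newline stays inside its field.
        System.out.println(parse(data, "\u0001", "\t").size());

        // With '\n' as the record delimiter the first document is cut in half.
        System.out.println(parse(data, "\n", "\t").get(0));
    }
}
```

With '\n' as the record delimiter, the first document's text is split across two records, which is the problem described above. With a delimiter that does not occur in the data, each document stays in one field.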

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PIG-836) Allow setting of end-of-record delimiter in PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-836.
----------------------------

    Resolution: Won't Fix

PigStorage now depends on TextInputFormat to parse lines. TextInputFormat does not allow the user to specify the end-of-line indicator. If it does at some point in the future, Pig can make use of that. We are not going to rewrite TextInputFormat ourselves just to get this feature.

> Allow setting of end-of-record delimiter in PigStorage
> ------------------------------------------------------
>
>                 Key: PIG-836
>                 URL: https://issues.apache.org/jira/browse/PIG-836
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: George Mavromatis
>            Assignee: Benjamin Reed



[jira] Assigned: (PIG-836) Allow setting of end-of-record delimiter in PigStorage

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed reassigned PIG-836:
---------------------------------

    Assignee: Benjamin Reed

> Allow setting of end-of-record delimiter in PigStorage
> ------------------------------------------------------
>
>                 Key: PIG-836
>                 URL: https://issues.apache.org/jira/browse/PIG-836
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: George Mavromatis
>            Assignee: Benjamin Reed
>             Fix For: 0.2.0
