Posted to dev@pig.apache.org by "Andreas Paepcke (JIRA)" <ji...@apache.org> on 2011/03/21 00:02:05 UTC

[jira] [Created] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

CSV Loader/Store that handles newlines in fields, and other Excel CSV features.
-------------------------------------------------------------------------------

                 Key: PIG-1924
                 URL: https://issues.apache.org/jira/browse/PIG-1924
             Project: Pig
          Issue Type: New Feature
          Components: tools
    Affects Versions: 0.8.0
            Reporter: Andreas Paepcke


CSVExcelStorage() combines loading and storing of CSV-encoded data. It handles newlines within fields, escaped double quotes, and double quoting of fields with embedded field delimiters. Newline handling is optional and controlled by a parameter. The module also offers an option to output Windows-style newlines (CRLF instead of the Unix LF). All CSV-related syntax decisions were made to match Excel 2007.
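
As an illustration of the store-side conventions described above, here is a minimal Java sketch. It is not taken from CSVExcelStorage.java; the class name ExcelCsvWriterSketch and the methods quoteField and formatRecord are hypothetical, and the rule "quote a field only when it contains the delimiter, a double quote, or a line break" is an assumption about Excel-style output rather than a statement of what the attached code does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not the CSVExcelStorage source. */
final class ExcelCsvWriterSketch {

    /** Wraps a field in double quotes when it contains a special character. */
    static String quoteField(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // An embedded double quote is escaped by doubling it.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins fields into one record, terminated by CRLF or LF. */
    static String formatRecord(List<String> fields, char delimiter, boolean windowsEol) {
        StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
        for (String field : fields) {
            joiner.add(quoteField(field, delimiter));
        }
        return joiner.toString() + (windowsEol ? "\r\n" : "\n");
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("Conrad", "said \"hi\"", "a,b");
        // Writes one comma-delimited record with a Unix line ending.
        System.out.print(formatRecord(fields, ',', false));
    }
}
```

A field with an embedded newline comes out as a single quoted field, so a reader has to treat line breaks inside quotes as data rather than as record boundaries.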

The module comes with a test file, and javadoc produces proper documentation files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009750#comment-13009750 ] 

Alan Gates commented on PIG-1924:
---------------------------------

Wait, I missed that you did not check the grant box on the files you uploaded.  Without that I can't check it in because you haven't granted Apache rights to the code.  If you would like me to check it in and are ok with Apache having the rights to the code, please edit your attachments and check the "grant rights to apache" box.



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1924:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0
           Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Andreas for contributing.



[jira] [Commented] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009833#comment-13009833 ] 

Andreas Paepcke commented on PIG-1924:
--------------------------------------

Resubmitted the two source files, plus Alan's patch file. All unchanged,
but with license assignment box checked.

Andreas



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1924:
----------------------------

    Attachment: PIG-1924.patch

Rather than directly attach the source code, you should generate a patch.  I've taken your code, and built and attached a patch.  For full details see http://wiki.apache.org/pig/HowToContribute

Your changes look good and the tests pass.  I'll check it in shortly.



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Status: Patch Available  (was: Open)



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Release Note: This module subsumes the current CSVLoader(). However, its syntax for escaping embedded double quotes is to prepend a second double quote. This syntax is the one honored by Excel 2007. In addition, this module's default field delimiter is a comma. In part, this decision is based on Excel behaving inconsistently with newlines embedded in fields when tab is used as the delimiter. That delimiter default differs from the existing CSVLoader(), which defaults to tab for delimiting fields.  (was: the same text without the final sentence about the CSVLoader() default)
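
To make the escaping rule in the release note concrete, the following Java sketch parses one record in which a doubled double quote inside a quoted field stands for a single literal quote, with comma as the default delimiter. It is only an illustration under those assumptions: the class ExcelCsvReaderSketch, the constant DEFAULT_DELIMITER, and the method parseRecord are hypothetical names, and the code is not the CSVExcelStorage implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for illustration; not the CSVExcelStorage source. */
final class ExcelCsvReaderSketch {

    /** Comma is the default delimiter named in the release note. */
    static final char DEFAULT_DELIMITER = ',';

    /** Splits one record into fields, honoring quoted fields and doubled quotes. */
    static List<String> parseRecord(String record, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // delimiters and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        String record = "a,\"b \"\"c\"\"\",\"d\ne\"";
        // Prints each parsed field in brackets, one per output line.
        for (String field : parseRecord(record, DEFAULT_DELIMITER)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Passing '\t' as the delimiter argument switches the same routine to tab-separated input, which is the kind of default the release note attributes to the existing CSVLoader().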



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Attachment: CSVExcelStorage.java
                TestCSVExcelStorage.java



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Attachment: PIG-1924.patch



[jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Andreas Paepcke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Attachment: TestCSVExcelStorage.java
                CSVExcelStorage.java



[jira] [Assigned] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1924:
-------------------------------

    Assignee: Andreas Paepcke

