Posted to dev@crunch.apache.org by "Nathan Barry (JIRA)" <ji...@apache.org> on 2015/09/29 22:19:04 UTC

[jira] [Commented] (CRUNCH-564) Add support for using escape character same as open/close quote character

    [ https://issues.apache.org/jira/browse/CRUNCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935788#comment-14935788 ] 

Nathan Barry commented on CRUNCH-564:
-------------------------------------

I'm guessing you want to escape a double quote inside a quoted field by writing two double quotes in a row, such as:
{code}
"this line","has "" 2 double quotes in it"
"this line","has none"
"this line","has roast beef"
{code}

If so, I believe the CSVLineReader will handle that case automatically, without setting the double quote as the escape character. What happens if you set the escape character to backslash? Does the CSV file parse properly?
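
For reference, here is a minimal sketch of the doubled-quote rule from RFC 4180 as I understand it. It is not the CSVLineReader code; the class name Rfc4180Sketch and the method parseLine are made up for illustration, and it assumes a single-line record with comma as the delimiter and double quote as the quote character (no embedded newlines).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Crunch.
final class Rfc4180Sketch {

    // Splits one single-line CSV record into fields.
    // Inside a quoted field, two double quotes in a row mean one literal double quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // Doubled quote: emit one literal quote and skip the second one.
                        field.append('"');
                        i++;
                    } else {
                        // Single quote: this closes the quoted field.
                        inQuotes = false;
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                // An unquoted comma ends the current field.
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // First sample line from the comment above.
        System.out.println(parseLine("\"this line\",\"has \"\" 2 double quotes in it\""));
    }
}
```

With this rule the quote character never has to be configured as a separate escape character, because the parser decides what a quote means by looking at the character that follows it.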

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>
>                 Key: CRUNCH-564
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
>
> As a user, I would like to use CSVInputFormat to handle CSV files that follow RFC 4180: http://www.ietf.org/rfc/rfc4180.txt.
> Many developers use the Apache Commons Lang StringEscapeUtils.escapeCsv() method to escape their CSVs. The method escapes the CSV following RFC 4180.
> https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
> The CSVLineReader throws an exception in such a case. We can enhance the code to support CSVs that use the same character for both escape and quote.
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
> I would appreciate a comment if someone has knowingly rejected the idea due to a technical limitation or a problem with allowing the escape and quote characters to be the same. By the way, Apache HAWQ seems to get around this issue somehow and reads such CSVs fine.


