Posted to dev@metamodel.apache.org by LosD <gi...@git.apache.org> on 2017/05/26 12:30:37 UTC

[GitHub] metamodel pull request #145: METAMODEL-1141-Parse-RFC4180-compliant-CSV

GitHub user LosD opened a pull request:

    https://github.com/apache/metamodel/pull/145

    METAMODEL-1141-Parse-RFC4180-compliant-CSV

    This makes it possible to parse RFC4180-compliant CSV, where a double quote inside a quoted field is escaped by repeating it.
    
    OpenCSV's RFC-4180 parser doesn't enforce the other RFC-4180 rules (the quote character must be a double quote, the separator must be a comma, and EOL must be CRLF), so the check for whether to use the RFC-4180 parser is simply whether the escape character is the same as the quote character.
    
    This also updates OpenCSV to the newest version and fixes a problem in a test file that the update uncovered.
    
    Fixes METAMODEL-1141
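
To make the selection rule and the doubled-quote escaping described above concrete, here is a minimal, self-contained sketch. It is not the code from this pull request and does not use OpenCSV: the class `Rfc4180Sketch` and the methods `useRfc4180Parser` and `parseLine` are hypothetical names chosen for illustration, and the parser only handles a single record passed in as one string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from MetaModel or OpenCSV.
class Rfc4180Sketch {

    // The rule stated in the description: use the RFC 4180 style parser
    // when the configured escape character equals the quote character.
    static boolean useRfc4180Parser(char quoteChar, char escapeChar) {
        return quoteChar == escapeChar;
    }

    // Splits one record, treating a doubled quote inside a quoted field
    // as a single literal quote character.
    static List<String> parseLine(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false; // a lone quote closes the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Input record: 1,"say ""hi""",x
        System.out.println(parseLine("1,\"say \"\"hi\"\"\",x", ',', '"')); // prints [1, say "hi", x]
    }
}
```

Because the sketch takes the quote character as a parameter, it also covers the case mentioned in the commits where a non-standard quote character doubles as the escape, for example `parseLine("'it''s',b", ',', '\'')`.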

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/LosD/metamodel METAMODEL-1141-RFC4180-CsvDataContext

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #145
    
----
commit 8bcde03ed6358f12d9e268f981c2ef850acae5e6
Author: Dennis Du Krøger <lo...@apache.org>
Date:   2017-05-24T13:46:22Z

    Create test for double quote, upgrade to OpenCSV 3.9
    
    This uncovers the problems with CSVs using double quotes for
    escaping, and upgrades to OpenCSV 3.9 in preparation for fixing that.

commit feaadd22186e1b29693a48225e7aa2fff36a1f0d
Author: Dennis Du Krøger <lo...@apache.org>
Date:   2017-05-26T11:03:52Z

    Fix bad escape in CSV file used for multiline tests

commit edea22b0f5ca8dff7b6334c9e7a11f58c600aefb
Author: Dennis Du Krøger <lo...@apache.org>
Date:   2017-05-26T11:34:20Z

    Use RFC4180Parser when escape is same as quote.
    
    Also adds an extra test for weird quotes that are also used as the
    escape, and extends a test to make sure double double double-quotes do not break.

commit dc9134f969255c40456aacb98d05f95d669012e3
Author: Dennis Du Krøger <lo...@apache.org>
Date:   2017-05-26T12:29:48Z

    Adds license header and EOL at EOF in CSVs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #145: METAMODEL-1141-Parse-RFC4180-compliant-CSV

Posted by kaspersorensen <gi...@git.apache.org>.
GitHub user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/145#discussion_r118757879
  
    --- Diff: csv/src/test/resources/tickets.csv ---
    @@ -10,7 +10,7 @@ One way of improving this could be through caching. Another way could be through
     3,DataCleaner 1.5 Release,202,Pattern Finder improvements,DataCleaner-core,None,enhancement,darrenH,assigned,2008-08-19T04:27:12Z+0200,2008-09-16T09:21:56Z+0200,"__Pattern Finder suggestions__:
      * have an option to ignore repeating spaces (so {{{\"aaa aaaaa\" and \"aaa         aaaaa\"}}} are counted as one pattern. 
      * have an option to ignore case, and a different option to preserve case.
    - * have an option to treat all 'special' characters as one pattern (so \"aaa*\", \"aaa/\", \"aaa\\" etc' are counted as one pattern, maybe denoted \"aaaS\").
    + * have an option to treat all 'special' characters as one pattern (so \"aaa*\", \"aaa/\", \"aaa\" etc' are counted as one pattern, maybe denoted \"aaaS\").
    --- End diff --
    
    Good catch. This was actually exported directly from an old bugtracker called "trac". So maybe it was a bug in trac that caused this issue, but nevertheless I think the fix is good :-)
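
One plausible reading of why the removed line was a bad escape, assuming backslash is the escape character for this file: `\\` is consumed as an escaped backslash, so the `"` right after it is unescaped and closes the quoted field early. The sketch below is a generic backslash-escape scanner written for illustration; `BackslashEscapeSketch` and `closingQuoteIndex` are hypothetical names, and this is not OpenCSV's actual implementation.

```java
// Hypothetical illustration only; not taken from OpenCSV or MetaModel.
class BackslashEscapeSketch {

    // Given text that starts with an opening double quote at index 0, returns the
    // index of the first unescaped double quote (the one that closes the field),
    // or -1 if the field is never closed. Backslash escapes the next character.
    static int closingQuoteIndex(String text) {
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip whatever character the backslash escapes
            } else if (c == '"') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Raw CSV text: "(so \"aaa\\" etc)"  (old form, 19 characters)
        String before = "\"(so \\\"aaa\\\\\" etc)\"";
        // Raw CSV text: "(so \"aaa\" etc)"   (fixed form, 18 characters)
        String after = "\"(so \\\"aaa\\\" etc)\"";
        System.out.println(closingQuoteIndex(before)); // prints 12, before the field's real end
        System.out.println(closingQuoteIndex(after));  // prints 17, the last character
    }
}
```

In the old form the field ends in the middle of the sentence, leaving the rest of the text outside the quotes; in the fixed form the closing quote is the final character.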



[GitHub] metamodel pull request #145: METAMODEL-1141-Parse-RFC4180-compliant-CSV

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/145



[GitHub] metamodel issue #145: METAMODEL-1141-Parse-RFC4180-compliant-CSV

Posted by kaspersorensen <gi...@git.apache.org>.
GitHub user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/145
  
    Looks good to me!





[GitHub] metamodel issue #145: METAMODEL-1141-Parse-RFC4180-compliant-CSV

Posted by ClaudiaPHI <gi...@git.apache.org>.
GitHub user ClaudiaPHI commented on the issue:

    https://github.com/apache/metamodel/pull/145
  
    LGTM! Tested and it works fine!

