You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by tammymendt <gi...@git.apache.org> on 2015/08/26 16:45:10 UTC

[GitHub] flink pull request: [FLINK-2567]

GitHub user tammymendt opened a pull request:

    https://github.com/apache/flink/pull/1059

    [FLINK-2567] 

    Allow quoted strings in CSV fields to contain quotation character inside of the field, as long as its escaped Ex: 'Hi my name is \'Flink\' '. If a quotation character is found, the previous character is checked to see if it is the escape character. If yes, the string continues to be searched for the closing quotation. Test classes were updated to include strings with escaped quotation characters.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tammymendt/flink FLINK-2567

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1059
    
----
commit 696b3b9f66d37cffa467a41e20f4ce3e145d2cc2
Author: tammymendt <ta...@gmail.com>
Date:   2015-08-25T14:52:04Z

    [FLINK-2567] Allow quoted strings in CSV fields to contain quotation character inside of the field, as long as its escaped Ex: 'Hi my name is \'Flink\''

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by tammymendt <gi...@git.apache.org>.
Github user tammymendt closed the pull request at:

    https://github.com/apache/flink/pull/1059


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1059


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138484565
  
    Oh sorry, that was a reference from Stephan's repository. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1059#discussion_r38077190
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -984,7 +984,7 @@ public void testPojoTypeWithInvalidFieldMapping() throws Exception {
     	@Test
     	public void testQuotedStringParsingWithIncludeFields() throws Exception {
     		final String fileContent = "\"20:41:52-1-3-2015\"|\"Re: Taskmanager memory error in Eclipse\"|" +
    -				"\"Blahblah <bl...@blahblah.org>\"|\"bla\"|\"blubb\"";
    --- End diff --
    
    Let's not change old checks, but rather add new checks. You never know if you don't loose coverage of a critical case otherwise...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by tammymendt <gi...@git.apache.org>.
GitHub user tammymendt reopened a pull request:

    https://github.com/apache/flink/pull/1059

    [FLINK-2567] 

    Allow quoted strings in CSV fields to contain quotation character inside of the field, as long as its escaped Ex: 'Hi my name is \'Flink\' '. If a quotation character is found, the previous character is checked to see if it is the escape character. If yes, the string continues to be searched for the closing quotation. Test classes were updated to include strings with escaped quotation characters.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tammymendt/flink FLINK-2567

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1059
    
----
commit 696b3b9f66d37cffa467a41e20f4ce3e145d2cc2
Author: tammymendt <ta...@gmail.com>
Date:   2015-08-25T14:52:04Z

    [FLINK-2567] Allow quoted strings in CSV fields to contain quotation character inside of the field, as long as its escaped Ex: 'Hi my name is \'Flink\''

commit eaf8d6e5427630585940c2ec52ec9fc38a70a209
Author: tammymendt <ta...@gmail.com>
Date:   2015-08-27T09:50:44Z

    Replaced inline comments by using a final static variable for the escape char.
    Reverted changes to test in CsvInputFormatTest and included a new test.
    Refactored the break command from while statements by changing the while condition.
    Updated the documentation in programming_guide.md.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138487601
  
    @tammymendt Sorry for the confusion. Could you please re-open the pull request? I thought it had been merged but it was only merged into Stephan's repository...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138493194
  
    Now it's merged. This time for real :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-135365195
  
    This PR should also update documentation of the CSV parser (see "Data Sources" section in programming_guide.md"). 
    
    Thanks, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138483365
  
    Is this merged? I cannot find this commit in ASF Git repository.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-135367172
  
    Agree with Fabian. Other then the inline comments and the documentation, this looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1059#discussion_r38076964
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -445,7 +445,12 @@ protected int skipFields(byte[] bytes, int startPos, int limit, byte[] delim) {
     			// quoted string parsing enabled and field is quoted
     			// search for ending quote character
     			i++;
    -			while(i < limit && bytes[i] != quoteCharacter) {
    +
    --- End diff --
    
    Would this be the same as `while (i < limit && (bytes[i] != quoteCharacter || bytes[i-1] == 92))`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138305084
  
    Looks good, merging this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1059#discussion_r38077040
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -445,7 +445,12 @@ protected int skipFields(byte[] bytes, int startPos, int limit, byte[] delim) {
     			// quoted string parsing enabled and field is quoted
     			// search for ending quote character
     			i++;
    -			while(i < limit && bytes[i] != quoteCharacter) {
    +
    +			while(i < limit){
    +				//quote character was escaped
    +				if (bytes[i]==quoteCharacter && bytes[i-1]!=92){
    --- End diff --
    
    Can you define a constant `private static final byte BACKSLASH = 92` ? The compiler inlines those, and the code is more self explaining then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138484531
  
    Pending merge... it is in my travis branch, tests ran over night...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2567]

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1059#issuecomment-138482074
  
    Thanks for your contribution @tammymendt. You pull request has been merged but the auto-close didn't work :) Could you close the pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---