You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by KurtYoung <gi...@git.apache.org> on 2017/02/25 08:42:46 UTC

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

GitHub user KurtYoung opened a pull request:

    https://github.com/apache/flink/pull/3417

    [FLINK-5907] [java api] Fix trailing empty fields in CsvInputFormat

    If there are 3 fields with field delimiter ",", both these two line should be parsed successfully:
    aaa,bbb,
    aaa,bbb


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KurtYoung/flink flink-5907

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3417
    
----
commit 4e93fe1376d623eda5f8f68afb54583070968881
Author: Kurt Young <yk...@gmail.com>
Date:   2017-02-25T08:37:37Z

    [FLINK-5907] [java api] Fix trailing empty fields in CsvInputFormat

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3417: [FLINK-5907] [java api] Fix trailing empty fields in CsvI...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3417
  
    Thanks for the fast update @KurtYoung.
    +1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3417


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103225729
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java ---
    @@ -197,6 +197,14 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     			if (startPos < 0) {
     				throw new ParseException(String.format("Unexpected parser position for column %1$s of row '%2$s'",
     					field, new String(bytes, offset, numBytes)));
    +			} else if (startPos == limit
    +					&& field != fieldIncluded.length - 1
    +					&& !FieldParser.endsWithDelimiter(bytes, startPos - 1, fieldDelimiter)) {
    +				if (isLenient()) {
    --- End diff --
    
    added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103224327
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -957,6 +964,42 @@ public void testPojoTypeWithPrivateField() throws Exception {
     	}
     
     	@Test
    +	public void testPojoTypeWithTrailingEmptyFields() throws Exception {
    +		File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
    --- End diff --
    
    changed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103200986
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java ---
    @@ -197,6 +197,14 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     			if (startPos < 0) {
     				throw new ParseException(String.format("Unexpected parser position for column %1$s of row '%2$s'",
     					field, new String(bytes, offset, numBytes)));
    +			} else if (startPos == limit
    +					&& field != fieldIncluded.length - 1
    +					&& !FieldParser.endsWithDelimiter(bytes, startPos - 1, fieldDelimiter)) {
    +				if (isLenient()) {
    --- End diff --
    
    Add a comment: "We are at the end of the record, but not all fields have been read and the end is not a field delimiter indicating an empty last field."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103183251
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -957,6 +964,42 @@ public void testPojoTypeWithPrivateField() throws Exception {
     	}
     
     	@Test
    +	public void testPojoTypeWithTrailingEmptyFields() throws Exception {
    +		File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
    --- End diff --
    
    Use the tmp file utils as in `readStringFieldsWithTrailingDelimiters()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103225144
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/RowCsvInputFormatTest.java ---
    @@ -311,7 +317,7 @@ public void readMixedQuotedStringFields() throws Exception {
     
     	@Test
     	public void readStringFieldsWithTrailingDelimiters() throws Exception {
    -		String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n";
    +		String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n|-|-\nabc|-def\n";
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103203254
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -392,12 +395,17 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     			else {
     				// skip field
     				startPos = skipFields(bytes, startPos, limit, this.fieldDelim);
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    --- End diff --
    
    Same here as above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3417: [FLINK-5907] [java api] Fix trailing empty fields in CsvI...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3417
  
    Merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103203173
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -358,24 +358,27 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     		for (int field = 0, output = 0; field < fieldIncluded.length; field++) {
     			
     			// check valid start position
    -			if (startPos >= limit) {
    +			if (startPos > limit || (startPos == limit && field != fieldIncluded.length - 1)) {
     				if (lenient) {
     					return false;
     				} else {
     					throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
     				}
     			}
    -			
    +
     			if (fieldIncluded[field]) {
     				// parse field
     				@SuppressWarnings("unchecked")
     				FieldParser<Object> parser = (FieldParser<Object>) this.fieldParsers[output];
     				Object reuse = holders[output];
     				startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.fieldDelim, reuse);
     				holders[output] = parser.getLastResult();
    -				
    +
     				// check parse result
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    +						(startPos == limit
    --- End diff --
    
    Move this condition into an `else if` branch and give a more detailed error message (row to short).
    Also add a comment that we read the whole records but that there are fields missing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103226814
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -392,12 +395,17 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     			else {
     				// skip field
     				startPos = skipFields(bytes, startPos, limit, this.fieldDelim);
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103197368
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/RowCsvInputFormatTest.java ---
    @@ -311,7 +317,7 @@ public void readMixedQuotedStringFields() throws Exception {
     
     	@Test
     	public void readStringFieldsWithTrailingDelimiters() throws Exception {
    -		String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n";
    +		String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n|-|-\nabc|-def\n";
    --- End diff --
    
    Can you move the checks for correctly handling empty last fields into a dedicated method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty fields i...

Posted by fpompermaier <gi...@git.apache.org>.
Github user fpompermaier commented on the issue:

    https://github.com/apache/flink/pull/3417
  
    I've tested your PR against my original CSV that was causing the problem and it works now! Thanks for this fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] [WIP] Fix trailing empty f...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103197216
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -400,7 +400,7 @@ public void readMixedQuotedStringFields() {
     	@Test
     	public void readStringFieldsWithTrailingDelimiters() {
     		try {
    -			final String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n";
    +			final String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n|-|-\nabc|-def\n";
    --- End diff --
    
    This method tests whether an additional field delimiter at the end is accepted.
    Can you move the checks for correctly identifying empty last fields into a separate `testTailingEmptyFields` method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103226821
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -358,24 +358,27 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     		for (int field = 0, output = 0; field < fieldIncluded.length; field++) {
     			
     			// check valid start position
    -			if (startPos >= limit) {
    +			if (startPos > limit || (startPos == limit && field != fieldIncluded.length - 1)) {
     				if (lenient) {
     					return false;
     				} else {
     					throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
     				}
     			}
    -			
    +
     			if (fieldIncluded[field]) {
     				// parse field
     				@SuppressWarnings("unchecked")
     				FieldParser<Object> parser = (FieldParser<Object>) this.fieldParsers[output];
     				Object reuse = holders[output];
     				startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.fieldDelim, reuse);
     				holders[output] = parser.getLastResult();
    -				
    +
     				// check parse result
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    +						(startPos == limit
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3417: [FLINK-5907] [java api] Fix trailing empty fields ...

Posted by KurtYoung <gi...@git.apache.org>.
Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103224286
  
    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -400,7 +400,7 @@ public void readMixedQuotedStringFields() {
     	@Test
     	public void readStringFieldsWithTrailingDelimiters() {
     		try {
    -			final String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n";
    +			final String fileContent = "abc|-def|-ghijk\nabc|-|-hhg\n|-|-|-\n|-|-\nabc|-def\n";
    --- End diff --
    
    sounds good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---