You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/12/27 03:17:11 UTC

[GitHub] [commons-csv] DamjanJovanovic opened a new pull request, #295: Add support for trailing text after the closing quote, for Excel compatibility

DamjanJovanovic opened a new pull request, #295:
URL: https://github.com/apache/commons-csv/pull/295

   As per issue https://issues.apache.org/jira/browse/CSV-141 and based on what we did in Apache OpenOffice https://bz.apache.org/ooo/show_bug.cgi?id=126805
   
   This adds a setting, allowTrailingText (for lack of a better name) that allows CSV fields to have trailing text after the closing quote, up to the next separator, which can contain anything except the separator character, and this extra text is appended as-is to the field contents (any further quoting is ignored). This is exactly how Excel behaves.
   
   As this is a non-standard setting with surprising behaviour, I've made it off by default. Only CSVFormat.EXCEL has it on by default.
   
   This doesn't fully fix CSV-141 yet as that has line ending issues too, but I'd like to investigate how Excel handles that first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
garydgregory commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1062465796


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -846,6 +881,8 @@ public CSVFormat getFormat() {
     public static final CSVFormat EXCEL = DEFAULT.builder()

Review Comment:
   I am wondering: Instead of this change, I wonder if we should have a new CSVFormat called `EXCEL_LENIENT`. Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] kinow commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by "kinow (via GitHub)" <gi...@apache.org>.
kinow commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1083307259


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -846,6 +881,8 @@ public CSVFormat getFormat() {
     public static final CSVFormat EXCEL = DEFAULT.builder()

Review Comment:
   >Maybe, but all spreadsheets are "lenient".
    
   Good point. I think @garydgregory suggested the EXCEL_LENIENT format to avoid changing the behavior for users of the EXCEL format. But if we are changing the format to what's supported in Excel, I guess that should be fine.



##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -846,6 +881,8 @@ public CSVFormat getFormat() {
     public static final CSVFormat EXCEL = DEFAULT.builder()

Review Comment:
   >Maybe, but all spreadsheets are "lenient".
    
   Good point. I think @garydgregory suggested the EXCEL_LENIENT format to avoid changing the behavior for users of the EXCEL format. But if we are changing the format to the way it works in Excel, I guess that should be fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on a diff in pull request #295: Add support for trailing text after the closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
garydgregory commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1061513167


##########
src/test/java/org/apache/commons/csv/LexerTest.java:
##########
@@ -431,4 +431,14 @@ public void testTrimTrailingSpacesZeroLength() throws Exception {
         lexer.trimTrailingSpaces(buffer);
         assertThat(lexer.nextToken(new Token()), matches(EOF, ""));
     }
+
+    @Test
+    public void testTrailingTextAfterQuote() throws Exception {

Review Comment:
   Add a test asserting the expected behavior when if the new toggle is `false`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
garydgregory commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1082447049


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -846,6 +881,8 @@ public CSVFormat getFormat() {
     public static final CSVFormat EXCEL = DEFAULT.builder()

Review Comment:
   Hello? Anyone?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory commented on PR #295:
URL: https://github.com/apache/commons-csv/pull/295#issuecomment-1398884753

   I will find time to review over the weekend.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
garydgregory commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1061861827


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -288,6 +294,18 @@ public Builder setAllowDuplicateHeaderNames(final boolean allowDuplicateHeaderNa
             return this;
         }
 
+        /**
+         * Sets whether the last field on the last line, if quoted, can have no closing quote when the file ends, {@code true} if this is ok,
+         * {@code false} if {@link IOException} should be thrown.
+         *
+         * @param allowEOFWithoutClosingQuote whether to allow the last field on the last line to have a missing closing quote when the file ends,
+         *                                    {@code true} if so, or {@code false} to cause an {@link IOException} to be thrown.
+         */

Review Comment:
   Add Javadoc since tags to new public and protected elements.



##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -206,8 +206,12 @@ public static Builder create(final CSVFormat csvFormat) {
             return new Builder(csvFormat);
         }
 
+        private boolean allowEOFWithoutClosingQuote;

Review Comment:
   Always use camel case allowEOFWithoutClosingQuote -> allowEofWithoutClosingQuote



##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -1503,6 +1544,15 @@ public boolean getAllowDuplicateHeaderNames() {
         return duplicateHeaderMode == DuplicateHeaderMode.ALLOW_ALL;
     }
 
+    /**
+     * Gets whether the file can end before the last field on the last line, if quoted, has a closing quote.
+     *
+     * @return {@code true} if so, {@code false} to throw an {@link IOException}.
+     */

Review Comment:
   Camel case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] DamjanJovanovic commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
DamjanJovanovic commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1062000582


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -288,6 +294,18 @@ public Builder setAllowDuplicateHeaderNames(final boolean allowDuplicateHeaderNa
             return this;
         }
 
+        /**
+         * Sets whether the last field on the last line, if quoted, can have no closing quote when the file ends, {@code true} if this is ok,
+         * {@code false} if {@link IOException} should be thrown.
+         *
+         * @param allowEOFWithoutClosingQuote whether to allow the last field on the last line to have a missing closing quote when the file ends,
+         *                                    {@code true} if so, or {@code false} to cause an {@link IOException} to be thrown.
+         */

Review Comment:
   Done, used 1.10.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] DamjanJovanovic commented on a diff in pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
DamjanJovanovic commented on code in PR #295:
URL: https://github.com/apache/commons-csv/pull/295#discussion_r1082679746


##########
src/main/java/org/apache/commons/csv/CSVFormat.java:
##########
@@ -846,6 +881,8 @@ public CSVFormat getFormat() {
     public static final CSVFormat EXCEL = DEFAULT.builder()

Review Comment:
   Maybe, but all spreadsheets are "lenient".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] DamjanJovanovic commented on pull request #295: [CSV-141] Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by "DamjanJovanovic (via GitHub)" <gi...@apache.org>.
DamjanJovanovic commented on PR #295:
URL: https://github.com/apache/commons-csv/pull/295#issuecomment-1399690869

   > I applied and then reverted this PR because it does not work. Notice the comment I made 2 weeks ago "A test is missing that an actual CSV file can be parsed and not parsed." This was never done, and sure enough, when I added a simple test based on the CSV file in the Jira description, it failed.
   
   It does work. The test you added is incorrect - Excel parses the file differently. Please see PR 303.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory merged pull request #295: Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory merged PR #295:
URL: https://github.com/apache/commons-csv/pull/295


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on pull request #295: [CSV-141] Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory commented on PR #295:
URL: https://github.com/apache/commons-csv/pull/295#issuecomment-1399319422

   I applied and then reverted this PR because it does not work. Notice the comment I made 2 weeks ago "A test is missing that an actual CSV file can be parsed and not parsed." This was never done, and sure enough, when I added a simple test based on the CSV file in the Jira description, it failed. Please see `CSVParserTest.testCSV141*` and in particular `testCSV141Excel`. You should:
   - Rebase on git master
   - Comment out of remove the `@Disabled` annotation for `testCSV141Excel` and see how the test fails when too much data is consumed.
   
   If you choose to continue this work in another PR, make sure to add tests similar to `testCSV141Excel` to assert various permutations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] DamjanJovanovic commented on pull request #295: Add support for trailing text after the closing quote, for Excel compatibility

Posted by GitBox <gi...@apache.org>.
DamjanJovanovic commented on PR #295:
URL: https://github.com/apache/commons-csv/pull/295#issuecomment-1371312266

   Thank you, I've now added another patch, which adds a setting that controls whether the last field on the last line, if quoted, has to have a closing quote before the file ends. Also made both positive and negative tests for both patches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org