You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/09/25 21:54:04 UTC

[GitHub] [commons-csv] angusdev opened a new pull request, #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

angusdev opened a new pull request, #265:
URL: https://github.com/apache/commons-csv/pull/265

   I tested in psql 14.5 Homebrew in Mac M1.
   
   CSVFormat.POSTGRESQL_CSV - special characters are not escaped.
   CSVFormat.POSTGRESQL_TEXT - values are not quoted.
   
   ```sql
   drop table COMMONS_CSV_PSQL_TEST;
   create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
   insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
   insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
   insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
   copy COMMONS_CSV_PSQL_TEST TO '/tmp/psql.csv' WITH (FORMAT CSV);
   copy COMMONS_CSV_PSQL_TEST TO '/tmp/psql.tsv';
   ```
   
   ```
   cat /tmp/psql.csv
   1,abc,"test line 1
   test line 2",,""
   2,xyz,"\b:^H \n:
   \r:^M",a,b
   3,a,"b,c,d","""quoted""",e
   ```
   
   ```
   cat /tmp/psql.tsv
   1    abc    test line 1\ntest line 2               \N
   2    xyz    \\b:\b \\n:\n \\r:\r       a           b
   3    a      b,c,d                      "quoted"    e
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257389351

   Please hold this PR.
   
   Seems need to escape when writing , but no escape when reading PostgreSQL produced CSV. May need to create a separate CSVFormat (e.g. POSTGRESQL_CSV_READ), need sometime to test the behaviour.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258667702

   Added test for tab characters (ASCII 9) in values.
   
   For QUOTE and ESCAPE, see below example
   
   ```
   postgres=# insert into COMMONS_CSV_PSQL_TEST select 1, '"quoted"', '|quoted2|', null, null;
   INSERT 0 1
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV;
   1,"""quoted""",|quoted2|,,
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV QUOTE '|';;
   1,"quoted",|||quoted2|||,,
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV ESCAPE '~';
   1,"~"quoted~"",|quoted2|,,
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV QUOTE '|' ESCAPE '~';
   1,"quoted",|~|quoted2~||,,
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT QUOTE '|';
   ERROR:  COPY quote available only in CSV mode
   postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT ESCAPE '~';
   ERROR:  COPY escape available only in CSV mode
   ```
   
   In PG (CSV), ESCAPE is used to escape the quote char, while in COMMONS_CSV, ESCAPE is to escape delimiter and special char
   
   In PG (TEXT), QUOTE is not needed as it is tab-delimited and the delimiter (tab) is escaped by '\t'
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257383391

   I can't test previous versions of PostgreSQL. But it is pretty safe to say that it applies to all versions.
   
   Postgresql support export to CVS since version 8.0 (year 2005, https://www.postgresql.org/docs/8.0/release-8-0.html)
   In the documentation of 8.1 (https://www.postgresql.org/docs/8.1/sql-copy.html), it didn't state clearly but implied that the CSV export will not escape special characters.  The backslash escape is used for import data from text file (COPY FROM) only.
   
   For text format (tab delimited), there is no reason to quote the text.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory merged pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
garydgregory merged PR #265:
URL: https://github.com/apache/commons-csv/pull/265


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
garydgregory commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258615088

   Hello @angusdev 
   
   Thank you for updating your PR.
   
   (1) I think you need to test for tab characters (ASCII 9) in values.
   
   (2) In the PG docs I read
   ```
   QUOTE
   Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
   
   ESCAPE
   Specifies the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). This must be a single one-byte character. This option is allowed only when using CSV format.
   ```
   Please help me understand why the git master code does not match this definition.
   TY!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] codecov-commenter commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257289564

   # [Codecov](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#265](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d992e26) into [master](https://codecov.io/gh/apache/commons-csv/commit/048d507de95b3aaabac5ecc29878390601b47ff3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (048d507) will **not change** coverage.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##             master     #265   +/-   ##
   =========================================
     Coverage     97.34%   97.34%           
     Complexity      535      535           
   =========================================
     Files            11       11           
     Lines          1169     1169           
     Branches        205      205           
   =========================================
     Hits           1138     1138           
     Misses           18       18           
     Partials         13       13           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rc/main/java/org/apache/commons/csv/CSVFormat.java](https://codecov.io/gh/apache/commons-csv/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY3N2L0NTVkZvcm1hdC5qYXZh) | `97.16% <100.00%> (ø)` | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258595192

   Tested the behaviour of import and export are consistent.
   
   Test case: export csv/tsv from PostgreSQL, read by commons-cvs and write to new csv/tsv, import to PostgreSQL, export csv/tsv again. compare the 1st and 2nd export file
   
   ```sql
   drop table COMMONS_CSV_PSQL_TEST;
   create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
   insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
   insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
   insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
   copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.csv' with (FORMAT CSV);
   copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.tsv';
   ```
   
   use commons-csv to read '/tmp/psql.csv' and write to '/tmp/outpsql.csv', same for 'psql.tsv'
   
   ```sql
   truncate table COMMONS_CSV_PSQL_TEST;
   copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.csv' with (FORMAT CSV);
   copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.csv' with (FORMAT CSV);
   
   truncate table COMMONS_CSV_PSQL_TEST;
   copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.tsv';
   copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.tsv';
   ```
   
   diff /tmp/psql.csv /tmp/psql2.csv
   (no difference)
   
   diff /tmp/psql.tsv /tmp/psql2.tsv
   (no difference)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-csv] garydgregory commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats

Posted by GitBox <gi...@apache.org>.
garydgregory commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257300816

   Hi @angusdev 
   Thank you for the PR.
   Does the version of PostgreSQL matter? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org