You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/09/25 21:54:04 UTC
[GitHub] [commons-csv] angusdev opened a new pull request, #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
angusdev opened a new pull request, #265:
URL: https://github.com/apache/commons-csv/pull/265
I tested in psql 14.5 Homebrew in Mac M1.
CSVFormat.POSTGRESQL_CSV - special characters are not escaped.
CSVFormat.POSTGRESQL_TEXT - values are not quoted.
```sql
drop table COMMONS_CSV_PSQL_TEST;
create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
copy COMMONS_CSV_PSQL_TEST TO '/tmp/psql.csv' WITH (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST TO '/tmp/psql.tsv';
```
```
cat /tmp/psql.csv
1,abc,"test line 1
test line 2",,""
2,xyz,"\b:^H \n:
\r:^M",a,b
3,a,"b,c,d","""quoted""",e
```
```
cat /tmp/psql.tsv
1 abc test line 1\ntest line 2 \N
2 xyz \\b:\b \\n:\n \\r:\r a b
3 a b,c,d "quoted" e
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257389351
Please hold this PR.
Seems need to escape when writing , but no escape when reading PostgreSQL produced CSV. May need to create a separate CSVFormat (e.g. POSTGRESQL_CSV_READ), need sometime to test the behaviour.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258667702
Added test for tab characters (ASCII 9) in values.
For QUOTE and ESCAPE, see below example
```
postgres=# insert into COMMONS_CSV_PSQL_TEST select 1, '"quoted"', '|quoted2|', null, null;
INSERT 0 1
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV;
1,"""quoted""",|quoted2|,,
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV QUOTE '|';;
1,"quoted",|||quoted2|||,,
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV ESCAPE '~';
1,"~"quoted~"",|quoted2|,,
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT with CSV QUOTE '|' ESCAPE '~';
1,"quoted",|~|quoted2~||,,
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT QUOTE '|';
ERROR: COPY quote available only in CSV mode
postgres=# copy COMMONS_CSV_PSQL_TEST to STDOUT ESCAPE '~';
ERROR: COPY escape available only in CSV mode
```
In PG (CSV), ESCAPE is used to escape the quote char, while in COMMONS_CSV, ESCAPE is to escape delimiter and special char
In PG (TEXT), QUOTE is not needed as it is tab-delimited and the delimiter (tab) is escaped by '\t'
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257383391
I can't test previous versions of PostgreSQL. But it is pretty safe to say that it applies to all versions.
Postgresql support export to CVS since version 8.0 (year 2005, https://www.postgresql.org/docs/8.0/release-8-0.html)
In the documentation of 8.1 (https://www.postgresql.org/docs/8.1/sql-copy.html), it didn't state clearly but implied that the CSV export will not escape special characters. The backslash escape is used for import data from text file (COPY FROM) only.
For text format (tab delimited), there is no reason to quote the text.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] garydgregory merged pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
garydgregory merged PR #265:
URL: https://github.com/apache/commons-csv/pull/265
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] garydgregory commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
garydgregory commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258615088
Hello @angusdev
Thank you for updating your PR.
(1) I think you need to test for tab characters (ASCII 9) in values.
(2) In the PG docs I read
```
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
ESCAPE
Specifies the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). This must be a single one-byte character. This option is allowed only when using CSV format.
```
Please help me understand why the git master code does not match this definition.
TY!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] codecov-commenter commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257289564
# [Codecov](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#265](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d992e26) into [master](https://codecov.io/gh/apache/commons-csv/commit/048d507de95b3aaabac5ecc29878390601b47ff3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (048d507) will **not change** coverage.
> The diff coverage is `100.00%`.
```diff
@@ Coverage Diff @@
## master #265 +/- ##
=========================================
Coverage 97.34% 97.34%
Complexity 535 535
=========================================
Files 11 11
Lines 1169 1169
Branches 205 205
=========================================
Hits 1138 1138
Misses 18 18
Partials 13 13
```
| [Impacted Files](https://codecov.io/gh/apache/commons-csv/pull/265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rc/main/java/org/apache/commons/csv/CSVFormat.java](https://codecov.io/gh/apache/commons-csv/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY3N2L0NTVkZvcm1hdC5qYXZh) | `97.16% <100.00%> (ø)` | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] angusdev commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
angusdev commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1258595192
Tested the behaviour of import and export are consistent.
Test case: export csv/tsv from PostgreSQL, read by commons-cvs and write to new csv/tsv, import to PostgreSQL, export csv/tsv again. compare the 1st and 2nd export file
```sql
drop table COMMONS_CSV_PSQL_TEST;
create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.tsv';
```
use commons-csv to read '/tmp/psql.csv' and write to '/tmp/outpsql.csv', same for 'psql.tsv'
```sql
truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.csv' with (FORMAT CSV);
truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.tsv';
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.tsv';
```
diff /tmp/psql.csv /tmp/psql2.csv
(no difference)
diff /tmp/psql.tsv /tmp/psql2.tsv
(no difference)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [commons-csv] garydgregory commented on pull request #265: CSV-290 - Fix the wrong assumptions in PostgreSQL formats
Posted by GitBox <gi...@apache.org>.
garydgregory commented on PR #265:
URL: https://github.com/apache/commons-csv/pull/265#issuecomment-1257300816
Hi @angusdev
Thank you for the PR.
Does the version of PostgreSQL matter?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org