You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Rob Tompkins <ch...@gmail.com> on 2020/01/24 02:47:18 UTC

[CSV] discuss 1.8 vote (was: Re: [VOTE] Release Apache Commons CSV 1.8 based on RC1)


> On Jan 23, 2020, at 8:55 AM, sebb <se...@gmail.com> wrote:
> 
> I think we don't want temporary serialisation fixes to encourage the
> use of serialisation.
> 
> So I suggest that the Release Notes and Javadoc should point out that
> although serialisation is possible, it is not fully supported, and
> that there are plans to drop all serialisation support.

My 2c (maybe I’m not sufficiently contextualized here): I think that it’s ok to have breaking changes in a minor version release if it’s for a pointed security reason like removing serialization for the sake of security. In fact I think it’s almost a good idea to do that because it forces people to think about the security issues as opposed to just up versioning and assuming things will just work. That said, if we can make things more secure and have non-breaking changes, then that’s clearly ideal.

> 
> On Tue, 21 Jan 2020 at 13:02, Alex Herbert <alex.d.herbert@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 
>> On 20/01/2020 23:28, Gary Gregory wrote:
>>> On Sun, Jan 19, 2020 at 7:39 AM Alex Herbert <al...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Gary,
>>>> 
>>>> I raised a few niggles a while back with CSV and the discussion did not
>>>> receive a response on how to proceed.
>>>> 
>>>> There is the major bug CSV-248 where the CSVRecord is not Serializable
>>>> [1]. This requires a decision on what to do to fix it. This bug is still
>>>> present in 1.8 RC1 as found by FindBugs [2].
>>>> 
>>>> From what I can see the CSVRecord maintains a reference to the CSVParser.
>>>> This chain of objects maintained in memory is not serializable and leads
>>>> back to the original input Reader.
>>>> 
>>>> I can see from the JApiCmp report that the serial version id was changed
>>>> for CSVRecord this release so there is still an intention to support
>>>> serialization. So this should be a blocker.
>>>> 
>>>> I could not find a serialisation test in the unit tests for CSVRecord.
>>>> This quick test added to CSVRecordTest fails:
>>>> 
>>>> 
>>>> @Test
>>>> public void testSerialization() throws IOException {
>>>>     CSVRecord shortRec;
>>>>     try (final CSVParser parser = CSVParser.parse("a,b",
>>>> CSVFormat.newFormat(','))) {
>>>>         shortRec = parser.iterator().next();
>>>>     }
>>>>     final ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>>     try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
>>>>         oos.writeObject(shortRec);
>>>>     }
>>>> }
>>>> 
>>>> mvn test -Dtest=CSVRecordTest
>>>> 
>>>> [ERROR] testSerialization  Time elapsed: 0.032 s  <<< ERROR!
>>>> java.io.NotSerializableException: org.apache.commons.csv.CSVParser
>>>>         at
>>>> org.apache.commons.csv.CSVRecordTest.testSerialization(CSVRecordTest.java:235)
>>>> 
>>>> If I mark the field csvParser as transient it passes. So this is a problem
>>>> as raised by FindBugs.
>>>> 
>>> Making the field transient would indeed fix this test but... some of the
>>> record APIs would then fail with NPEs... so we would be kicking the can
>>> down the road basically. Making the parser serializable is going in the
>>> wrong direction feature-wise IMO, so let's not go there. A serialization
>>> proxy would be less worse but should we provide such a thing? I would say
>>> no. I am OK with making the field transient despite the can kicking, so
>>> let's do that.
>> 
>> I was adding some tests that the deserialised record behaves as if the
>> parser and header map are not available. I think this will be fine if we
>> update to this:
>> 
>> private Map<String, Integer> getHeaderMapRaw() {
>>     return parser == null ? null : parser.getHeaderMapRaw();
>> }
>> 
>> Deserialisation from a form created with version 1.6 is fine. I added a
>> test for this. See current master.
>> 
>> Alex
>> 
>> 
>>> 
>>> Gary
>>> 
>>> 
>>>> 
>>>> I also raised [3] the strange implementation of the CSVParser
>>>> getHeaderNames() which ignores null headers as they cannot be used as a key
>>>> into the map. However the list of column names could contain the null
>>>> values. This test currently fails:
>>>> 
>>>> @Test
>>>> public void testHeaderNamesWithNull() throws IOException {
>>>>     final Reader in = new
>>>> StringReader("header1,null,header3\n1,2,3\n4,5,6");
>>>>     final Iterator<CSVRecord> records = CSVFormat.DEFAULT.withHeader()
>>>> 
>>>>  .withNullString("null")
>>>> 
>>>>  .withAllowMissingColumnNames()
>>>> 
>>>>  .parse(in).iterator();
>>>>     final CSVRecord record = records.next();
>>>>     assertEquals(Arrays.asList("header1", null, "header3"),
>>>> record.getParser().getHeaderNames());
>>>> }
>>>> 
>>>> I am not saying it should pass but at least the documentation should state
>>>> the behaviour in this edge case. That is the list of header names may be
>>>> shorter than the number of columns when the parser is configured to allow
>>>> null headers. I’ve not raised a bug ticket for this as it is open to
>>>> opinion if this is by design or actually a bug. This issue is still present
>>>> in 1.8 RC1.
>>>> 
>>>> Previously I suggested documentation changes for this and another edge
>>>> case using the header map to be added to the javadoc for getHeaderNames()
>>>> and getHeaderMap():
>>>> 
>>>> - Documentation:
>>>> 
>>>> The mapping is only guaranteed to be a one-to-one mapping if the record
>>>> was created with a format that does not allow duplicate or null header
>>>> names. Null headers are excluded from the map and duplicates can only map
>>>> to 1 column.
>>>> 
>>>> 
>>>> - Bug / Documentation
>>>> 
>>>> The CSVParser only stores headers names in a list of header names if they
>>>> are not null. So the list can be shorter than the number of columns if you
>>>> use a format that allows empty headers and contains null column names.
>>>> 
>>>> 
>>>> The ultimate result is that we should document that the purpose of the
>>>> header names is to provide a list of non-null header names in the order
>>>> they occur in the header and thus represent keys that can be used in the
>>>> header map. In certain circumstances there may be more columns in the data
>>>> than there are header names.
>>>> 
>>>> 
>>>> Alex
>>>> 
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/CSV-248 <
>>>> https://issues.apache.org/jira/browse/CSV-248>
>>>> 
>>>> [2]
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/findbugs.html
>>>> <
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/findbugs.html
>>>> [3] https://markmail.org/message/woti2iymecosihx6 <
>>>> https://markmail.org/message/woti2iymecosihx6>
>>>> 
>>>> 
>>>> 
>>>>> On 18 Jan 2020, at 17:52, Gary Gregory <gg...@apache.org> wrote:
>>>>> 
>>>>> We have fixed quite a few bugs and added some significant enhancements
>>>>> since Apache Commons CSV 1.7 was released, so I would like to release
>>>>> Apache Commons CSV 1.8.
>>>>> 
>>>>> Apache Commons CSV 1.8 RC1 is available for review here:
>>>>>    https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1 (svn
>>>>> revision 37670)
>>>>> 
>>>>> The Git tag commons-csv-1.8-RC1 commit for this RC is
>>>>> c1c8b32809df295423fc897eae0e8b22bfadfe27 which you can browse here:
>>>>> 
>>>>> 
>>>> https://gitbox.apache.org/repos/asf?p=commons-csv.git;a=commit;h=c1c8b32809df295423fc897eae0e8b22bfadfe27
>>>>> You may checkout this tag using:
>>>>>    git clone https://gitbox.apache.org/repos/asf/commons-csv.git
>>>> --branch
>>>>> commons-csv-1.8-RC1 commons-csv-1.8-RC1
>>>>> 
>>>>> Maven artifacts are here:
>>>>> 
>>>>> 
>>>> https://repository.apache.org/content/repositories/orgapachecommons-1489/org/apache/commons/commons-csv/1.8/
>>>>> These are the artifacts and their hashes:
>>>>> 
>>>>> #Release SHA-512s
>>>>> #Sat Jan 18 12:01:01 EST 2020
>>>>> 
>>>> commons-csv-1.8-bin.tar.gz=85a876b41aa9ce61f7f533c46df48754e05bddbdef892aed2bac7674b5ea13855de25576364649048dbb55e7fb18a354305b56cb697e85df68a87113954128ed
>>>> commons-csv-1.8-bin.zip=9b86a22367c84a0c96a457e8495f81113b64ae5501eabbe2ea4137654b6baa05bcc24a19626452b80e30ff2dd39214840c6ec534be1db9eec2d12c93eeab2de1
>>>> commons-csv-1.8-javadoc.jar=a481149dfeffe4e915d5d2e846831994223fc7d09ed2b61398c68eed5a672654a141fa6de705aa743d0b5af6fd24a3f4b0d5e7cee238a1f7642673288d4a985d
>>>> commons-csv-1.8-sources.jar=f68e50f8a025a8b2a570b46905b22b5753a83c19bee5c38103d92ec1e47b4e0d27353e7931961e74fe8e67c4909b0ece6ede49a585d2f9180a7a15458b020bc0
>>>> commons-csv-1.8-src.tar.gz=c3268f456978e75c19134e35d05bff77002b2fa7439be2623d58a102cab4f93b0913a1a789f962aafcd677938be1547f47c5dd86e3ea08b7bf8f0420e81beb7a
>>>> commons-csv-1.8-src.zip=ebb32f2406b6afa48472283612c7d0a94f932d7ae7a72ad1d239e2249de12f1e0da7f61d34d95d66b1d1fe95b66b6316af9d1fc93734f610cce4a7163b0900d0
>>>> commons-csv-1.8-test-sources.jar=13508d417e23a5d3f575a39b3aedb20d0d834335d7994f3045fff316e6b12e50cbf9afe908271357b4091d981c178a28dc61bcdb8db60bd0ada07d3de59eacbf
>>>> commons-csv-1.8-tests.jar=901889d4be203c2044df89b7e051d21e7b806e5e56438bf9a7483b334331da94b71de1a129c8bf7967e02479a0922bb834ce37eaabf6662702e147813ecb2b7f
>>>>> I have tested this with ' mvn -V -Prelease -Ptest-deploy -P jacoco -P
>>>>> japicmp clean package site deploy' using:
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 1.8.0_241, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\jdk1.8.0_241\jre
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Additional tests with 'mvn -V clean test' using:
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 1.8.0_241, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\jdk1.8.0_241\jre
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 11.0.6, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\jdk-11.0.6
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 12.0.2, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\jdk-12.0.2
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 13.0.2, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\jdk-13.0.2
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 14-ea, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\openjdk\jdk-14
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> JaCoCo fails on Java 15-EA because it does not know about class file
>>>> major
>>>>> version 59:
>>>>> 
>>>>> Caused by: java.lang.IllegalArgumentException: Unsupported class file
>>>> major
>>>>> version 59
>>>>>        at
>>>>> 
>>>> org.jacoco.agent.rt.internal_43f5073.asm.ClassReader.<init>(ClassReader.java:195)
>>>>> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
>>>>> Maven home: C:\Java\apache-maven-3.6.3\bin\..
>>>>> Java version: 15-ea, vendor: Oracle Corporation, runtime: C:\Program
>>>>> Files\Java\openjdk\jdk-15
>>>>> Default locale: en_US, platform encoding: Cp1252
>>>>> OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
>>>>> 
>>>>> Details of changes since 1.7 are in the release notes:
>>>>> 
>>>>> 
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/RELEASE-NOTES.txt
>>>>> 
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/changes-report.html
>>>>> Site:
>>>>> 
>>>>> 
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/index.html
>>>>>    (note some *relative* links are broken and the 1.8 directories are not
>>>>> yet created - these will be OK once the site is deployed.)
>>>>> 
>>>>> JApiCmp Report (compared to 1.7):
>>>>> 
>>>>> 
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/japicmp.html
>>>>> RAT Report:
>>>>> 
>>>>> 
>>>> https://dist.apache.org/repos/dist/dev/commons/csv/1.8-RC1/site/rat-report.html
>>>>> KEYS:
>>>>>  https://www.apache.org/dist/commons/KEYS
>>>>> 
>>>>> Please review the release candidate and vote.
>>>>> This vote will close no sooner that 72 hours from now.
>>>>> 
>>>>>  [ ] +1 Release these artifacts
>>>>>  [ ] +0 OK, but...
>>>>>  [ ] -0 OK, but really should fix...
>>>>>  [ ] -1 I oppose this release because...
>>>>> 
>>>>> Thank you,
>>>>> 
>>>>> Gary Gregory,
>>>>> Release Manager (using key 86fdc7e2a11262cb)
>>>>> 
>>>>> For following is intended as a helper and refresher for reviewers.
>>>>> 
>>>>> Validating a release candidate
>>>>> ==============================
>>>>> 
>>>>> These guidelines are NOT complete.
>>>>> 
>>>>> Requirements: Git, Java, Maven.
>>>>> 
>>>>> You can validate a release from a release candidate (RC) tag as follows.
>>>>> 
>>>>> 1) Clone and checkout the RC tag
>>>>> 
>>>>> git clone https://gitbox.apache.org/repos/asf/commons-csv.git --branch
>>>>> commons-csv-1.8-RC1 commons-csv-1.8-RC1
>>>>> cd commons-csv-1.8-RC1
>>>>> 
>>>>> 2) Check Apache licenses
>>>>> 
>>>>> This step is not required if the site includes a RAT report page which
>>>> you
>>>>> then must check.
>>>>> 
>>>>> mvn apache-rat:check
>>>>> 
>>>>> 3) Check binary compatibility
>>>>> 
>>>>> Older components still use Apache Clirr:
>>>>> 
>>>>> This step is not required if the site includes a Clirr report page which
>>>>> you then must check.
>>>>> 
>>>>> mvn clirr:check
>>>>> 
>>>>> Newer components use JApiCmp with the japicmp Maven Profile:
>>>>> 
>>>>> This step is not required if the site includes a JApiCmp report page
>>>> which
>>>>> you then must check.
>>>>> 
>>>>> mvn install -DskipTests -P japicmp japicmp:cmp
>>>>> 
>>>>> 4) Build the package
>>>>> 
>>>>> mvn -V clean package
>>>>> 
>>>>> You can record the Maven and Java version produced by -V in your VOTE
>>>> reply.
>>>>> To gather OS information from a command line:
>>>>> Windows: ver
>>>>> Linux: uname -a
>>>>> 
>>>>> 5) Build the site for a single module project
>>>>> 
>>>>> Note: Some plugins require the components to be installed instead of
>>>>> packaged.
>>>>> 
>>>>> mvn site
>>>>> Check the site reports in:
>>>>> - Windows: target\site\index.html
>>>>> - Linux: target/site/index.html
>>>>> 
>>>>> 6) Build the site for a multi-module project
>>>>> 
>>>>> mvn site
>>>>> mvn site:stage
>>>>> Check the site reports in:
>>>>> - Windows: target\site\index.html
>>>>> - Linux: target/site/index.html
>>>>> 
>>>>> -the end-
>>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org <ma...@commons.apache.org>
>> For additional commands, e-mail: dev-help@commons.apache.org <ma...@commons.apache.org>
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org <ma...@commons.apache.org>
> For additional commands, e-mail: dev-help@commons.apache.org <ma...@commons.apache.org>