You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2019/07/03 08:13:39 UTC

Re: [CSV] Release again


> On 15 Jun 2019, at 19:59, Alex Herbert <al...@gmail.com> wrote:
> 
> 
> 
>> On 15 Jun 2019, at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>> 
>> We've fixed some issues immediately after 1.7. How does everyone feel about
>> releasing again?
>> What else needs to be addressed in the short term?
>> Gary
> 
> - Bug (picked up by FindBugs):
> 
> CSVRecord is no longer serialisable as it stores the CSVParser and that is not serialisable. This can be fixed by making the parser transient and documenting the getParser() method as invalid after deserialisation. Otherwise the cascade of fields that must be serialisable eventually includes the original BufferedReader used to read the data. The parser is required for the header map functionality and the getParser() method. So the class can store the header map (easy), or overload the serialisation methods to record and load the header map (more effort), or not support the parser and the header map functionality after deserialisation, or something else. 
> 
> This should have been picked up by FindBugs during the release cycle but for unknown reasons the FindBugs report on the site does not show this. So anyone who deserialises this class will get an error as the serialVersionUID has not been updated and the new parser field is not serialisable.
> 

I’ve raised this as a bug (CSV-248). It requires a decision on how to fix it.

> 
> I also raised a few issues about the handling of column headers in my recent fix PR:
> 
> - Bug:
> 
> If the settings are not allowing empty columns headers you can currently use a single empty header. This is because column headers are only checked for empty when they are duplicates. So it is the second empty header (the first duplicate) that raises an error. This test should pass but does not:
> 
> // This fails to throw an exception.
> // Only 1 column name is missing and this is not recognised as a problem.
> @Test(expected = IllegalArgumentException.class)
> public void testHeadersMissing1ColumnException() throws Exception {
>    final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
>    CSVFormat.DEFAULT.withHeader().parse(in).iterator();
> }

I’ve raised this as a bug (CSV-247). It is a rearrangement of the checking logic and I will raise a PR to fix it.

> 
> - Documentation:
> 
> If you are using the map of the header to column index it should be noted that under certain circumstances where you allow duplicate column headers that the map cannot represent the one-to-many relationship. The map is only guaranteed to be valid if the record was created with a format that does not allow duplicate or empty header names.

Any objections to updating the Javadoc to record the limitations of the header map?

> 
> - Bug / Documentation
> 
> The CSVParser only stores headers names in a list of header names if they are not null. So the list can be shorter than the number of columns if you use a format that allows empty headers.
> 
> If the purpose of the list is to provide the correct order of the column headers then it should contain empty entries too. Otherwise I don’t see the purpose of storing this list unless it is documented what it will contain. This appears to be a list of non-null header names in the order they occur in the header and thus represent keys that can be used in the header map. But you can get that (without the ordering) using the map ketSet.

Is this a bug or should it be documented as a feature?

I consider it a bug. The list of header names does not always reflect the header, i.e. it should contain empty entries if the header was empty.

> 
> 
> 
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [CSV] Release again

Posted by Alex Herbert <al...@gmail.com>.


> On 3 Jul 2019, at 09:18, Hoa Phan <s5...@gmail.com> wrote:
> 
> +1

Please can you state what you are voting for. 

A new release or each of my proposals to fix bugs before the release.

> 
> Sent from Yahoo Mail on Android 
> 
>  On Wed, Jul 3, 2019 at 18:13, Alex Herbert<al...@gmail.com> wrote:   
> 
>> On 15 Jun 2019, at 19:59, Alex Herbert <al...@gmail.com> wrote:
>> 
>> 
>> 
>>> On 15 Jun 2019, at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>>> 
>>> We've fixed some issues immediately after 1.7. How does everyone feel about
>>> releasing again?
>>> What else needs to be addressed in the short term?
>>> Gary
>> 
>> - Bug (picked up by FindBugs):
>> 
>> CSVRecord is no longer serialisable as it stores the CSVParser and that is not serialisable. This can be fixed by making the parser transient and documenting the getParser() method as invalid after deserialisation. Otherwise the cascade of fields that must be serialisable eventually includes the original BufferedReader used to read the data. The parser is required for the header map functionality and the getParser() method. So the class can store the header map (easy), or overload the serialisation methods to record and load the header map (more effort), or not support the parser and the header map functionality after deserialisation, or something else. 
>> 
>> This should have been picked up by FindBugs during the release cycle but for unknown reasons the FindBugs report on the site does not show this. So anyone who deserialises this class will get an error as the serialVersionUID has not been updated and the new parser field is not serialisable.
>> 
> 
> I’ve raised this as a bug (CSV-248). It requires a decision on how to fix it.
> 
>> 
>> I also raised a few issues about the handling of column headers in my recent fix PR:
>> 
>> - Bug:
>> 
>> If the settings are not allowing empty columns headers you can currently use a single empty header. This is because column headers are only checked for empty when they are duplicates. So it is the second empty header (the first duplicate) that raises an error. This test should pass but does not:
>> 
>> // This fails to throw an exception.
>> // Only 1 column name is missing and this is not recognised as a problem.
>> @Test(expected = IllegalArgumentException.class)
>> public void testHeadersMissing1ColumnException() throws Exception {
>>     final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
>>     CSVFormat.DEFAULT.withHeader().parse(in).iterator();
>> }
> 
> I’ve raised this as a bug (CSV-247). It is a rearrangement of the checking logic and I will raise a PR to fix it.
> 
>> 
>> - Documentation:
>> 
>> If you are using the map of the header to column index it should be noted that under certain circumstances where you allow duplicate column headers that the map cannot represent the one-to-many relationship. The map is only guaranteed to be valid if the record was created with a format that does not allow duplicate or empty header names.
> 
> Any objections to updating the Javadoc to record the limitations of the header map?
> 
>> 
>> - Bug / Documentation
>> 
>> The CSVParser only stores headers names in a list of header names if they are not null. So the list can be shorter than the number of columns if you use a format that allows empty headers.
>> 
>> If the purpose of the list is to provide the correct order of the column headers then it should contain empty entries too. Otherwise I don’t see the purpose of storing this list unless it is documented what it will contain. This appears to be a list of non-null header names in the order they occur in the header and thus represent keys that can be used in the header map. But you can get that (without the ordering) using the map ketSet.
> 
> Is this a bug or should it be documented as a feature?
> 
> I consider it a bug. The list of header names does not always reflect the header, i.e. it should contain empty entries if the header was empty.
> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [CSV] Release again

Posted by Hoa Phan <s5...@gmail.com>.

+1

Sent from Yahoo Mail on Android 
 
  On Wed, Jul 3, 2019 at 18:13, Alex Herbert<al...@gmail.com> wrote:   

> On 15 Jun 2019, at 19:59, Alex Herbert <al...@gmail.com> wrote:
> 
> 
> 
>> On 15 Jun 2019, at 18:57, Gary Gregory <ga...@gmail.com> wrote:
>> 
>> We've fixed some issues immediately after 1.7. How does everyone feel about
>> releasing again?
>> What else needs to be addressed in the short term?
>> Gary
> 
> - Bug (picked up by FindBugs):
> 
> CSVRecord is no longer serialisable as it stores the CSVParser and that is not serialisable. This can be fixed by making the parser transient and documenting the getParser() method as invalid after deserialisation. Otherwise the cascade of fields that must be serialisable eventually includes the original BufferedReader used to read the data. The parser is required for the header map functionality and the getParser() method. So the class can store the header map (easy), or overload the serialisation methods to record and load the header map (more effort), or not support the parser and the header map functionality after deserialisation, or something else. 
> 
> This should have been picked up by FindBugs during the release cycle but for unknown reasons the FindBugs report on the site does not show this. So anyone who deserialises this class will get an error as the serialVersionUID has not been updated and the new parser field is not serialisable.
> 

I’ve raised this as a bug (CSV-248). It requires a decision on how to fix it.

> 
> I also raised a few issues about the handling of column headers in my recent fix PR:
> 
> - Bug:
> 
> If the settings are not allowing empty columns headers you can currently use a single empty header. This is because column headers are only checked for empty when they are duplicates. So it is the second empty header (the first duplicate) that raises an error. This test should pass but does not:
> 
> // This fails to throw an exception.
> // Only 1 column name is missing and this is not recognised as a problem.
> @Test(expected = IllegalArgumentException.class)
> public void testHeadersMissing1ColumnException() throws Exception {
>    final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
>    CSVFormat.DEFAULT.withHeader().parse(in).iterator();
> }

I’ve raised this as a bug (CSV-247). It is a rearrangement of the checking logic and I will raise a PR to fix it.

> 
> - Documentation:
> 
> If you are using the map of the header to column index it should be noted that under certain circumstances where you allow duplicate column headers that the map cannot represent the one-to-many relationship. The map is only guaranteed to be valid if the record was created with a format that does not allow duplicate or empty header names.

Any objections to updating the Javadoc to record the limitations of the header map?

> 
> - Bug / Documentation
> 
> The CSVParser only stores headers names in a list of header names if they are not null. So the list can be shorter than the number of columns if you use a format that allows empty headers.
> 
> If the purpose of the list is to provide the correct order of the column headers then it should contain empty entries too. Otherwise I don’t see the purpose of storing this list unless it is documented what it will contain. This appears to be a list of non-null header names in the order they occur in the header and thus represent keys that can be used in the header map. But you can get that (without the ordering) using the map ketSet.

Is this a bug or should it be documented as a feature?

I consider it a bug. The list of header names does not always reflect the header, i.e. it should contain empty entries if the header was empty.

> 
> 
> 
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org