Posted to dev@commons.apache.org by Emmanuel Bourg <eb...@apache.org> on 2012/03/12 23:11:20 UTC

[csv] Headers

[csv] is missing some elements to ease the use of headers. I have no
clear idea on how to address this, but here are my thoughts.

Headers are used when the fields are accessed by the column name rather 
than by the index. This provides some flexibility because the input file 
can be slightly modified by reordering the columns or by inserting new 
columns without breaking the existing code.

Using the current API here is how one would work with headers:

   CSVParser parser = new CSVParser(in);
   Iterator<String[]> it = parser.iterator();

   // read the header
   String[] header = it.next();

   // build a name to index mapping
   Map<String, Integer> mapping = new HashMap<>();
   for (int i = 0; i < header.length; i++) {
       mapping.put(header[i], i);
   }

   // parse the records
   for (String[] record : parser) {
       Person person = new Person();
       person.setName(record[mapping.get("name")]);
       person.setEmail(record[mapping.get("email")]);
       person.setPhone(record[mapping.get("phone")]);
       persons.add(person);
   }

The user has to take care of the mapping, which is not very friendly. I 
have several solutions in mind:

1. Do nothing and address it in the next release with the bean mapping. 
Parsing the file would then look like this:

   CSVFormat<Person> format = CSVFormat.DEFAULT.withType(Person.class);
   for (Person person : format.parse(in)) {
       persons.add(person);
   }


2. Add a parser returning a Map instead of a String[]

   // declare the header in the format,
   // the header line will be parsed automatically
   CSVFormat format = CSVFormat.DEFAULT.withHeader();

   for (Map<String, String> record : new CSVMapParser(in, format)) {
       Person person = new Person();
       person.setName(record.get("name"));
       person.setEmail(record.get("email"));
       person.setPhone(record.get("phone"));
       persons.add(person);
   }


2bis. Have the same CSVParser class returning String[] or Map<String, 
String> depending on a generic parameter. Not sure it's possible with 
type erasure.


3. Have the parser maintain the name->index mapping. The parser reads the
first line automatically if the format declares a header, and a 
getColumnIndex() method is exposed.

   CSVFormat format = CSVFormat.DEFAULT.withHeader();
   CSVParser parser = new CSVParser(in, format);

   // parse the records
   for (String[] record : parser) {
       Person person = new Person();
       person.setName(record[parser.getColumnIndex("name")]);
       person.setEmail(record[parser.getColumnIndex("email")]);
       person.setPhone(record[parser.getColumnIndex("phone")]);
       persons.add(person);
   }


What do you think?

Emmanuel Bourg

Re: [csv] Headers

Posted by Benedikt Ritter <be...@googlemail.com>.

I think transforming the result of the parse process into instances of
some class is a different concern. That should not be part of a
CSVParser. In Hibernate they use ResultTransformers for this purpose
[1]. I think we should separate these concerns as well.

[1] http://docs.jboss.org/hibernate/orm/3.3/api/org/hibernate/transform/ResultTransformer.html

On 13 March 2012 at 10:03, Emmanuel Bourg <eb...@apache.org> wrote:
> On 13/03/2012 09:56, sebb wrote:
>
>
>> It needs to be possible to access columns by index without having to
>> use annotations.
>
>
> That's still possible with the low level API. I'm just exploring the
> features I would expect of a bean mapping.
>
> Emmanuel Bourg
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [csv] Headers

Posted by Emmanuel Bourg <eb...@apache.org>.

On 13/03/2012 09:56, sebb wrote:

> It needs to be possible to access columns by index without having to
> use annotations.

That's still possible with the low level API. I'm just exploring the 
features I would expect of a bean mapping.

Emmanuel Bourg

Re: [csv] Headers

Posted by sebb <se...@gmail.com>.

On 13 March 2012 08:52, Emmanuel Bourg <eb...@apache.org> wrote:
> On 13/03/2012 09:21, Jörg Schaible wrote:
>
>
>>> If the file has a header, the fields are matched by attribute name, and
>>> an annotation can override the name of the column associated to an
>>> attribute.
>>
>>
>> Yeah, but that's not required. Just because you can read the names of the
>> columns does not mean that you want to address them by name. Why pay the
>> price for creating the map and accessing the values by name just for
>> one-time information?
>
>
> Sorry I forgot the end of my message, I meant to access the fields by name
> OR by index when the header is present. That would be configured with the
> annotations.

It needs to be possible to access columns by index without having to
use annotations.

> Emmanuel Bourg
>

Re: [csv] Headers

Posted by Emmanuel Bourg <eb...@apache.org>.

On 13/03/2012 09:21, Jörg Schaible wrote:

>> If the file has a header, the fields are matched by attribute name, and
>> an annotation can override the name of the column associated to an
>> attribute.
>
> Yeah, but that's not required. Just because you can read the names of the
> columns does not mean that you want to address them by name. Why pay the
> price for creating the map and accessing the values by name just for
> one-time information?

Sorry I forgot the end of my message, I meant to access the fields by 
name OR by index when the header is present. That would be configured 
with the annotations.

Emmanuel Bourg

Re: [csv] Headers

Posted by Jörg Schaible <Jo...@scalaris.com>.

Emmanuel Bourg wrote:

> On 13/03/2012 00:56, sebb wrote:
> 
>>> 1. Do nothing and address it in the next release with the bean mapping.
>>> Parsing the file would then look like this:
>>>
>>>   CSVFormat<Person>  format = CSVFormat.DEFAULT.withType(Person.class);
>>>   for (Person person : format.parse(in)) {
>>>       persons.add(person);
>>>   }
>>>
>>
>> Does this automatically mean that the file has a header?
>> Or is there another way to link columns to Person attributes?
> 
> If the file doesn't have a header, the fields are matched by index
> (either the natural ordering of the attributes in the class, or
> specified by an annotation).
> 
> If the file has a header, the fields are matched by attribute name, and
> an annotation can override the name of the column associated to an
> attribute.

Yeah, but that's not required. Just because you can read the names of the 
columns does not mean that you want to address them by name. Why pay the 
price for creating the map and accessing the values by name just for
one-time information?

- Jörg


Re: [csv] Headers

Posted by Emmanuel Bourg <eb...@apache.org>.

On 13/03/2012 00:56, sebb wrote:

>> 1. Do nothing and address it in the next release with the bean mapping.
>> Parsing the file would then look like this:
>>
>>   CSVFormat<Person>  format = CSVFormat.DEFAULT.withType(Person.class);
>>   for (Person person : format.parse(in)) {
>>       persons.add(person);
>>   }
>>
>
> Does this automatically mean that the file has a header?
> Or is there another way to link columns to Person attributes?

If the file doesn't have a header, the fields are matched by index 
(either the natural ordering of the attributes in the class, or 
specified by an annotation).

If the file has a header, the fields are matched by attribute name, and 
an annotation can override the name of the column associated to an 
attribute.
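
As a rough illustration of the header case, here is a minimal sketch
using reflection. The @Column annotation, the Contact bean and the
BeanMappingSketch helper are hypothetical names invented for this
example, not part of [csv]. Only String fields are handled, and the
index-based case for files without a header is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

/** Hypothetical annotation overriding the column name bound to a field. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    String name();
}

/** Example bean; field names double as the default column names. */
class Contact {
    String name;
    @Column(name = "e-mail") String email;
}

/** Hypothetical helper, not the [csv] API. */
final class BeanMappingSketch {

    /** Fills a new bean from one record, using the header's name-to-index mapping. */
    static <T> T map(Class<T> type, String[] record, Map<String, Integer> mapping) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.getType() != String.class) {
                    continue; // the sketch only handles String fields
                }
                // the annotation, when present, overrides the attribute name
                Column column = field.getAnnotation(Column.class);
                String columnName = (column != null) ? column.name() : field.getName();
                Integer index = mapping.get(columnName);
                if (index != null && index < record.length) {
                    field.setAccessible(true);
                    field.set(bean, record[index]);
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map record to " + type.getName(), e);
        }
    }
}
```

With a header of "e-mail,name", the mapping would be {e-mail=0, name=1}
and the email field would be filled from the first column, whatever the
order of the attributes in the class.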


Emmanuel Bourg

Re: [csv] Headers

Posted by Luc Maisonobe <Lu...@free.fr>.

On 13/03/2012 00:56, sebb wrote:
> On 12 March 2012 22:11, Emmanuel Bourg <eb...@apache.org> wrote:
>> [csv] is missing some elements to ease the use of headers. I have no clear
>> idea on how to address this, here are my thoughts.
>>
>> Headers are used when the fields are accessed by the column name rather than
>> by the index. This provides some flexibility because the input file can be
>> slightly modified by reordering the columns or by inserting new columns
>> without breaking the existing code.
>>
>> Using the current API here is how one would work with headers:
>>
>>  CSVParser parser = new CSVParser(in);
>>  Iterator<String[]> it = parser.iterator();
>>
>>  // read the header
>>  String[] header = it.next();
>>
>>  // build a name to index mapping
>>  Map<String, Integer> mapping = new HashMap<>();
>>  for (int i = 0; i < header.length; i++) {
>>      mapping.put(header[i], i);
>>  }
>>
>>  // parse the records
>>  for (String[] record : parser) {
>>      Person person = new Person();
>>      person.setName(record[mapping.get("name")]);
>>      person.setEmail(record[mapping.get("email")]);
>>      person.setPhone(record[mapping.get("phone")]);
>>      persons.add(person);
>>  }
>>
>> The user has to take care of the mapping, which is not very friendly. I have
>> several solutions in mind:
>>
>> 1. Do nothing and address it in the next release with the bean mapping.
>> Parsing the file would then look like this:
>>
>>  CSVFormat<Person> format = CSVFormat.DEFAULT.withType(Person.class);
>>  for (Person person : format.parse(in)) {
>>      persons.add(person);
>>  }
>>
> 
> Does this automatically mean that the file has a header?
> Or is there another way to link columns to Person attributes?
> 
> I don't think this should be the only way of handling named columns;
> it's not always convenient to create a type.

I agree. Sometimes, the columns are just a part of a class that would
need other parameters not in the columns (but perhaps in a custom
comment of the header, if these parameters are constant throughout the
file). So providing an intermediate-level API (with the mapping already
done, but still with access to individual fields) is a must.

> 
>> 2. Add a parser returning a Map instead of a String[]
>>
>>  // declare the header in the format,
>>  // the header line will be parsed automatically
>>  CSVFormat format = CSVFormat.DEFAULT.withHeader();
>>
>>  for (Map<String, String> record : new CSVMapParser(in, format)) {
>>      Person person = new Person();
>>      person.setName(record.get("name"));
>>      person.setEmail(record.get("email"));
>>      person.setPhone(record.get("phone"));
>>      persons.add(person);
>>  }
> 
> That seems OK; one can also just use the column values directly.

+1


Luc

> 
>>
>> 2bis. Have the same CSVParser class returning String[] or Map<String,
>> String> depending on a generic parameter. Not sure it's possible with type
>> erasure.
>>
> 
> It's not possible for two methods to differ only by return type, so
> this can only be done if the method parameters are different
> after type erasure.
> 
>> 3. Have the parser maintain the name->index mapping. The parser reads the
>> first line automatically if the format declares a header, and a
>> getColumnIndex() method is exposed.
>>
>>  CSVFormat format = CSVFormat.DEFAULT.withHeader();
>>  CSVParser parser = new CSVParser(in, format);
>>
>>  // parse the records
>>  for (String[] record : parser) {
>>      Person person = new Person();
>>      person.setName(record[parser.getColumnIndex("name")]);
>>      person.setEmail(record[parser.getColumnIndex("email")]);
>>      person.setPhone(record[parser.getColumnIndex("phone")]);
>>      persons.add(person);
>>  }
> 
> Quite awkward to use.
> 
>>
>> What do you think?
>>
>> Emmanuel Bourg
>>
> 


Re: [csv] Headers

Posted by sebb <se...@gmail.com>.

On 12 March 2012 22:11, Emmanuel Bourg <eb...@apache.org> wrote:
> [csv] is missing some elements to ease the use of headers. I have no clear
> idea on how to address this, here are my thoughts.
>
> Headers are used when the fields are accessed by the column name rather than
> by the index. This provides some flexibility because the input file can be
> slightly modified by reordering the columns or by inserting new columns
> without breaking the existing code.
>
> Using the current API here is how one would work with headers:
>
>  CSVParser parser = new CSVParser(in);
>  Iterator<String[]> it = parser.iterator();
>
>  // read the header
>  String[] header = it.next();
>
>  // build a name to index mapping
>  Map<String, Integer> mapping = new HashMap<>();
>  for (int i = 0; i < header.length; i++) {
>      mapping.put(header[i], i);
>  }
>
>  // parse the records
>  for (String[] record : parser) {
>      Person person = new Person();
>      person.setName(record[mapping.get("name")]);
>      person.setEmail(record[mapping.get("email")]);
>      person.setPhone(record[mapping.get("phone")]);
>      persons.add(person);
>  }
>
> The user has to take care of the mapping, which is not very friendly. I have
> several solutions in mind:
>
> 1. Do nothing and address it in the next release with the bean mapping.
> Parsing the file would then look like this:
>
>  CSVFormat<Person> format = CSVFormat.DEFAULT.withType(Person.class);
>  for (Person person : format.parse(in)) {
>      persons.add(person);
>  }
>

Does this automatically mean that the file has a header?
Or is there another way to link columns to Person attributes?

I don't think this should be the only way of handling named columns;
it's not always convenient to create a type.

> 2. Add a parser returning a Map instead of a String[]
>
>  // declare the header in the format,
>  // the header line will be parsed automatically
>  CSVFormat format = CSVFormat.DEFAULT.withHeader();
>
>  for (Map<String, String> record : new CSVMapParser(in, format)) {
>      Person person = new Person();
>      person.setName(record.get("name"));
>      person.setEmail(record.get("email"));
>      person.setPhone(record.get("phone"));
>      persons.add(person);
>  }

That seems OK; one can also just use the column values directly.

>
> 2bis. Have the same CSVParser class returning String[] or Map<String,
> String> depending on a generic parameter. Not sure it's possible with type
> erasure.
>

It's not possible for two methods to differ only by return type, so
this can only be done if the method parameters are different
after type erasure.
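
To illustrate the constraint: a single class can still serve both record
types if the type parameter is fixed by a converter supplied by the
caller rather than by overloading. The sketch below assumes a
hypothetical GenericParserSketch that receives already-tokenized rows;
none of these names exist in [csv], and java.util.function.Function
requires Java 8.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical generic parser: the record type is chosen by a converter, not by overloading. */
final class GenericParserSketch<T> {
    // already-tokenized rows, standing in for the real parsing step
    private final List<String[]> rows;
    private final Function<String[], T> converter;

    GenericParserSketch(List<String[]> rows, Function<String[], T> converter) {
        this.rows = rows;
        this.converter = converter;
    }

    /** A single method; its return type follows the type parameter. */
    List<T> records() {
        List<T> result = new ArrayList<>();
        for (String[] row : rows) {
            result.add(converter.apply(row));
        }
        return result;
    }

    /** Builds a converter pairing each value with the header name at the same position. */
    static Function<String[], Map<String, String>> toMap(String[] header) {
        return row -> {
            Map<String, String> map = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < row.length; i++) {
                map.put(header[i], row[i]);
            }
            return map;
        };
    }
}
```

Passing row -> row gives a parser of String[], while passing
toMap(header) gives a parser of Map<String, String>. Because of
erasure, the class itself cannot find out what T is at runtime, which is
why the caller has to hand in the converter.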

> 3. Have the parser maintain the name->index mapping. The parser reads the
> first line automatically if the format declares a header, and a
> getColumnIndex() method is exposed.
>
>  CSVFormat format = CSVFormat.DEFAULT.withHeader();
>  CSVParser parser = new CSVParser(in, format);
>
>  // parse the records
>  for (String[] record : parser) {
>      Person person = new Person();
>      person.setName(record[parser.getColumnIndex("name")]);
>      person.setEmail(record[parser.getColumnIndex("email")]);
>      person.setPhone(record[parser.getColumnIndex("phone")]);
>      persons.add(person);
>  }

Quite awkward to use.

>
> What do you think?
>
> Emmanuel Bourg
>

Re: [csv] Headers

Posted by Emmanuel Bourg <eb...@apache.org>.

On 15/03/2012 08:55, Benedikt Ritter wrote:

> I'm not sure if I understand the approach completely. The header
> cannot be accessed as a CSVRecord, right? CSVRecords know the header
> values through get(String). What happens if the format does not
> support a header? UnsupportedOperationException?

Yes, or IllegalStateException.


> If I got you right, we could use getHeaders() to know which header
> values are available.

The actual header would be returned by parser.getHeader().


> Maybe it would be useful to have the record implement Iterable as well.

Or have a method return the array of values if you want to iterate over it.

Emmanuel Bourg

Re: [csv] Headers

Posted by Benedikt Ritter <be...@googlemail.com>.

On 15 March 2012 at 01:58, Emmanuel Bourg <eb...@apache.org> wrote:
> There is another alternative: we might replace the records returned as a
> String[] by a CSVRecord class able to access the fields by index or by name.
> This would be similar to a JDBC ResultSet (except for the looping logic)
>

Sounds good. This discussion showed that a record is more than a
String array. So having a specialized class is a good idea.

> This avoids the duplication of the parser, which might still be generified
> later to support custom beans.
>
> The example becomes:
>
>  CSVFormat format = CSVFormat.DEFAULT.withHeader();
>
>  for (CSVRecord record : format.parse(in)) {
>
>      Person person = new Person();
>      person.setName(record.get("name"));
>      person.setEmail(record.get("email"));
>      person.setPhone(record.get("phone"));
>      persons.add(person);
>  }
>
> The record is not a Map to keep it simple, it only exposes 3 methods:
> get(int), get(String) and size()
>

I'm not sure if I understand the approach completely. The header
cannot be accessed as a CSVRecord, right? CSVRecords know the header
values through get(String). What happens if the format does not
support a header? UnsupportedOperationException?
If I got you right, we could use getHeaders() to know which header
values are available.

Maybe it would be useful to have the record implement Iterable as well.

Benedikt

> Emmanuel Bourg
>
>

Re: [csv] Headers

Posted by Emmanuel Bourg <eb...@apache.org>.

There is another alternative: we might replace the records returned as a
String[] by a CSVRecord class able to access the fields by index or by
name. This would be similar to a JDBC ResultSet (except for the looping
logic).

This avoids the duplication of the parser, which might still be 
generified later to support custom beans.

The example becomes:

   CSVFormat format = CSVFormat.DEFAULT.withHeader();

   for (CSVRecord record : format.parse(in)) {
       Person person = new Person();
       person.setName(record.get("name"));
       person.setEmail(record.get("email"));
       person.setPhone(record.get("phone"));
       persons.add(person);
   }

The record is not a Map to keep it simple, it only exposes 3 methods: 
get(int), get(String) and size()
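
A minimal sketch of what such a record class might look like, assuming
the parser hands every record the same name-to-index map built from the
header line. The class name RecordSketch and its constructor are
hypothetical, and the choice of exceptions is an assumption, not a
settled part of the API.

```java
import java.util.Map;

/** Hypothetical record type exposing get(int), get(String) and size(). */
final class RecordSketch {
    private final String[] values;
    // name-to-index mapping shared by all records; null when no header was declared
    private final Map<String, Integer> mapping;

    RecordSketch(String[] values, Map<String, Integer> mapping) {
        this.values = values;
        this.mapping = mapping;
    }

    /** Returns the value at the given zero-based column index. */
    String get(int index) {
        return values[index];
    }

    /** Returns the value of the named column. */
    String get(String name) {
        if (mapping == null) {
            throw new IllegalStateException("No header was declared for this format");
        }
        Integer index = mapping.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return values[index];
    }

    /** Returns the number of values in this record. */
    int size() {
        return values.length;
    }
}
```

The mapping is built once per file and only referenced by each record,
so access by name costs one map lookup and access by index costs
nothing extra.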

Emmanuel Bourg