Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/08/15 18:11:05 UTC

[GitHub] buptljy commented on issue #6541: [FLINK-9964] [table] Add a CSV table format factory

buptljy commented on issue #6541: [FLINK-9964] [table] Add a CSV table format factory
URL: https://github.com/apache/flink/pull/6541#issuecomment-413285602
 
 
   @twalthr 
   I've replied to a few comments above and optimized some code according to your comments.
   I've finished:
   1. Null value configuration.
   2. Schema derivation.
   3. Some optimizations.
   
   About the encoding: The encoding of the CSV data can only be one of the values of com.fasterxml.jackson.core.JsonEncoding, and the Jackson reader automatically detects the encoding according to the rules of [rfc4627](http://www.ietf.org/rfc/rfc4627.txt). So we don't need to set the encoding manually, and we can't let users choose encodings that JsonEncoding doesn't support, such as 'latin'.
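
For reference, here is a minimal sketch of the detection rule from RFC 4627, section 3, which looks at the pattern of zero bytes in the first four octets. It is not Jackson's actual code (Jackson's own bootstrapper may apply further checks), and the class and method names are hypothetical. The returned names correspond to the five JsonEncoding values (UTF8, UTF16_BE, UTF16_LE, UTF32_BE, UTF32_LE).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating RFC 4627, section 3. Not part of Jackson or Flink.
class Rfc4627EncodingSketch {

    // Guesses the Unicode encoding from the zero-byte pattern of the first four octets.
    static String detect(byte[] b) {
        if (b.length >= 4) {
            boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
            if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
            if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
            if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
            if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        }
        // Everything else is treated as UTF-8, including bytes that are really Latin-1.
        return "UTF-8";
    }

    public static void main(String[] args) {
        String text = "{\"a\":1}";
        System.out.println(detect(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(detect(text.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(detect(text.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println(detect(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The last call shows why an encoding such as Latin-1 cannot be offered: the rule has no branch for it, so those bytes would be read as UTF-8.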
   
   About the byte array: The byte array handling looks odd because of Jackson's internal logic, which I explained in CsvRowSerializationSchema (line 159). We treat a byte array as a string to avoid unnecessary logic, because Jackson uses Base64 to handle byte arrays (CsvGenerator, line 691). This means users cannot pass in their original byte array directly; otherwise they would not get the original content back after serializing and deserializing (see the code below). Additionally, Jackson represents a byte array as a BinaryNode, so we cannot convert it the way we convert other arrays.
   
   ```
   // Assumes CsvMapper and CsvSchema from jackson-dataformat-csv and JsonNode from jackson-databind.
   byte[] origin = "123".getBytes();
   CsvSchema schema = CsvSchema.builder()
   		.addColumn("a", CsvSchema.ColumnType.STRING).build();
   CsvMapper cm = new CsvMapper();
   JsonNode result = cm.readerFor(JsonNode.class).with(schema).readValue(origin);
   byte[] transformed = result.binaryValue();
   // expected: true, actual: false
   System.out.println(Arrays.equals(transformed, origin));
   ```
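
To make the Base64 point concrete without depending on Jackson, here is a small sketch using only java.util.Base64. It assumes, as described above, that the writer emits a byte array as Base64 text. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch of the Base64 round trip. Not Jackson or Flink code.
class Base64RoundTripSketch {

    // What a Base64-based writer would put into the CSV cell for a byte array.
    static String writeAsBase64Text(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // What a Base64-aware reader would recover from that cell.
    static byte[] readBase64Text(String cell) {
        return Base64.getDecoder().decode(cell);
    }

    public static void main(String[] args) {
        byte[] origin = "123".getBytes(StandardCharsets.UTF_8);
        String cell = writeAsBase64Text(origin);

        // The cell text is the Base64 form, not the original characters.
        System.out.println(cell); // prints MTIz

        // Reading the cell as plain bytes does not give back the original content.
        byte[] plain = cell.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(plain, origin)); // prints false

        // Only a reader that decodes Base64 recovers it.
        System.out.println(Arrays.equals(readBase64Text(cell), origin)); // prints true
    }
}
```

The round trip only works when both sides agree on Base64, which is why treating the field as a plain string is the simpler choice here.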
