You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/07/10 21:27:08 UTC

[GitHub] [metron] mmiklavc commented on issue #1341: METRON-614: Eliminate use of the default Charset

mmiklavc commented on issue #1341: METRON-614: Eliminate use of the default Charset
URL: https://github.com/apache/metron/pull/1341#issuecomment-510235543
 
 
   > > I think setting a default to UTF-8 in the parsers and documenting it would be the way to go. Provide a per-sensor config option, e.g. inputDataCharset
   > 
   > What is the status of this? I assume the suggestion here to provide additional configuration is a feature enhancement request and not something we would tackle on this PR. Is that right @mmiklavc?
   > 
   > What do we need to do to get this PR merged?
   
   I thought this could be handled pretty trivially here. What if we modified this ever so slightly to include a default charset for the parsers that can be overridden as needed? Here's an example.
   
   ```
   # MessageParser interface
   
   public interface MessageParser<T> extends Configurable {
   ...
       default Charset getReadCharset() {
           return StandardCharsets.UTF_8;
       }
   ...
   }
   
   // then we can do this for the default case
   
   # CSVParser
   
   public class CSVParser extends BasicParser {
   ...
     @Override
     public List<JSONObject> parse(byte[] rawMessage) {
       try {
         String msg = new String(rawMessage, getReadCharset());
   ...
   
   // And for implementations that might want something else
   
   public class CSVParser extends BasicParser {
   ...
     private Charset readCharset;
   
     @Override
     public void configure(Map<String, Object> parserConfig) {
       converter = new CSVConverter();
       converter.initialize(parserConfig);
       Object tsFormatObj = parserConfig.get(TIMESTAMP_FORMAT_CONF);
       if(tsFormatObj != null) {
         timestampFormat = new SimpleDateFormat(tsFormatObj.toString());
       }
       if (parserConfig.containsKey(READ_CHARSET)) {
         readCharset = Charset.forName((String) parserConfig.get(READ_CHARSET));
       }
     }
   
     @Override
     public Charset getReadCharset() {
       return readCharset != null ? readCharset : super.getReadCharset();
     }
   ...
   ```
   
   We could even push that config option into the BasicParser if we wanted. The upshot is that this seems like a really trivial change. Rather than opening another PR to address it, can we just add the option with our desired default now and be done with it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services