You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/07/10 21:27:08 UTC
[GitHub] [metron] mmiklavc commented on issue #1341: METRON-614: Eliminate
use of the default Charset
mmiklavc commented on issue #1341: METRON-614: Eliminate use of the default Charset
URL: https://github.com/apache/metron/pull/1341#issuecomment-510235543
> > I think setting a default to UTF-8 in the parsers and documenting it would be the way to go. Provide a per-sensor config option, e.g. inputDataCharset
>
> What is the status of this? I assume the suggestion here to provide additional configuration is a feature enhancement request and not something we would tackle on this PR. Is that right @mmiklavc?
>
> What do we need to do to get this PR merged?
I thought this could be handled pretty trivially here. What if we modified this ever so slightly to include a default charset for the parsers that can be overridden as needed? Here's an example.
```
# MessageParser interface
public interface MessageParser<T> extends Configurable {
...
default Charset getReadCharset() {
return StandardCharsets.UTF_8;
}
...
}
// then we can do this for the default case
# CSVParser
public class CSVParser extends BasicParser {
...
@Override
public List<JSONObject> parse(byte[] rawMessage) {
try {
String msg = new String(rawMessage, getReadCharset());
...
// And for implementations that might want something else
public class CSVParser extends BasicParser {
...
private Charset readCharset;
@Override
public void configure(Map<String, Object> parserConfig) {
converter = new CSVConverter();
converter.initialize(parserConfig);
Object tsFormatObj = parserConfig.get(TIMESTAMP_FORMAT_CONF);
if(tsFormatObj != null) {
timestampFormat = new SimpleDateFormat(tsFormatObj.toString());
}
if (parserConfig.containsKey(READ_CHARSET)) {
readCharset = Charset.forName((String) parserConfig.get(READ_CHARSET));
}
}
@Override
public Charset getReadCharset() {
return readCharset != null ? readCharset : super.getReadCharset();
}
...
```
We could even push that config option into the BasicParser if we wanted. The upshot is that this seems like a really trivial change. Rather than opening another PR to address it, can we just add the option with our desired default now and be done with it?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services