You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/02 18:50:59 UTC

[GitHub] josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and InputRowParser

josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and InputRowParser
URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-410030450
 
 
   One thing that has become apparent is the `StringInputRowParser` is a bit of a bad citizen.
   It extends `ByteBufferInputRowParser` aka `InputRowParser<ByteBuffer>`.
   
   This is not a problem in and of itself and it's completely reasonable to have a input row parser designed to read UTF-8 encoded strings from `ByteBuffer`.
   
   The issue is that isn't how it's used, it's primarily used via function with signature the `InputRow parse(String)`, the fact this is `String` rather than `ByteBuffer` not only violates the assumption of the interface but it also means that users of `StringInputRowParser` aren't using the `InputRowParser` interface at all, effectively making any code built on `StringInputRowParser` not very generic.
   
   It seems to me that it would make sense for there to be `StringInputRowParser` that is actually `InputRowParser<String>` and `StringDecoderInputRowParser` which inherits from `ByteBufferInputRowParser` reads data from ByteBuffer decoding UTF-8.
   
   I would also move to put add a `reset()` to the `InputRowParser` interface as it's already on `Parser`. Currently there is `startFileFromBeginning()` on `StringInputRowParser` that calls the same function on the inner `Parser`. Seems as this may be useful in general for resetting the state of stateful parsers when opening the next object/input stream.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org