You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/02 19:45:36 UTC

[GitHub] gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser

gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-410045743
 
 
   Yeah -- StringInputRowParser is a bit of a special snowflake. Maybe doing this feature is the motivation we need to rethink how that all works. To add to the points you raised, there's also this gross code in TransformSpec (which decorates InputRowParsers with expression-based transforms):
   
   ```
       if (parser instanceof StringInputRowParser) {
         // Hack to support the fact that some callers use special methods in StringInputRowParser, such as
         // parse(String) and startFileFromBeginning.
         return (InputRowParser<T>) new TransformingStringInputRowParser(
             parser.getParseSpec(),
             ((StringInputRowParser) parser).getEncoding(),
             this
         );
       } else {
         return new TransformingInputRowParser<>(parser, this);
       }
   ```
   
   And there's also the warty fact that ParseSpec's `makeParser()` method is actually only used by StringInputRowParser (and a couple of extension parsers that work by converting inputs to JSON and then parsing them as strings). Other parsers call `getTimestampSpec` and `getDimensionsSpec`, but they don't call `makeParser`. Some parsers, however, can extract useful things from certain types of ParseSpecs. The way this is done is weird, leading to code like this in the Avro extension to yank out an optional FlattenSpec:
   
   ```
       final JSONPathSpec flattenSpec;
       if (parseSpec != null && (parseSpec instanceof AvroParseSpec)) {
         flattenSpec = ((AvroParseSpec) parseSpec).getFlattenSpec();
       } else {
         flattenSpec = JSONPathSpec.DEFAULT;
       }
   ```
   
   The only reason StringInputRowParser doesn't have to do something like this is that the method _it_ needs (makeParser) has been granted special status as a member of the ParseSpec interface (even though many ParseSpecs don't implement it, they just throw exceptions).
   
   If we are wanting to be ambitious (I don't see why not!) then I think we can make nicer, clearer interfaces by rethinking how this all works. In a backwards compatible way, since we don't want all of our users to have to rewrite their ingestion specs (which will probably be the tricky part).
   
   @josephglanville what ambition level are you interested in having here? 😄

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org