Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/31 02:29:32 UTC

[GitHub] [druid] FrankChen021 commented on issue #10259: Kafka ingestion fails to parse multiple-line messages in 0.19

FrankChen021 commented on issue #10259:
URL: https://github.com/apache/druid/issues/10259#issuecomment-683518040


   After trying to resolve this problem, I found that fixing it the way described above is a little tricky.
   
   All InputSources except Kafka assume that each line of input text is one JSON object. Overriding `intermediateRowIterator` in JsonReader to parse the input text as a whole, as suggested above, would break this assumption and cause parsing to work incorrectly for those input sources, such as local text.
   
   To handle both needs, another feasible and easy fix is:
   
   1. Add a boolean property, called `lineSplittable` for example, to `InputEntity` to indicate whether the text should be treated line by line or as a whole. The default value is true, meaning line by line, because only Kafka records need to be treated as a whole.
   
   2. `ByteEntity`, which inherits from `InputEntity` and is used by the Kafka input source `RecordSupplierInputSource`, provides a constructor that lets higher-level code pass a value for `lineSplittable`.
   
   3. `createReader` of `JsonInputFormat` checks this property on the `InputEntity`: if it is true, it creates an instance of the current JsonReader class; if not, it creates a new JsonReader that reads the input text as a whole.
   
   4. `RecordSupplierInputSource` passes `false` to `ByteEntity` to indicate that the input text should be treated as a whole JSON document rather than line by line.
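
   The four steps can be sketched roughly as below. This is only an illustration of the idea, not Druid code: `InputEntity` and `ByteEntity` here are simplified stand-ins for the real types, and `bytes()`, `isLineSplittable()`, `LineSplittableSketch` and `chunksToParse` are hypothetical names. The method returns the text chunks that would each be handed to the JSON parser, which is the decision `createReader` would have to make.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Druid's InputEntity; NOT the real interface.
interface InputEntity {
    // Hypothetical accessor for the raw content.
    byte[] bytes();

    // Proposed property (step 1): true means one record per text line, which is the default.
    default boolean isLineSplittable() {
        return true;
    }
}

// Simplified stand-in for ByteEntity with the proposed extra constructor (step 2).
final class ByteEntity implements InputEntity {
    private final byte[] data;
    private final boolean lineSplittable;

    ByteEntity(byte[] data) {
        this(data, true);
    }

    ByteEntity(byte[] data, boolean lineSplittable) {
        this.data = data;
        this.lineSplittable = lineSplittable;
    }

    @Override
    public byte[] bytes() {
        return data;
    }

    @Override
    public boolean isLineSplittable() {
        return lineSplittable;
    }
}

final class LineSplittableSketch {
    // Mirrors the check in step 3: split into lines, or keep the text whole.
    static List<String> chunksToParse(InputEntity entity) {
        String text = new String(entity.bytes(), StandardCharsets.UTF_8);
        List<String> chunks = new ArrayList<>();
        if (entity.isLineSplittable()) {
            // Line-by-line behaviour: each non-empty line is one JSON object.
            for (String line : text.split("\\R")) {
                if (!line.trim().isEmpty()) {
                    chunks.add(line);
                }
            }
        } else {
            // Whole-text behaviour: the entire payload is a single JSON document.
            chunks.add(text);
        }
        return chunks;
    }
}
```

   With this sketch, a Kafka-style caller (step 4) would construct `new ByteEntity(payload, false)`, so a pretty-printed multi-line message stays in one chunk, while every other caller keeps the default line-by-line splitting.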
   
   @jihoonson Do you have any better ideas?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


