Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/02/13 22:20:00 UTC

[GitHub] jihoonson opened a new issue #7071: [PROPOSAL] Support only finiteFirehose for native batch ingestion

URL: https://github.com/apache/incubator-druid/issues/7071
 
 
   # Motivation
   
   Currently, native batch tasks (local and parallel index tasks) accept any firehose implementation. However, this isn't useful when the firehose is infinite, because these tasks have no context about stream ingestion.
   
   # Proposed changes
   
   I propose changing the type of the `firehose` field in `IndexIOConfig` and `ParallelIndexIOConfig` from `FirehoseFactory` to `FiniteFirehoseFactory`.
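
As a rough illustration of what narrowing the field type buys, here is a minimal, self-contained sketch. The interfaces and classes below are simplified stand-ins written for this example and are not Druid's real signatures; `ListFirehoseFactory`, `TickFirehoseFactory`, and `countRows` are hypothetical names. The point is only that, once the config field is typed as the finite interface, passing an unbounded source is rejected at compile time.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FiniteFirehoseSketch {
    /** Simplified stand-in: a factory that opens an iterator of rows, possibly unbounded. */
    interface FirehoseFactory {
        Iterator<String> connect();
    }

    /** Simplified stand-in: marks factories whose input is known to end. */
    interface FiniteFirehoseFactory extends FirehoseFactory {
    }

    /** Hypothetical finite source backed by an in-memory list of lines. */
    static final class ListFirehoseFactory implements FiniteFirehoseFactory {
        private final List<String> lines;

        ListFirehoseFactory(List<String> lines) {
            this.lines = lines;
        }

        @Override
        public Iterator<String> connect() {
            return lines.iterator();
        }
    }

    /** Hypothetical unbounded source; it implements only the general interface. */
    static final class TickFirehoseFactory implements FirehoseFactory {
        @Override
        public Iterator<String> connect() {
            return new Iterator<String>() {
                private long n = 0;

                @Override
                public boolean hasNext() {
                    return true; // never ends
                }

                @Override
                public String next() {
                    return "tick-" + n++;
                }
            };
        }
    }

    /** Stand-in for the batch IO config after the proposed change: the field type is narrowed. */
    static final class IndexIOConfig {
        private final FiniteFirehoseFactory firehose;

        IndexIOConfig(FiniteFirehoseFactory firehose) {
            this.firehose = firehose;
        }

        FiniteFirehoseFactory getFirehose() {
            return firehose;
        }
    }

    /** Reads every row; the loop is guaranteed to end only because the source is finite. */
    static int countRows(IndexIOConfig config) {
        int count = 0;
        Iterator<String> rows = config.getFirehose().connect();
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        IndexIOConfig config =
            new IndexIOConfig(new ListFirehoseFactory(Arrays.asList("a", "b", "c")));
        System.out.println(countRows(config));
        // new IndexIOConfig(new TickFirehoseFactory()); // would not compile: not finite
    }
}
```

With the wider `FirehoseFactory` type on the field, the commented-out line would compile and `countRows` would never return.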
   
   # Rationale
   
   `FiniteFirehoseFactory` is designed for any type of batch ingestion. It assumes that the input data is finite (and provides an optional hint for parallel indexing). It makes more sense for native batch tasks to support only `FiniteFirehoseFactory` than to extend them to handle every kind of `FirehoseFactory`, some of which may be designed for streaming input.
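
To make the "hint for parallel indexing" idea concrete, the sketch below shows one way a finite factory could describe its input as splits so that a supervisor can hand each split to a separate sub-task. Everything here is an assumption made for illustration: the interface shape, the method names `getSplits` and `withSplit` as written, and the `FilesFirehoseFactory` and `planSubTasks` helpers are simplified and hypothetical, not Druid's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitHintSketch {
    /** Simplified stand-in for a finite factory that can describe its input as splits of type S. */
    interface FiniteFirehoseFactory<S> {
        List<String> readAll();

        List<S> getSplits();

        FiniteFirehoseFactory<S> withSplit(S split);
    }

    /** Hypothetical factory over named in-memory "files"; each file name is one split. */
    static final class FilesFirehoseFactory implements FiniteFirehoseFactory<String> {
        private final Map<String, List<String>> files;

        FilesFirehoseFactory(Map<String, List<String>> files) {
            this.files = new LinkedHashMap<>(files); // keep insertion order
        }

        @Override
        public List<String> readAll() {
            List<String> rows = new ArrayList<>();
            for (List<String> fileRows : files.values()) {
                rows.addAll(fileRows);
            }
            return rows;
        }

        @Override
        public List<String> getSplits() {
            return new ArrayList<>(files.keySet());
        }

        @Override
        public FiniteFirehoseFactory<String> withSplit(String split) {
            // Returns a factory restricted to the single named file.
            return new FilesFirehoseFactory(Collections.singletonMap(split, files.get(split)));
        }
    }

    /** Creates one sub-factory per split; only possible because the input is finite and enumerable. */
    static <S> List<FiniteFirehoseFactory<S>> planSubTasks(FiniteFirehoseFactory<S> factory) {
        List<FiniteFirehoseFactory<S>> subTasks = new ArrayList<>();
        for (S split : factory.getSplits()) {
            subTasks.add(factory.withSplit(split));
        }
        return subTasks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("part-0", Arrays.asList("a", "b"));
        files.put("part-1", Arrays.asList("c"));
        FiniteFirehoseFactory<String> factory = new FilesFirehoseFactory(files);
        for (FiniteFirehoseFactory<String> sub : planSubTasks(factory)) {
            System.out.println(sub.readAll());
        }
    }
}
```

An unbounded source has no way to enumerate its input up front like this, which is one reason a stream-oriented factory fits poorly into a batch task.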
   
   # Operational impact
   
   The task spec does not change, because the field name stays the same.
   
   Custom `FirehoseFactory` implementations used with native batch tasks need to be updated to implement `FiniteFirehoseFactory`.
   
   # Future work (optional)
   
   This change effectively limits native batch tasks to text file formats by default, because all existing implementations of `FiniteFirehoseFactory` use `StringInputRowParser`. https://github.com/apache/incubator-druid/issues/5584 should be resolved to support other file formats.
