Posted to commits@heron.apache.org by GitBox <gi...@apache.org> on 2019/03/19 16:45:13 UTC

[GitHub] [incubator-heron] simingweng edited a comment on issue #3198: Feature/create kafka spout

URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-474463341
 
 
   > Hi, thanks for this PR, happy to finally see a Kafka Spout implementation in Heron. I am planning on using this once it is merged, but I have a question. One major difference between this implementation and the Storm one is that Storm's spout allows emitting to different streams, using the `org.apache.storm.kafka.spout.RecordTranslator` interface. This implementation is missing this particular functionality, which is quite useful.
   > 
   > Is there a reason for not keeping this functionality in this Kafka Spout implementation? Or is there another way to achieve similar functionality (other than creating a map-function like bolt for this purpose)? It's really useful for sending data from different topics to different downstream bolts.
   
   Very good question. I actually started with a "one-record-to-many-tuples" implementation, but reconsidered while implementing the `ATLEAST_ONCE` delivery guarantee. Allowing "one-record-to-many-tuples" significantly complicates acknowledgement tracking, because the spout then has to maintain the mapping from a single Kafka record offset to multiple message IDs, and it can only safely commit that offset once every one of those message IDs has been acked.
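To make the bookkeeping concrete, here is a minimal illustrative sketch (not code from this PR; class and method names are made up) of the extra state a spout would need when one record offset fans out into several tuples:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: tracking acks when one Kafka record offset fans out
// into multiple emitted tuples. The offset is only safe to commit once every
// message ID derived from that record has been acked.
public class OffsetAckTracker {
    // offset -> outstanding message IDs emitted from that record
    private final Map<Long, Set<String>> outstanding = new HashMap<>();

    public void emit(long offset, String messageId) {
        outstanding.computeIfAbsent(offset, k -> new HashSet<>()).add(messageId);
    }

    // Returns true only when the whole record is fully acked (commit-safe).
    public boolean ack(long offset, String messageId) {
        Set<String> ids = outstanding.get(offset);
        if (ids == null) {
            return false;
        }
        ids.remove(messageId);
        if (ids.isEmpty()) {
            outstanding.remove(offset);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        OffsetAckTracker tracker = new OffsetAckTracker();
        tracker.emit(42L, "42-streamA");
        tracker.emit(42L, "42-streamB");
        System.out.println(tracker.ack(42L, "42-streamA")); // false: one tuple still in flight
        System.out.println(tracker.ack(42L, "42-streamB")); // true: record fully acked
    }
}
```

In the current one-record-to-one-tuple design, none of this state exists: an ack on the single message ID directly marks its offset commit-safe.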
   
   It also raises a design choice: should the KafkaSpout itself guarantee the uniqueness of the set of message IDs generated from the same ConsumerRecord, or should that choice be left to the developer?
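If we opened the choice up to the developer, the hook might look something like this hypothetical interface (not part of the PR; the name and signature are made up for illustration):

```java
// Hypothetical hook: leave message-ID uniqueness to the developer by having
// them derive one ID per tuple emitted from a single record. The spout would
// require the IDs to be unique within that record.
public interface MessageIdFactory {
    // offset identifies the source record; tupleIndex distinguishes the
    // tuples fanned out from it.
    String idFor(long offset, int tupleIndex);

    // A default scheme the spout could fall back on.
    static MessageIdFactory offsetDash() {
        return (offset, tupleIndex) -> offset + "-" + tupleIndex;
    }
}
```

The trade-off is that a developer-supplied scheme can silently violate uniqueness, which would corrupt ack tracking in `ATLEAST_ONCE` mode.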
   
   So, a neater choice is to use multiple KafkaSpout instances, each dedicated to a single output stream.
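As a hedged wiring sketch of that workaround, using the Storm-style `TopologyBuilder` API that Heron mirrors (spout/bolt names, the bolt classes, and the `kafkaConfigFor` helper are all hypothetical, and the actual KafkaSpout constructor in this PR may take different arguments):

```java
// Hypothetical wiring (not from the PR): one KafkaSpout per output stream,
// each consuming its own topic and feeding only its dedicated bolt.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("orders-spout", new KafkaSpout(kafkaConfigFor("orders-topic")), 1);
builder.setSpout("clicks-spout", new KafkaSpout(kafkaConfigFor("clicks-topic")), 1);
// Each downstream bolt subscribes only to the spout it cares about.
builder.setBolt("orders-bolt", new OrdersBolt(), 2).shuffleGrouping("orders-spout");
builder.setBolt("clicks-bolt", new ClicksBolt(), 2).shuffleGrouping("clicks-spout");
```

The cost of this layout is one Kafka consumer (and one spout instance's worth of resources) per stream, which is what makes the single-spout fan-out attractive.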
   
   But I do agree "one-record-to-many-tuples" is pretty useful and cost-effective in terms of resource consumption. I have no objection to putting it back in, but it then becomes the developer's responsibility to avoid emitting multiple tuples out of one ConsumerRecord, though only when running in `ATLEAST_ONCE` mode, at least for this version of KafkaSpout, until we introduce a more sophisticated ack/fail tracking mechanism.
