Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 18:39:01 UTC

[GitHub] [beam] kennknowles opened a new issue, #18479: Add Kafka Streams runner

kennknowles opened a new issue, #18479:
URL: https://github.com/apache/beam/issues/18479

   Kafka Streams ([https://kafka.apache.org/documentation/streams](https://kafka.apache.org/documentation/streams)) has been gaining features that could make it a viable candidate for a streaming runner. It uses a Dataflow-like model.
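   
   For context, whatever the underlying engine, a runner has to execute ordinary Beam pipelines. The following is a minimal word-count-style sketch against the Beam Java SDK; the runner is selected only through pipeline options, and the KafkaStreamsRunner name in the code comment is hypothetical, since no such runner exists yet.
   
   ```java
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.options.PipelineOptions;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.beam.sdk.transforms.Count;
   import org.apache.beam.sdk.transforms.Create;
   import org.apache.beam.sdk.transforms.MapElements;
   import org.apache.beam.sdk.values.KV;
   import org.apache.beam.sdk.values.TypeDescriptors;

   public class MinimalPipeline {
     public static void main(String[] args) {
       // Today the runner is chosen via e.g. --runner=FlinkRunner; a
       // KafkaStreamsRunner (hypothetical) would plug in the same way.
       PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
       Pipeline p = Pipeline.create(options);

       p.apply(Create.of("a", "b", "a"))           // bounded example input
        .apply(Count.perElement())                 // keyed, stateful aggregation
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()));

       p.run().waitUntilFinish();
     }
   }
   ```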
   Please look at the [Design Document](https://docs.google.com/document/d/1mNqERvvV8oGI_O4tGewH2Kgkq6PQGv3ylmxnaTRBqH8/edit?usp=sharing) and add comments.
   
   Imported from Jira [BEAM-2466](https://issues.apache.org/jira/browse/BEAM-2466). Original Jira may contain additional context.
   Reported by: klorand.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] wilsonwang371 commented on issue #18479: Add Kafka Streams runner

Posted by GitBox <gi...@apache.org>.
wilsonwang371 commented on issue #18479:
URL: https://github.com/apache/beam/issues/18479#issuecomment-1340369856

   > There are several requirements that must be met. Essentially:
   > a) preserving per-partition order of records (i.e. records emitted in order from one distributed producer must not overtake each other when consumed)
   > b) the producer must be able to enqueue output records for a specific consumer (e.g. by assigning a key to an output record; all records with the same key must then be consumed by the same instance of the downstream consumer)
   > c) the producer must be able to send a record to all downstream consumers (i.e. the producer must know how many consumers there might be)
   > d) there must be some kind of support for state commit, either at the end of a bundle during bundle commit (Dataflow model) or as a flowing checkpoint barrier (Flink model); there must be a way to safely store state in distributed, fault-tolerant storage and to restore the complete state from that commit
   > 
   > With these conditions met, I think it should be possible (though quite hard) to implement a Beam runner on top of it. Kafka definitely satisfies all four (even without Kafka Streams).
   
   Thank you so much for the reply.




[GitHub] [beam] wilsonwang371 commented on issue #18479: Add Kafka Streams runner

Posted by GitBox <gi...@apache.org>.
wilsonwang371 commented on issue #18479:
URL: https://github.com/apache/beam/issues/18479#issuecomment-1338192901

   I am actually very interested in this topic. Generally, is it possible to have Beam running on a generic FaaS system with MQ support?




[GitHub] [beam] je-ik commented on issue #18479: Add Kafka Streams runner

Posted by GitBox <gi...@apache.org>.
je-ik commented on issue #18479:
URL: https://github.com/apache/beam/issues/18479#issuecomment-1339080375

   There are several requirements that must be met. Essentially:
    a) preserving per-partition order of records (i.e. records emitted in order from one distributed producer must not overtake each other when consumed)
    b) the producer must be able to enqueue output records for a specific consumer (e.g. by assigning a key to an output record; all records with the same key must then be consumed by the same instance of the downstream consumer)
    c) the producer must be able to send a record to all downstream consumers (i.e. the producer must know how many consumers there might be)
    d) there must be some kind of support for state commit, either at the end of a bundle during bundle commit (Dataflow model) or as a flowing checkpoint barrier (Flink model); there must be a way to safely store state in distributed, fault-tolerant storage and to restore the complete state from that commit
    
   With these conditions met, I think it should be possible (though quite hard) to implement a Beam runner on top of it. Kafka definitely satisfies all four (even without Kafka Streams).
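   
   As a minimal sketch of how plain Kafka already covers (a) and (b): records published with the same key hash to the same partition, so a single downstream consumer instance sees them in order. The broker address, topic, and keys below are illustrative only.
   
   ```java
   import java.util.Properties;
   import org.apache.kafka.clients.producer.KafkaProducer;
   import org.apache.kafka.clients.producer.ProducerRecord;
   import org.apache.kafka.common.serialization.StringSerializer;

   public class KeyedShuffleSketch {
     public static void main(String[] args) {
       Properties props = new Properties();
       props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
       props.put("key.serializer", StringSerializer.class.getName());
       props.put("value.serializer", StringSerializer.class.getName());

       try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
         // Records sharing a key land on the same partition (requirement b),
         // and Kafka preserves their order within that partition (requirement a).
         producer.send(new ProducerRecord<>("shuffle-topic", "user-42", "event-1"));
         producer.send(new ProducerRecord<>("shuffle-topic", "user-42", "event-2"));
         // A different key may be routed to a different partition, and hence to a
         // different consumer instance in the same consumer group.
         producer.send(new ProducerRecord<>("shuffle-topic", "user-7", "event-3"));
       }
     }
   }
   ```
   
   Requirement (c) could plausibly be met by having every downstream consumer read all partitions of a broadcast topic, and (d) maps onto committing consumer offsets atomically with produced output, for example via Kafka transactions.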

