Posted to users@kafka.apache.org by 황보동규 <hw...@gmail.com> on 2016/12/26 06:07:23 UTC

What is difference between Kafka streaming and Samza?

Hi there! 

I’m a newbie to Kafka.
I’m interested in stream processing, especially Kafka Streams, but I have no idea what the difference is between Kafka Streams and Samza.
Both have a similar architecture and functionality, I think.
What is the main difference? What are the pros and cons? A short explanation would be really helpful, and pointers to documentation related to my question are also welcome.

Thanks,
Dongkyu 

Re: What is difference between Kafka streaming and Samza?

Posted by Ofir Sharony <of...@myheritage.com>.
Here are some differences between the two:

   - KafkaStreams is a library, whereas Samza is a framework, which makes
   the learning curve of KafkaStreams a bit gentler.
   - Sources - KafkaStreams works with Kafka alone, while Samza can also be
   configured with Kinesis, Elasticsearch, HDFS and others.
   - Deployment - Samza works closely with YARN (although it is not a
   must), whereas KafkaStreams can be run and deployed as a simple Java
   library, where running more instances of it will cause an automatic
   load balance between the processes. A cluster is not required for
   KafkaStreams.
   - State management - both have local state. In KafkaStreams, common
   stateful operations (e.g. joins, aggregations) are made simpler: you
   just call the function and the state is managed behind the scenes,
   with no need to define it explicitly.
   - Configuration - in Samza there's a configuration file, whereas in
   KafkaStreams it's all inside your class.
   - Code unification with batch jobs - Samza code can be written once for
   both ongoing stream processing and batch jobs, since Samza jobs can
   also run on a Hadoop cluster.
   - Samza supports host affinity: after a job restarts, it is allocated
   the same machine (the one that has the local state stored), which
   avoids the startup latency of reloading the state.
   - Samza supports an async I/O model, which significantly improves the
   performance of jobs bottlenecked on remote I/O.
   - Samza has a REST API to query its processing streams and to start and
   stop jobs.
   - Samza is a bit more mature (KafkaStreams is the new kid on the block).
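
To illustrate the configuration point above, here is a minimal Java sketch of settings built inside the class rather than in a separate job file. The keys application.id and bootstrap.servers are real Kafka Streams settings; the class name StreamsAppConfig, the build helper and the values are made up for illustration, and the sketch only uses java.util.Properties so that it runs without Kafka on the classpath.

```java
import java.util.Properties;

public class StreamsAppConfig {

    // Builds the settings in code instead of reading a job config file.
    // "application.id" and "bootstrap.servers" are real Kafka Streams keys;
    // the class, method and values here are hypothetical placeholders.
    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("demo-app", "localhost:9092");
        // Print the settings that would be used by the application.
        System.out.println(props.getProperty("application.id"));
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

In a real application the resulting Properties object would be passed to the KafkaStreams constructor together with the topology.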

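The automatic load balancing mentioned under Deployment can be pictured with a toy model: the input partitions are split among however many instances are running. The sketch below is only an illustration under that assumption, not Kafka's actual assignment algorithm; RebalanceSketch, assign and the instance names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RebalanceSketch {

    // Spreads partition numbers 0..partitions-1 over the given instances,
    // round-robin. Purely illustrative; assumes at least one instance.
    static Map<String, List<Integer>> assign(int partitions, List<String> instances) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (String instance : instances) {
            result.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            String owner = instances.get(p % instances.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // With a single instance, it owns every partition.
        System.out.println(assign(4, Arrays.asList("instance-1")));
        // Starting a second instance splits the partitions between the two.
        System.out.println(assign(4, Arrays.asList("instance-1", "instance-2")));
    }
}
```

The point of the toy model is that no cluster manager is involved: adding or removing an instance simply changes how the same set of partitions is divided.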

*Ofir Sharony*
BackEnd Tech Lead

