Posted to dev@samza.apache.org by Nick Quinn <nq...@objectivity.com> on 2016/08/03 18:04:16 UTC

Kafka Streams

There has been a lot of talk around town about Confluent's new stream processing engine, Kafka Streams. We are currently using Samza, and I would like some feedback, for myself and other developers on this list, about the differences between the two and the possible advantages of using Samza over Kafka Streams. I would appreciate any and all feedback.

Thanks!
Nick Quinn

Re: Kafka Streams

Posted by Yi Pan <ni...@gmail.com>.

Hi, Nick,

IMHO, the following points differentiate Samza from Kafka Streams:

- Stability of local state management. Samza supports durable local state
and host affinity for faster state recovery. 0.10.1 makes further progress
on host affinity to allow a) continuous checkpointing of state stores; b)
minimal movement of state stores when the number of containers changes.
- Support for non-Kafka sources and destinations. Samza allows non-Kafka
sources and destinations to be added natively. We already have
Elasticsearch and HDFS producers supported in open source. An HDFS consumer
is coming soon, and LinkedIn has successfully integrated Samza w/ Databus,
Kinesis, and DynamoDB Streams internally.
- Unification w/ batch jobs. We are actively working on the Samza-on-HDFS
project at LinkedIn and have successfully done some prototype testing that
runs a long-running Samza job on a secured Hadoop cluster.
Development of HDFSSystemConsumer is underway. The goal is to have the
*same* Samza job run in both the batch and stream worlds, simply by
switching the data sources/destinations.
- Async I/O model. We have built an async processing model into Samza as an
option, which will significantly improve the performance of jobs
bottlenecked on remote I/O. It is in the open source trunk and will be part
of the 0.11 release. As far as we know, no other stream processing platform
supports an async processing model natively yet.
- Operational support for run-as-a-service. Samza has a long operational
history at LinkedIn, where we run it as a hosted service. To better support
self-service and multi-tenancy, we have recently added a Samza REST service
that accepts admin commands via REST calls, and disk quotas that govern the
disk usage of jobs in a cluster. Disk quotas are in 0.10.1; the REST APIs
are deployed at LinkedIn and in review in open source (SAMZA-865).
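
To make the async I/O point concrete, here is a minimal sketch of why an
async processing model helps jobs bottlenecked on remote I/O. It is not
Samza code and does not use the Samza API; it is plain Python asyncio, and
every name in it (remote_lookup, REMOTE_LATENCY_S, max_in_flight) is a
hypothetical stand-in chosen for illustration. The remote call is simulated
with a sleep.

```python
import asyncio
import time

REMOTE_LATENCY_S = 0.05  # hypothetical latency of one remote call


async def remote_lookup(key: str) -> str:
    """Stand-in for a remote I/O call (for example a database or REST lookup)."""
    await asyncio.sleep(REMOTE_LATENCY_S)
    return key.upper()


async def process_sequentially(messages):
    """Synchronous-style model: each message waits for the previous call."""
    results = []
    for message in messages:
        results.append(await remote_lookup(message))
    return results


async def process_concurrently(messages, max_in_flight=8):
    """Async model: up to max_in_flight remote calls are outstanding at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def handle(message):
        async with limit:
            return await remote_lookup(message)

    # gather returns results in the same order as the input messages
    return await asyncio.gather(*(handle(m) for m in messages))


def timed(coro_fn, messages):
    """Run a coroutine function to completion and report wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(messages))
    return results, time.perf_counter() - start


messages = [f"msg-{i}" for i in range(16)]
seq_results, seq_elapsed = timed(process_sequentially, messages)
con_results, con_elapsed = timed(process_concurrently, messages)
print(f"sequential: {seq_elapsed:.2f}s, concurrent: {con_elapsed:.2f}s")
```

Both variants produce the same results in the same order; the concurrent
one finishes sooner because the waits on the simulated remote calls
overlap instead of adding up. The cap on in-flight calls is an assumption
of this sketch, included so a slow remote system is not flooded.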

Please see Kartik's response to the exact same question in May for some of
the points above as well:
http://mail-archives.apache.org/mod_mbox/samza-dev/201605.mbox/%3CCACsAj_XZZBohSz7Cf9%3DLO5MDOn2vEzfMrDF6Te%3DwrpeMEab1dQ%40mail.gmail.com%3E

Hope that helps. Regards!

-Yi
