Posted to dev@samza.apache.org by Talat Uyarer <tu...@paloaltonetworks.com> on 2021/11/12 07:02:52 UTC

Slow Samza Beam job

Hi,

I am doing some performance testing. I submitted two Beam jobs: one runs
on Dataflow workers and the other on the Samza runner.

My Samza deployment is standalone. I have 10 workers for each job. My DAG
is very basic:

*Read From Kafka -> BeamSQL filter -> Write GCS*

However, the same DAG appears three times in one job. The jobs read from 3
different topics with 280 partitions.
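
For reference, a rough sketch of how many partitions each worker would own.
It assumes partitions are spread evenly across the 10 workers, and it shows
both readings of the numbers above (280 partitions in total, or 280 per
topic). The function and variable names are illustrative only, not part of
Samza or Beam.

```python
import math

WORKERS = 10      # workers per job, from the post
TOPICS = 3        # topics read by the job, from the post
PARTITIONS = 280  # partition count quoted in the post


def partitions_per_worker(total_partitions, workers):
    """Upper bound on partitions per worker, assuming an even spread."""
    return math.ceil(total_partitions / workers)


# Reading 1: 280 partitions across all three topics combined.
total_if_shared = PARTITIONS
# Reading 2: 280 partitions in each of the three topics.
total_if_per_topic = TOPICS * PARTITIONS

print(partitions_per_worker(total_if_shared, WORKERS))
print(partitions_per_worker(total_if_per_topic, WORKERS))
```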

When I compare Samza and Dataflow performance for the exact same Beam job,
one Samza worker cannot process more than *2.5K* messages, but a
Dataflow worker can process *10K* messages.
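
To put the gap in numbers, a small sketch. It assumes both figures use the
same time unit (the unit is not stated above) and that all 10 workers run at
the quoted per-worker rate. The names are illustrative only.

```python
SAMZA_PER_WORKER = 2_500      # messages per worker, from the post
DATAFLOW_PER_WORKER = 10_000  # messages per worker, from the post
WORKERS = 10                  # workers per job, from the post


def aggregate(per_worker, workers):
    """Total rate if every worker sustains the per-worker rate (assumption)."""
    return per_worker * workers


def speedup(fast, slow):
    """How many times faster the first rate is than the second."""
    return fast / slow


print(aggregate(SAMZA_PER_WORKER, WORKERS))
print(aggregate(DATAFLOW_PER_WORKER, WORKERS))
print(speedup(DATAFLOW_PER_WORKER, SAMZA_PER_WORKER))
```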

Can you help me? Am I missing something, or are Samza Beam jobs really this
slow?

My Samza Beam job settings:

>     app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
>     job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
>     job.coordinator.zk.connect=10.64.2.78:2181
>     task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
>     task.commit.ms=60000
>     job.default.system=default
>     systems.default.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
>     systems.default.producer.bootstrap.servers=
>     job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory
>     metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
>     metrics.reporters=jmx


My Samza runner params:

> --runner=SamzaRunner --samzaExecutionEnvironment=STANDALONE
> --maxSourceParallelism=300  --maxBundleSize=10000 --maxBundleTimeMs=10000
> --systemBufferSize=10000


Thanks in advance