Posted to dev@samza.apache.org by Talat Uyarer <tu...@paloaltonetworks.com> on 2021/11/12 07:02:52 UTC
Slow Samza beam Job
Hi,
I am running some performance tests. I submitted two Beam jobs: one
runs on Dataflow workers and the other on the Samza runner.
My Samza deployment is standalone. I have 10 workers for each job. My DAG
is very basic:
*Read From Kafka -> BeamSQL filter -> Write GCS*
However, the same DAG appears three times in one job. The jobs read from 3
different topics with 280 partitions.
When I compare Samza and Dataflow performance for the exact same Beam job,
one Samza worker cannot process more than *2.5k* messages, but a
Dataflow worker can process *10K* messages.
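To put the gap in numbers, here is a rough sketch of the arithmetic using only the figures above (10 workers per job, 2.5k vs 10K messages per worker). The time unit is not stated here, so only the ratio is meaningful; the function and variable names are just for illustration.

```python
# Figures taken from this message; the time window behind them is not stated,
# so treat the totals as "messages per (unspecified) interval".
WORKERS_PER_JOB = 10
SAMZA_PER_WORKER = 2_500
DATAFLOW_PER_WORKER = 10_000


def aggregate(per_worker: int, workers: int = WORKERS_PER_JOB) -> int:
    """Total messages handled by all workers of one job."""
    return per_worker * workers


samza_total = aggregate(SAMZA_PER_WORKER)
dataflow_total = aggregate(DATAFLOW_PER_WORKER)
ratio = dataflow_total / samza_total

print(f"Samza job total:    {samza_total}")
print(f"Dataflow job total: {dataflow_total}")
print(f"Dataflow / Samza:   {ratio:.1f}x")
```

So with the same worker count, the Dataflow job handles about four times as many messages as the Samza job.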
Can you help me? Am I missing something, or are Samza Beam jobs really this
slow?
My Samza Beam job settings:
> app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
> job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> job.coordinator.zk.connect=10.64.2.78:2181
>
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> task.commit.ms=60000
> job.default.system=default
>
> systems.default.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.default.producer.bootstrap.servers=
> job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory
>
> metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
> metrics.reporters=jmx
My Samza runner params:
> --runner=SamzaRunner --samzaExecutionEnvironment=STANDALONE
> --maxSourceParallelism=300 --maxBundleSize=10000 --maxBundleTimeMs=10000
> --systemBufferSize=10000
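For context, here is a small sketch that parses the flags above and works out how many partitions each of the 10 workers would own. It covers both readings of "3 topics with 280 partitions" (280 per topic, or 280 in total), since that is ambiguous as written. It makes no claim about how the Samza runner actually applies these options; the helper names are only for illustration.

```python
# Runner flags exactly as listed in this message.
ARGS = (
    "--runner=SamzaRunner --samzaExecutionEnvironment=STANDALONE "
    "--maxSourceParallelism=300 --maxBundleSize=10000 "
    "--maxBundleTimeMs=10000 --systemBufferSize=10000"
).split()


def parse_flags(argv):
    """Turn ['--key=value', ...] into {'key': 'value', ...}."""
    opts = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            opts[key] = value
    return opts


def partitions_per_worker(total_partitions: int, workers: int) -> int:
    """Ceiling division: the most partitions any single worker must own."""
    return -(-total_partitions // workers)


opts = parse_flags(ARGS)
TOPICS, PARTITIONS, WORKERS = 3, 280, 10

# Reading A (assumption): 280 partitions per topic, 840 in total.
per_worker_a = partitions_per_worker(TOPICS * PARTITIONS, WORKERS)
# Reading B (assumption): 280 partitions across all three topics.
per_worker_b = partitions_per_worker(PARTITIONS, WORKERS)

print("maxSourceParallelism =", opts["maxSourceParallelism"])
print("partitions per worker (280 per topic):", per_worker_a)
print("partitions per worker (280 in total): ", per_worker_b)
```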
Thanks in advance