Posted to dev@samza.apache.org by Ankit Malhotra <am...@appnexus.com> on 2017/01/30 22:50:34 UTC

Hardware configurations for samza clusters

Hi, I am curious whether people can share the hardware configurations on which they run Samza. We are evaluating Samza for a streaming join use case that makes heavy use of RocksDB, where the store spills to disk for most joins. Specifically: how many cores, how much memory, how many and what types of SSDs, what RAID configurations, etc.?

Thanks
Ankit

Re: Hardware configurations for samza clusters

Posted by Jagadish Venkatraman <ja...@gmail.com>.
When using Samza to process streaming data (Kafka/Databus), we deploy to
YARN clusters dedicated to Samza workloads. The machines in these clusters
are configured roughly like the one I described earlier.

When using Samza to process batch data (files on Hadoop
<https://reviews.apache.org/r/52570/>), we deploy to our Hadoop clusters,
which are shared with MapReduce workloads. I believe these clusters use
spinning disks.

In the future, we plan to explore the trade-offs between storage cost and
performance, and will continue to share what we learn with the community.

Thanks,
Jagadish


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Re: Hardware configurations for samza clusters

Posted by Ankit Malhotra <am...@appnexus.com>.
Hi Jagadish,

Thanks for your reply. Is it safe to assume that you run similar machines in your production YARN clusters, where only Samza workloads run?

Ankit



Re: Hardware configurations for samza clusters

Posted by Jagadish Venkatraman <ja...@gmail.com>.
Hi Ankit,

We have benchmarked Samza on the following hardware configuration:

   - Processor: Intel Xeon, 2.67 GHz, 24 cores
   - RAM: 48 GB
   - Network: 1 Gbps Ethernet
   - SSD: 1.65 TB Fusion-IO

Please check out the perf numbers and the methodology here:
https://engineering.linkedin.com/performance/benchmarking-apache-samza-12-million-messages-second-single-node

Thanks,