Posted to user@beam.apache.org by amir bahmanyari <am...@yahoo.com> on 2016/09/28 19:46:13 UTC

Appropriate Spark Cluster Mode for running Beam SparkRunner apps

Hi Colleagues,
I am in the process of setting up a Spark cluster for running Beam SparkRunner apps.
The objective is to collect performance metrics via benchmarking techniques.
The Spark docs suggest the following cluster manager types.
Which one is the most appropriate when it comes to performance testing the Beam SparkRunner?
Thanks + regards,
Amir




Re: Appropriate Spark Cluster Mode for running Beam SparkRunner apps

Posted by Amit Sela <am...@gmail.com>.
I know of people running Standalone in production as well (not sure how it
scales), and to the best of my knowledge it also has a cluster mode. I only
said that I *personally* have experience with Standalone for testing, and
YARN for production.

The main issue with the SparkRunner's README is that it relates to batch
over HDFS, which is a work in progress in Beam (not specific to any one
runner). The runner will support batch in full once
https://issues.apache.org/jira/browse/BEAM-259 is pushed.

As for streaming input:
Kafka is supported via Spark's custom KafkaIO, but there is work in
progress on replacing it with a fully Beam-compliant KafkaIO.
Event-time windows, triggers and accumulation modes: I'm working on
something, but Spark 1.x on its own does not support those, so the
behaviour is currently Spark's native behaviour, which could be described
as "processing-time trigger, with discarding panes".

Hope this helps,
Amit

On Wed, Sep 28, 2016 at 11:47 PM amir bahmanyari <am...@yahoo.com>
wrote:

> Sure...Thanks Amit.
> So basically: Standalone for testing & YARN for production?
> Yes, the README for SparkRunner is way outdated; the FlinkRunner version is
> very informative.
> While the README is being updated, could you give me some helpful
> details so I can do the perf testing in the right context, please?
> Have a great day
> Amir-
> ------------------------------
> *From:* Amit Sela <am...@gmail.com>
> *To:* amir bahmanyari <am...@yahoo.com>; "
> user@beam.incubator.apache.org" <us...@beam.incubator.apache.org>
> *Sent:* Wednesday, September 28, 2016 1:13 PM
> *Subject:* Re: Appropriate Spark Cluster Mode for running Beam
> SparkRunner apps
>
> Hi Amir,
>
> The Beam SparkRunner basically translates the Beam pipeline into a Spark
> job, so it's not much different than a common Spark job.
> I can personally say that I'm running both in Standalone (mostly testing)
> and YARN. I don't have much experience with Spark over Mesos in general
> though.
>
> As for running over YARN, you can simply use the "spark-submit" script
> supplied with the Spark installation, and the runner will pick up the
> necessary (Spark) configurations, such as "--master yarn".
>
> The SparkRunner README is not up to date right now, and I will patch it up
> soon. I'm also working on some improvements and new features for the
> runner, so stay tuned!
>
> Thanks,
> Amit
>
> On Wed, Sep 28, 2016 at 10:46 PM amir bahmanyari <am...@yahoo.com>
> wrote:
>
> Hi Colleagues,
> I am in progress setting up Spark Cluster for running Beam SparkRunner
> apps.
> The objective is to collect performance metrics via benchmarking
> techniques.
> The Spark docs suggest the following cluster manager types.
> Which one is the most appropriate when it comes to performance
> testing the Beam SparkRunner?
> Thanks+regards
> Amir
>
>
> [image: Inline image]
>
>
>
>

Re: Appropriate Spark Cluster Mode for running Beam SparkRunner apps

Posted by amir bahmanyari <am...@yahoo.com>.
Sure... Thanks Amit.
So basically: Standalone for testing & YARN for production?
Yes, the README for SparkRunner is way outdated; the FlinkRunner version is
very informative.
While the README is being updated, could you give me some helpful details
so I can do the perf testing in the right context, please?
Have a great day,
Amir

From: Amit Sela <am...@gmail.com>
To: amir bahmanyari <am...@yahoo.com>; "user@beam.incubator.apache.org" <us...@beam.incubator.apache.org>
Sent: Wednesday, September 28, 2016 1:13 PM
Subject: Re: Appropriate Spark Cluster Mode for running Beam SparkRunner apps
Hi Amir,

The Beam SparkRunner basically translates the Beam pipeline into a Spark
job, so it's not much different than a common Spark job.
I can personally say that I'm running both in Standalone (mostly testing)
and YARN. I don't have much experience with Spark over Mesos in general
though.

As for running over YARN, you can simply use the "spark-submit" script
supplied with the Spark installation, and the runner will pick up the
necessary (Spark) configurations, such as "--master yarn".

The SparkRunner README is not up to date right now, and I will patch it up
soon. I'm also working on some improvements and new features for the
runner, so stay tuned!

Thanks,
Amit
On Wed, Sep 28, 2016 at 10:46 PM amir bahmanyari <am...@yahoo.com> wrote:

Hi Colleagues,
I am in the process of setting up a Spark cluster for running Beam SparkRunner apps.
The objective is to collect performance metrics via benchmarking techniques.
The Spark docs suggest the following cluster manager types.
Which one is the most appropriate when it comes to performance testing the Beam SparkRunner?
Thanks + regards,
Amir







Re: Appropriate Spark Cluster Mode for running Beam SparkRunner apps

Posted by Amit Sela <am...@gmail.com>.
Hi Amir,

The Beam SparkRunner basically translates the Beam pipeline into a Spark
job, so it's not much different than a common Spark job.
I can personally say that I'm running both in Standalone (mostly testing)
and YARN. I don't have much experience with Spark over Mesos in general
though.

As for running over YARN, you can simply use the "spark-submit" script
supplied with the Spark installation, and the runner will pick up the
necessary (Spark) configurations, such as "--master yarn".
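As an illustration of the point above, a submission might look like the sketch
below. The jar name, main class, and master hostname are hypothetical (not from
this thread); only "--master yarn" and the general spark-submit shape come from
Amit's description.

```shell
# Hypothetical sketch: jar, class, and host names are made up for illustration.

# Submitting a Beam pipeline with the SparkRunner to YARN. spark-submit
# reads the cluster configuration from HADOOP_CONF_DIR / YARN_CONF_DIR.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyBeamPipeline \
  my-beam-pipeline-bundled.jar \
  --runner=SparkRunner

# The same jar against a Standalone cluster should only need a different
# master URL, per the discussion of Standalone vs. YARN in this thread:
spark-submit \
  --master spark://spark-master-host:7077 \
  --class com.example.MyBeamPipeline \
  my-beam-pipeline-bundled.jar \
  --runner=SparkRunner
```

The arguments after the jar ("--runner=SparkRunner") are passed to the pipeline's
main method, where Beam's PipelineOptionsFactory would typically parse them.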

The SparkRunner README is not up to date right now, and I will patch it up
soon. I'm also working on some improvements and new features for the
runner, so stay tuned!

Thanks,
Amit

On Wed, Sep 28, 2016 at 10:46 PM amir bahmanyari <am...@yahoo.com>
wrote:

> Hi Colleagues,
> I am in progress setting up Spark Cluster for running Beam SparkRunner
> apps.
> The objective is to collect performance metrics via benchmarking
> techniques.
> The Spark docs suggest the following cluster manager types.
> Which one is the most appropriate when it comes to performance
> testing the Beam SparkRunner?
> Thanks+regards
> Amir
>
>
> [image: Inline image]
>
>