Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2016/06/09 17:28:34 UTC

Spark ML - Is it safe to schedule two training jobs at the same time or will worker state be corrupted?

For example, say I want to train two linear regressions and two GBT
(gradient-boosted tree) regressions.

Spark lets you submit jobs concurrently from different threads
(see: http://spark.apache.org/docs/latest/job-scheduling.html). If I
schedule two or more training jobs and they run at the same time:

1) Is there any risk that static worker variables or worker state could
become corrupted, leading to incorrect calculations?
2) Is Spark ML designed to run two or more training jobs at the same
time? Is this something the architects considered during implementation?
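
The pattern described on that job-scheduling page can be sketched as follows. This is a minimal illustration of the threading structure only: the context dict and the `train` function are plain-Python stand-ins for a shared SparkContext/SparkSession and for estimator calls such as `LinearRegression().fit(df)` or `GBTRegressor().fit(df)`, and the model names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the single SparkContext/SparkSession shared by all threads.
shared_context = {"app": "concurrent-training-demo"}

def train(model_name, context):
    # In real Spark ML this would be an estimator's .fit() call; the
    # actions it triggers become independent jobs on the shared
    # context's scheduler.
    return f"{model_name} trained on {context['app']}"

# Submit all four trainings at once; each runs in its own thread
# against the one shared context.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(train, name, shared_context)
               for name in ["lr_1", "lr_2", "gbt_1", "gbt_2"]]
    results = [f.result() for f in futures]

print(results)
```
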

Thanks,

Brandon

Re: Spark ML - Is it safe to schedule two training jobs at the same time or will worker state be corrupted?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

It's designed to work exactly like this: you share one SparkContext
across threads so the jobs can share cached datasets.

Re 1: No, concurrent jobs won't corrupt worker state.
Re 2: Yes, it's an intended use case.

See CrossValidator and the similar model-tuning utilities in spark.ml.

Jacek