Posted to user@spark.apache.org by "Livni, Dana" <da...@intel.com> on 2014/02/21 04:30:33 UTC

multi-concurrent processing

Hi,

I wanted to ask what the best practice is for a scenario we have.

We have a lot of batch processing jobs over data stored in an HBase cluster. They are independent and need to run in parallel.
The current implementation runs multiple independent processes (each of them multi-threaded itself).
Each process creates one SparkContext, and all of its child threads use it.
This leads to a situation in which we create around 150 concurrent SparkContexts (each used by 5-10 threads, and each thread performs about 4 map tasks).

This implementation does not seem very efficient, both in terms of memory (mainly on our batch server) and of processing time on the cluster.

I wanted to know what the best way to do this would be.
We thought of creating a service that holds only one SparkContext, with all the processes and threads sending their requests to it.
Does anyone have insight into whether this would be a better solution, or other ideas?
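Roughly what we have in mind is something like the sketch below (untested; the parallelize call is just a placeholder for our real HBase-backed RDDs, and the pool names and thread count are made up):

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextService {
  def main(args: Array[String]): Unit = {
    // A single SparkContext for the whole JVM; job submission on it is thread-safe.
    val conf = new SparkConf()
      .setAppName("shared-context-service")
      .set("spark.scheduler.mode", "FAIR") // let concurrent jobs share the executors fairly
    val sc = new SparkContext(conf)

    // Thread pool standing in for the worker threads that currently each own a context.
    implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

    // Each "request" submits its own independent job on the shared context.
    val jobs = (1 to 8).map { i =>
      Future {
        sc.setLocalProperty("spark.scheduler.pool", s"batch-$i") // optional per-request pool
        // Placeholder for the real HBase-backed RDD and its transformations.
        sc.parallelize(1 to 1000000, 4).map(_ * 2).reduce(_ + _)
      }
    }

    jobs.foreach(job => Await.result(job, Duration.Inf))
    sc.stop()
  }
}

The idea is that only one JVM talks to the cluster, and the fair scheduler interleaves the independent jobs instead of 150 contexts competing for resources.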
Thanks in advance
Dana

---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Re: multi-concurrent processing

Posted by Mayur Rustagi <ma...@gmail.com>.
You didn't specify what the key blocker is. Why is the cluster's processing time
underutilized? Are your threads busy processing results, so that Spark jobs are
not being deployed?


Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


