Posted to user@ignite.apache.org by Pascoe Scholle <pa...@gmail.com> on 2019/08/29 11:26:44 UTC

Load balancing and executing python processes

Hi,

I have a question regarding the workings of the load balancer and running
processes which are outside the JVM.

We have a Python command-line tool which is used for processing big data.
The tool is highly optimized and is able to load data into RAM almost
instantly. Some data sets can be as large as 20 GB.

When sending multiple jobs, each triggering its own Python process, I do
not want them all to land on the same machine. Is there any way we can use
the load balancer to ensure that all jobs are evenly distributed, or
possibly to restrict certain jobs to a node running on a desktop we know
has enough memory available?

For example, I have two machines, one with 32 GB of RAM and the second
with 64 GB, and one Ignite node per machine. Three jobs are sent using
ComputeTaskContinuousMapper. The 32 GB machine received two tasks and
obviously froze up. Having some way of ensuring that two of the jobs are
sent to the machine with more memory would be really helpful.
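
Conceptually, what I am after is something like the sketch below. This is
only an illustration, not code we run: the "big.memory" attribute name is
made up, and the larger node would have to set it as a user attribute in
its IgniteConfiguration.

""""""""""""""""""""""""""""""""""""""""
  import org.apache.ignite.{Ignite, Ignition}
  import org.apache.ignite.lang.IgniteRunnable

  // The job body; in reality this would launch the 20 GB python process.
  class BigMemoryJob extends IgniteRunnable {
    override def run(): Unit = println("running on a node with enough memory")
  }

  object PinToBigMemoryNode {
    def main(args: Array[String]): Unit = {
      // joins the cluster with the default configuration
      val ignite: Ignite = Ignition.start()

      // select only nodes started with the user attribute "big.memory" -> "true"
      // (the 64 GB node would set it via cfg.setUserAttributes(...))
      val bigNodes = ignite.cluster().forAttribute("big.memory", "true")

      ignite.compute(bigNodes).run(new BigMemoryJob())
    }
  }
""""""""""""""""""""""""""""""""""""""""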

Thanks and kind regards,
Pascoe

Re: Load balancing and executing python processes

Posted by Pascoe Scholle <pa...@gmail.com>.
Just some more info.

I use a ComputeTaskContinuousMapper within a task that sends off these
jobs. Could this be the reason?
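
The task is shaped roughly like the sketch below; this is simplified and
not my actual code (PythonTask and PythonJob are made-up names):

""""""""""""""""""""""""""""""""""""""""
  import java.util

  import org.apache.ignite.cluster.ClusterNode
  import org.apache.ignite.compute.{ComputeJob, ComputeJobAdapter, ComputeJobResult,
    ComputeTaskAdapter, ComputeTaskContinuousMapper}
  import org.apache.ignite.resources.TaskContinuousMapperResource

  // One job per input argument; execute() is where the python process would run.
  class PythonJob(arg: String) extends ComputeJobAdapter {
    override def execute(): AnyRef = {
      println(s"would run the python tool with argument: $arg")
      null
    }
  }

  class PythonTask extends ComputeTaskAdapter[Seq[String], Void] {
    // Injected by Ignite; lets map() stream jobs out instead of returning them at once.
    @TaskContinuousMapperResource
    private var mapper: ComputeTaskContinuousMapper = _

    override def map(subgrid: util.List[ClusterNode],
                     args: Seq[String]): util.Map[_ <: ComputeJob, ClusterNode] = {
      for (arg <- args) mapper.send(new PythonJob(arg)) // the load balancer picks the node
      null // everything has already been sent through the mapper
    }

    override def reduce(results: util.List[ComputeJobResult]): Void = null
  }
""""""""""""""""""""""""""""""""""""""""

The task is then kicked off with something like
ignite.compute().execute(new PythonTask(), Seq("input1", "input2", "input3")).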

Re: Load balancing and executing python processes

Posted by Pascoe Scholle <pa...@gmail.com>.
Hi Ilya,

So I have exciting news. As you say, we have to configure some of the nodes
manually, and I have done this using the collision SPI; it works great!

I have configured nodes for the more powerful PCs, which also have more RAM
available, and limited the number of jobs that can run in parallel. Really
awesome feature.

I now have another issue.

Say I have nodes A and B, both with the attribute "compute", so all jobs
deployed on the cluster are executed only on these two nodes.

Node A runs on a weaker machine so I limit the number of parallel jobs to 2.

Node B runs on the stronger machine with 4 parallel jobs running at the
same time.

Say I send off 100 jobs to the cluster and these are evenly distributed
between these two nodes. Node B will obviously finish its queue much faster.
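
For reference, the jobs are submitted roughly like this; it is a simplified
sketch rather than the real code (the real job launches the python tool):

""""""""""""""""""""""""""""""""""""""""
  import org.apache.ignite.Ignite
  import org.apache.ignite.lang.IgniteRunnable

  // stand-in for the real job that launches the python process
  class PrintJob(i: Int) extends IgniteRunnable {
    override def run(): Unit = println(s"job $i")
  }

  object SubmitBatch {
    // assumes an already started Ignite instance (e.g. a client node)
    def submit(ignite: Ignite): Unit = {
      // only nodes that set the "compute.node" user attribute are eligible
      val compute = ignite.compute(ignite.cluster().forAttribute("compute.node", true))

      for (i <- 1 to 100)
        compute.runAsync(new PrintJob(i))
    }
  }
""""""""""""""""""""""""""""""""""""""""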

So I tried to implement a JobStealingCollisionSpi, but it's not stealing
any jobs from Node A.

Here is some code to show the configuration; can you point out what is
missing?

This is the job stealer node that has job stealing enabled:

""""""""""""""""""""""""""""""""""""""""""""""""""""""

  import scala.collection.JavaConverters._

  import org.apache.ignite.cache.CacheMode
  import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
  import org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi
  import org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi

  val jobStealer = new JobStealingCollisionSpi();
  jobStealer.setStealingEnabled(true);

  /* steal only to/from nodes that have these attributes set */
  jobStealer.setStealingAttributes(Map("compute.node" -> "true").asJava);

  /* set the number of parallel jobs allowed */
  jobStealer.setActiveJobsThreshold(10);

  // Configure stealing attempts number.
  jobStealer.setMaximumStealingAttempts(10);

  val failoverSpi = new JobStealingFailoverSpi();

  val cfg = new IgniteConfiguration();

  val cacheConfig = new CacheConfiguration[String, Any]("myCache");

  /* partition the cache over the entire cluster */
  cacheConfig.setCacheMode(CacheMode.PARTITIONED);

  cfg.setCacheConfiguration(cacheConfig);

  cfg.setMetricsUpdateFrequency(10);

  cfg.setFailoverSpi(failoverSpi);

  cfg.setPeerClassLoadingEnabled(true);

  cfg.setCollisionSpi(jobStealer);

  /* jobs are sent to this node for computation */
  cfg.setUserAttributes(Map("compute.node" -> true).asJava);

""""""""""""""""""""""""""""""""""""""""""""

Next, the configuration for Node B. Job stealing is disabled, so that other
nodes can take jobs from this node:

"""""""""""""""""""""""""""""""
  // same imports as for the first node

  val jobStealer = new JobStealingCollisionSpi();

  jobStealer.setStealingEnabled(false);

  /* set the number of parallel jobs allowed */
  jobStealer.setActiveJobsThreshold(2);

  jobStealer.setStealingAttributes(Map("compute.node" -> "true").asJava);

  val failoverSpi = new JobStealingFailoverSpi();

  val cfg = new IgniteConfiguration();

  val cacheConfig = new CacheConfiguration[String, Any]("myCache");

  cacheConfig.setCacheMode(CacheMode.PARTITIONED);

  cfg.setCacheConfiguration(cacheConfig);

  cfg.setPeerClassLoadingEnabled(true);

  cfg.setFailoverSpi(failoverSpi);

  cfg.setCollisionSpi(jobStealer);

  cfg.setUserAttributes(Map("compute.node" -> true).asJava);

""""""""""""""""""""""""""""""""""""""""""

I also have a third node reserved for running services with a different
attribute.
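
That node is set up along these lines; again just a sketch, with the
"service.node" attribute and MyService being made-up names:

""""""""""""""""""""""""""""""""""""""""
  import org.apache.ignite.{Ignite, Ignition}
  import org.apache.ignite.services.{Service, ServiceContext}

  // placeholder service implementation
  class MyService extends Service {
    override def init(ctx: ServiceContext): Unit = ()
    override def execute(ctx: ServiceContext): Unit = ()
    override def cancel(ctx: ServiceContext): Unit = ()
  }

  object DeployServices {
    def main(args: Array[String]): Unit = {
      val ignite: Ignite = Ignition.start()

      // deploy only on nodes that set the "service.node" user attribute
      val serviceNodes = ignite.cluster().forAttribute("service.node", true)
      ignite.services(serviceNodes).deployNodeSingleton("myService", new MyService())
    }
  }
""""""""""""""""""""""""""""""""""""""""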

Any pointers on why the JobStealer node is not stealing any jobs?

Thanks!

Re: Load balancing and executing python processes

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think you will have to do it semi-manually: on every node you should know
how many resources are available and not allow jobs to over-consume them.
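
For example, something along these lines on each node caps how many jobs may
execute there at once (just a sketch; the limit is made up and should match
what that machine's memory can handle):

""""""""""""""""""""""""""""""""""""""""
  import org.apache.ignite.Ignition
  import org.apache.ignite.configuration.IgniteConfiguration
  import org.apache.ignite.spi.collision.fifoqueue.FifoQueueCollisionSpi

  object CappedNode {
    def main(args: Array[String]): Unit = {
      val collision = new FifoQueueCollisionSpi()
      collision.setParallelJobsNumber(2) // at most 2 jobs run at once on this node

      val cfg = new IgniteConfiguration()
      cfg.setCollisionSpi(collision)

      Ignition.start(cfg)
    }
  }
""""""""""""""""""""""""""""""""""""""""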

You could try to use LoadBalancingSpi but as far as I have heard, it is not
trivial.
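
For example, one of the shipped implementations could be configured like this
(untested sketch, the weights are made up):

""""""""""""""""""""""""""""""""""""""""
  import org.apache.ignite.Ignition
  import org.apache.ignite.configuration.IgniteConfiguration
  import org.apache.ignite.spi.loadbalancing.weightedrandom.WeightedRandomLoadBalancingSpi

  object WeightedNode {
    def main(args: Array[String]): Unit = {
      val balancer = new WeightedRandomLoadBalancingSpi()
      balancer.setUseWeights(true)
      balancer.setNodeWeight(20) // e.g. 20 on the 64 GB machine, 10 on the 32 GB machine

      val cfg = new IgniteConfiguration()
      cfg.setLoadBalancingSpi(balancer)

      Ignition.start(cfg)
    }
  }
""""""""""""""""""""""""""""""""""""""""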

Ignite is not a scheduler, so we don't have any built-ins for resource
control.

Regards,
-- 
Ilya Kasnacheev

