Posted to modperl@perl.apache.org by James Smith <js...@sanger.ac.uk> on 2020/08/05 08:23:46 UTC

RE: Question about deployment of math computing [EXT]

Wesley,

You will have seen my posts elsewhere - we work on terabyte/petabyte-scale datasets (and these aren't a large number of large records but rather a very, very large number of small records), so both memory use and response times are large. Compute is less of an issue in some cases, but not in others.

The services which use Apache/mod_perl work reliably and return data for these - the dancer/starman services sometimes fail or hang, either because there are no backends free to serve the requests or because those backends time out requests to the nginx proxy (but still continue using resources). The team running the backends fails to notice this because there is no easy-to-see reporting on these boxes.

We also have other services which return large amounts of data computed on the fly, where the response time can be multiple hours - but by carefully streaming the data in Apache we can still get the data returned. A similar option isn't available in dancer (or wasn't at the time) to handle these sorts of requests, so equivalent code was impossible.
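
To illustrate the general idea of streaming a response rather than building it in memory, here is a minimal sketch. It is not the Apache/mod_perl code described above: it uses Python's WSGI interface purely as a stand-in, and `slow_rows` and the row format are hypothetical names invented for the example.

```python
from wsgiref.util import setup_testing_defaults


def slow_rows(count):
    """Hypothetical stand-in for records that are computed on the fly."""
    for i in range(count):
        # Each chunk is produced only when the server asks for the next one.
        yield f"row-{i}\n".encode("utf-8")


def app(environ, start_response):
    """WSGI app that streams its body instead of building it in memory."""
    # No Content-Length header: the total size is not known up front.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    # Returning a generator lets the server send chunks as they are produced.
    return slow_rows(3)


if __name__ == "__main__":
    environ = {}
    setup_testing_defaults(environ)
    for chunk in app(environ, lambda status, headers: None):
        print(chunk)
```

Whether the client sees the chunks promptly still depends on buffering in the server and in any proxy in front of it, which this sketch does not model.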

In most cases starman hasn't really been the answer and Apache works sufficiently well. Even where people are using nginx, we are now often using one of the alternative Apache MPMs (mpm_event), which seems to be better and more reliable than nginx, and it means we don't have to have completely different configuration setups for some of our proxies, static servers and dynamic content servers.

The good thing about Apache is its dynamic rescaling - which isn't as easy with starman. If you have a large code base, the spin-up time for starman can be quite large, as it appears (to make it efficient) to load in every bit of code that the application needs - even code needed only for one of those small edge cases.

So yes, use starman for simple apps if you need to, but for complex stuff I find a mod_perl setup more reliable.

James

-----Original Message-----
From: Wesley Peng <me...@yonghua.org> 
Sent: 05 August 2020 04:31
To: dcook@prosentient.com.au; modperl@perl.apache.org
Subject: Re: Question about deployment of math computing [EXT]

Hi

dcook@prosentient.com.au wrote:
> That's interesting. After re-reading your earlier email, I think that I misunderstood what you were saying.
> 
> Since this is a mod_perl listserv, I imagine that the advice will always be to use mod_perl rather than starman?
> 
> Personally, I'd say either option would be fine. In my experience, the key advantage of mod_perl or starman (say over CGI) is that you can pre-load libraries into memory at web server startup time, and that processes are persistent (although they do have limited lifetimes of course).
> 
> You could use a framework like Catalyst or Mojolicious (note Dancer is another framework, but I haven't worked with it) which can support different web servers, and then try the different options to see what suits you best.
> 
> One thing to note would be that usually people put a reverse proxy in front of starman like Apache or Nginx (partially for serving static assets but other reasons as well). Your stack could be less complicated if you just went the mod_perl/Apache route.
> 
> That said, what OS are you planning to use? It's worth checking if mod_perl is easily available in your target OS's package repositories. I think Red Hat dropped mod_perl starting with RHEL 8, although EPEL 8 now has mod_perl in it. Something to think about.

We use Ubuntu 16.04 and 18.04.

We do use dancer/starman in our production environment, but that service only handles lightweight API requests, for example a RESTful API for data validation.

Our math computing, however, is a heavyweight service where each request takes a long time to finish, so I am wondering whether it should be deployed in dancer.

As I understand it, the web server behind dancer is starman by default; starman is event driven, uses very few processes, and cannot scale its processes up or down automatically.

We deploy starman with 5 processes by default. When 5 requests come in, all 5 starman processes are busy computing them, so the next request will be blocked. Is that right?
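
The scenario in that question can be put into numbers with a small sketch. It assumes every request arrives at the same moment and takes the same time, and the 60-second figure is made up for illustration; `start_times` is a hypothetical helper, not part of starman.

```python
def start_times(num_requests, workers, seconds_per_request):
    """Start time of each request when a fixed pool serves them in order.

    Assumes all requests arrive at t=0 and take equally long; both are
    simplifications made for this sketch.
    """
    # Requests are served in waves of `workers`; wave k starts after k
    # earlier waves have finished.
    return [(i // workers) * seconds_per_request for i in range(num_requests)]


if __name__ == "__main__":
    # 6 simultaneous requests, 5 workers, 60 s of computation each.
    print(start_times(6, 5, 60))  # [0, 0, 0, 0, 0, 60]
```

Under these assumptions the sixth request is not lost, but it cannot start until one of the five workers has finished.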

But Apache with mod_perl works in a prefork way; generally it can have as many as thousands of processes if resources permit, and its process management can scale the number of children up and down automatically.

So my real question is whether, for a CPU-intensive service, an event-driven server like starman has no advantage over a preforked server like Apache.

Am I right?

Thanks.


Re: Question about deployment of math computing [EXT]

Posted by Mark Blackman <ma...@blackmans.org>.


> The good thing about Apache is its dynamic rescaling - which isn't as easy with starman. If you have a large code base, the spin-up time for starman can be quite large, as it appears (to make it efficient) to load in every bit of code that the application needs - even code needed only for one of those small edge cases.
> 
> So yes, use starman for simple apps if you need to, but for complex stuff I find a mod_perl setup more reliable.

Even Apache has a maximum number of instances. If you’re prepared to let your Apache+mod_perl use up to say 300 concurrent Perl instances, you just set up your starman instance to pre-fork 300 concurrent instances. Your hardware will always impose concurrency limits. You should always be able to achieve the same performance with mod_perl and Starman as Perl is fundamentally single-threaded. Separating the front-end proxy (Apache or Nginx) from the back-end application (Perl app running under starman) is a simplification and a separation of concerns, not a performance gain or penalty.
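
A rough way to see the hardware limit is an idealised lower bound on how long a batch of CPU-bound requests takes. The model below is an assumption for illustration only: it ignores context-switch, memory and I/O costs, and the request count, core count and timings are made-up numbers. Note that the function takes no "server type" argument, since in this model only the number of workers and cores matters.

```python
import math


def ideal_makespan(num_requests, workers, cores, cpu_seconds_per_request):
    """Lower bound on the time to finish a batch of CPU-bound requests.

    Idealised: each worker process handles one request at a time and can
    use at most one core while doing so.
    """
    # Limit from the pool: some worker must handle this many requests in turn.
    pool_bound = math.ceil(num_requests / workers) * cpu_seconds_per_request
    # Limit from the hardware: total CPU work spread evenly over all cores.
    core_bound = num_requests * cpu_seconds_per_request / cores
    return max(pool_bound, core_bound)


if __name__ == "__main__":
    # 300 requests of 10 CPU-seconds each on an 8-core machine.
    for workers in (5, 8, 300):
        print(workers, ideal_makespan(300, workers, 8, 10))
```

In this model, going from 5 to 8 workers helps because idle cores get used, while going from 8 to 300 workers barely changes the bound because the cores are already saturated.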

If you use unix domain sockets for the proxying you can even get zero-downtime application restarts.

mod_perl is great for weird, special cases, like supporting some legacy, 3rd party code, but I don’t believe it’s the best option for the common case.

- Mark

Re: Question about deployment of math computing [EXT]

Posted by Wesley Peng <me...@yonghua.org>.

James,

James Smith wrote:
> The services which use Apache/mod_perl work reliably and return data for these - the dancer/starman services sometimes fail or hang, either because there are no backends free to serve the requests or because those backends time out requests to the nginx proxy (but still continue using resources). The team running the backends fails to notice this because there is no easy-to-see reporting on these boxes.

Thanks for letting me know this.
We have been using starman for a RESTful API service that handles
lightweight HTTP requests and responses.
But for serving machine learning and deep learning workloads, we may
consider using mod_perl for more stability.

regards.