Posted to user@arrow.apache.org by Chen Song <ch...@gmail.com> on 2023/01/24 04:01:50 UTC

Question on scheduling/throttling of Flight services

Hi All,

I have a question about best practices for building scheduling/throttling
mechanisms on Flight data services in a multi-tenant environment.

My high-level understanding is:

- The Flight data service uses a thread-pool-based model, i.e., each data
server normally runs with a fixed-size thread pool, and each request
occupies an entire thread for its lifecycle.

- Request sizes are heterogeneous: some requests process a few MBs of data
and may take just a few hundred milliseconds, while others may need to
process hundreds of GBs of data, taking hours.

- For simplicity, let's use the thread as the unit of resource shared among
multiple users across data servers.

One natural way to start is to allow each user only a share of the thread
pool per server. For example, each user may use up to 5% of the threads in
the pool on a server.

- This mechanism, however, is unfair when there are many whale users (users
who send far more concurrent requests than the total number of threads
allowed for them across all servers). Using the example above, if there are
20 such users (each taking 5% of the thread pool) at all times, they will
quickly use up all threads in the fleet.

- Adding more servers doesn't solve the issue, as each whale user will
quickly take threads on the new servers as well.

In other words, how do we ensure fairness and avoid starving regular users
when there are many whale users?

My question is: is there any best practice in Flight data services for
handling this with local scheduling/throttling? Or can it only be solved
with global throttling, e.g., tracking each user's concurrent requests in a
centralized place, with each Flight metadata or data service fetching the
per-user concurrency counts periodically?

Thanks in advance.
-- 
Chen Song
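The per-user share scheme described in the post can be sketched roughly as
follows (Python for brevity). The 40-thread pool, the 5% cap, and the
reject-on-over-quota policy are illustrative assumptions, not anything
Flight provides out of the box:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical numbers: a 40-thread pool with a 5% (= 2 threads) per-user cap.
POOL_SIZE = 40
PER_USER_CAP = max(1, POOL_SIZE // 20)

_pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
_user_slots = {}          # user id -> Semaphore guarding that user's share
_slots_lock = threading.Lock()

def _slots_for(user):
    with _slots_lock:
        if user not in _user_slots:
            _user_slots[user] = threading.Semaphore(PER_USER_CAP)
        return _user_slots[user]

def submit(user, handler, *args):
    """Run handler on the shared pool iff the user is under quota, else reject."""
    slots = _slots_for(user)
    if not slots.acquire(blocking=False):
        raise RuntimeError(f"user {user} is over the per-server quota")
    def run():
        try:
            return handler(*args)
        finally:
            slots.release()
    return _pool.submit(run)
```

This is exactly the mechanism whose fairness problem the post describes: the
cap is enforced per server, so nothing stops many whales from saturating the
pool collectively.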

Re: Question on scheduling/throttling of Flight services

Posted by Chang She <ch...@eto.ai>.
I don’t know all of the parameters here but it sounds similar to what
database priority / workload management queues solve for? (Eg
https://docs.aws.amazon.com/redshift/latest/dg/c_workload_mngmt_classification.html
)

If the whale users are clustered, you can require clients to send request
information that matches them to the right slot.

If you want to get really fancy, and you have the explain plan and stats on
the datasets, you could make a rough prediction of resource usage. But at
that point you're just building a whole DB APM product, so probably not
worth the trouble?
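The slot idea above can be sketched as a small dispatcher that routes each
request to a bounded executor based on its declared (or estimated) class;
the queue names, worker counts, and byte threshold are made up for
illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical split of a fixed worker budget into WLM-style slots:
# short interactive requests get most threads, long scans a small share.
QUEUES = {
    "interactive": ThreadPoolExecutor(max_workers=8),
    "batch": ThreadPoolExecutor(max_workers=2),
}

def classify(request):
    # Stand-in rule: clients (or a cost estimate) tag big scans as batch.
    return "batch" if request.get("estimated_bytes", 0) > 1_000_000_000 else "interactive"

def dispatch(request, handler):
    """Submit the handler to the slot matching the request's class."""
    return QUEUES[classify(request)].submit(handler, request)
```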




Re: Question on scheduling/throttling of Flight services

Posted by David Li <li...@apache.org>.
Hi Chen,

This is an interesting problem. I don't think it's particularly related to Flight, except for the fact that Flight services are likely to face problems like this, so I think it's good to discuss here.

The Java and C++ implementations of gRPC/Flight use a thread-pool-based model, yes. One place Flight could help you more is if it exposed the asynchronous nature of the underlying implementation. In particular, in Java, this would let you queue a request without tying up a thread. (You could just enqueue the request reader/response sender objects and return immediately, letting the thread be returned to the thread pool.) Off the top of my head, this would be quite achievable to implement [*].
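The enqueue-and-return idea can be sketched independently of Flight/gRPC
(Python for brevity; the callback here is a stand-in for the request
reader/response sender objects): the RPC thread only parks the request
context on a queue, and a small worker pool drains it later, so slow
requests wait in memory instead of pinning a server thread.

```python
import queue
import threading

pending = queue.Queue()

def on_rpc(request, respond):
    # Called on the server thread: O(1), returns immediately.
    pending.put((request, respond))

def worker():
    while True:
        request, respond = pending.get()
        if request is None:        # shutdown sentinel
            break
        respond(f"handled {request}")

# A deliberately small worker pool drains the queue.
threads = [threading.Thread(target=worker, daemon=True) for _ in range(2)]
for t in threads:
    t.start()
```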

> - The size of requests is heterogeneous, meaning some requests process a few MBs of data and may just take a few hundred milliseconds, while others may need to process hundreds of GBs of data taking hours.

Are you able to estimate or calculate this cost up front? If so, you could limit based on threads plus request cost. Then hopefully each whale would stay within its 5% quota of threads/concurrent RPC calls.
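A cost-aware variant of the quota can be sketched like this, where each
user gets a budget of abstract cost units rather than a flat thread count,
so one huge scan can consume a whole quota while many small requests fit in
the same budget (the budget and the unit are hypothetical):

```python
import threading

USER_BUDGET = 100  # illustrative per-user budget of cost units

class CostQuota:
    def __init__(self, budget=USER_BUDGET):
        self.budget = budget
        self.in_flight = {}              # user id -> cost units in use
        self.lock = threading.Lock()

    def try_admit(self, user, cost):
        """Admit the request only if the user's budget covers its cost."""
        with self.lock:
            used = self.in_flight.get(user, 0)
            if used + cost > self.budget:
                return False
            self.in_flight[user] = used + cost
            return True

    def release(self, user, cost):
        with self.lock:
            self.in_flight[user] -= cost
```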

If not, perhaps one way to approximate it is to do something like what cloud providers do for cheap instances: allow the user their full RPC quota (per server) only for a short time, then throttle down for a period after that. I think that won't help with scaling up, though, unless you also grant the full quota only after some time has passed since server startup.
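This burst-then-throttle behavior is essentially a per-user token bucket:
the bucket holds a burst allowance and refills at a low steady rate. A
sketch with an injectable clock (the burst size and refill rate are made-up
numbers):

```python
import time

class TokenBucket:
    def __init__(self, burst=10, refill_per_sec=1.0, now=time.monotonic):
        self.capacity = burst
        self.tokens = float(burst)   # start with the full burst allowance
        self.rate = refill_per_sec
        self.now = now
        self.last = now()

    def try_take(self):
        """Spend one token for a request, refilling based on elapsed time."""
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```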

Another solution might be to keep the quota but also round-robin request handling: even if a user is technically under quota, prioritize requests from other users who haven't been served yet. This is an area where there is almost certainly existing literature (queueing theory?) that I'm not familiar enough with to reference, but it's where I would start investigating.
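The round-robin idea can be sketched as one FIFO per user, with the
scheduler cycling over users so a backlog from one user cannot starve
another (a minimal sketch, not a full fair-queueing implementation):

```python
from collections import OrderedDict, deque

class FairQueue:
    def __init__(self):
        self.queues = OrderedDict()     # user id -> FIFO of requests

    def push(self, user, request):
        self.queues.setdefault(user, deque()).append(request)

    def pop(self):
        """Serve the next user in rotation; rotate them to the back."""
        if not self.queues:
            return None
        user, q = next(iter(self.queues.items()))
        request = q.popleft()
        del self.queues[user]
        if q:
            self.queues[user] = q       # re-insert at the back of the rotation
        return user, request
```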

[*]: But we'd end up duplicating all interfaces, so I'd like to also evaluate rewriting the current synchronous interface in terms of the asynchronous interface.
