Posted to solr-user@lucene.apache.org by S G <sg...@gmail.com> on 2017/08/03 02:41:00 UTC

Limiting the number of queries/updates to Solr

Hi,

My team provides Solr clusters to several other teams in my company.
We get peak query-rate and update-rate requirements from our customers
and load-test the cluster against those numbers.
This helps us size a cluster for a given peak load.

The problem is that peak-load estimates are just estimates.
It would be nice to enforce them on the Solr side, so that if a core sees a
rate higher than its limit, the core automatically begins to reject
requests.
Such a feature would contribute to cluster stability while making sure the
customer gets an exception reminding them to slow down.

A configuration like the following in managed-schema or solrconfig.xml
would be great:
<coreRateLimiter>
  <select maxPerSec="1000"/>
  <update maxPerSec="500"/>
  <facets maxPerSec="100"/>
  <pivots maxPerSec="30"/>
</coreRateLimiter>

If the rate exceeds these limits, an exception like the following should be
thrown: "Cannot process more than 500 updates/second. Please slow down or
raise the coreRateLimiter.update limit in solrconfig.xml."

Is
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/RateLimiter.SimpleRateLimiter.html
a step in that direction?

Thanks
SG

Re: Limiting the number of queries/updates to Solr

Posted by Hrishikesh Gadre <ga...@gmail.com>.
At one point I was working on SOLR-7344
<https://issues.apache.org/jira/browse/SOLR-7344> (but it fell off the
radar for various reasons). Specifically, I built a servlet request
filter that implements a customizable queuing mechanism using the
asynchronous servlet API (Servlet 3 spec). This way you can define how many
concurrent requests of a specific type (e.g. query, indexing, etc.) you
want to process. This can also be extended to the core (or collection) level.
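The core admission-control idea behind such a filter can be sketched with a plain java.util.concurrent.Semaphore. This is an illustrative sketch, not the code in the repository linked below; the class name and limits are made up:

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch of per-request-type admission control: each request
// type (query, indexing, ...) gets a bounded number of concurrent slots,
// and requests beyond the bound are rejected instead of queued.
public class RequestTypeLimiter {
    private final Semaphore slots;

    public RequestTypeLimiter(int maxConcurrent) {
        this.slots = new Semaphore(maxConcurrent);
    }

    // Returns true if the request was admitted; the caller must call
    // release() when the request finishes.
    public boolean tryAdmit() {
        return slots.tryAcquire(); // non-blocking: reject rather than queue
    }

    public void release() {
        slots.release();
    }

    public static void main(String[] args) {
        RequestTypeLimiter queryLimiter = new RequestTypeLimiter(2);
        System.out.println(queryLimiter.tryAdmit()); // true: slot 1 taken
        System.out.println(queryLimiter.tryAdmit()); // true: slot 2 taken
        System.out.println(queryLimiter.tryAdmit()); // false: both slots busy
        queryLimiter.release();                      // one request finishes
        System.out.println(queryLimiter.tryAdmit()); // true: a slot freed up
    }
}
```

A real filter would additionally park rejected requests with the Servlet 3 async API rather than failing them outright, which is the "customizable queuing" part.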

https://github.com/hgadre/servletrequest-scheduler
If this is something interesting and useful for the community, I would be
more than happy to help move it forward. Otherwise, I would welcome any
feedback on possible improvements (or drawbacks).

Thanks
Hrishikesh




On Wed, Aug 2, 2017 at 9:45 PM, Walter Underwood <wu...@wunderwood.org>
wrote:

>
> > On Aug 2, 2017, at 8:33 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> > IMHO, intentionally causing connections to fail when a limit is exceeded
> > would not be a very good idea.  When the rate gets too high, the first
> > thing that happens is all the requests slow down.  The slowdown could be
> > dramatic.  As the rate continues to increase, some of the requests
> > probably would begin to fail.
>
> No, this is a very good idea. It is called “load shedding” or “fail fast”.
> Gracefully dealing with overload is an essential part of system design.
>
> At Netflix, with a pre-Jetty Solr (war file running under Tomcat), we took
> down 40 front end servers with slow response times from the Solr server
> farm. We tied up all the front end threads waiting on responses from the
> Solr servers. That left no front end threads available to respond to
> incoming HTTP requests. It was not a fun evening.
>
> To fix this, we configured the Citrix load balancer to overflow to a
> different server when the outstanding back-end requests hit a limit. The
> overflow server was a virtual server that immediately returned a 503. That
> would free up front end connections and threads in an overload condition.
> The users would get a “search unavailable” page, but the rest of the site
> would continue to work.
>
> Unfortunately, the AWS load balancers don’t offer anything like this, ten
> years later.
>
> The worst case version of this is a stable congested state. It is pretty
> easy to put requests into a queue (connection/server) that are guaranteed
> to time out before they are serviced. If you have 35 requests in the queue,
> a 1 second service time, and a 30 second timeout, those requests are
> already dead when you put them on the queue.
>
> I learned about this when I worked with John Nagle at Ford Aerospace. I
> recommend his note “On Packet Switches with Infinite Storage” (1985) for
> the full story. It is only eight pages long, but packed with goodness.
>
> https://tools.ietf.org/html/rfc970
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>

Re: Limiting the number of queries/updates to Solr

Posted by Walter Underwood <wu...@wunderwood.org>.
> On Aug 2, 2017, at 8:33 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> IMHO, intentionally causing connections to fail when a limit is exceeded
> would not be a very good idea.  When the rate gets too high, the first
> thing that happens is all the requests slow down.  The slowdown could be
> dramatic.  As the rate continues to increase, some of the requests
> probably would begin to fail.

No, this is a very good idea. It is called “load shedding” or “fail fast”. Gracefully dealing with overload is an essential part of system design.

At Netflix, with a pre-Jetty Solr (war file running under Tomcat), we took down 40 front end servers with slow response times from the Solr server farm. We tied up all the front end threads waiting on responses from the Solr servers. That left no front end threads available to respond to incoming HTTP requests. It was not a fun evening.

To fix this, we configured the Citrix load balancer to overflow to a different server when the outstanding back-end requests hit a limit. The overflow server was a virtual server that immediately returned a 503. That would free up front end connections and threads in an overload condition. The users would get a “search unavailable” page, but the rest of the site would continue to work.

Unfortunately, the AWS load balancers don’t offer anything like this, ten years later.

The worst case version of this is a stable congested state. It is pretty easy to put requests into a queue (connection/server) that are guaranteed to time out before they are serviced. If you have 35 requests in the queue, a 1 second service time, and a 30 second timeout, those requests are already dead when you put them on the queue.
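The arithmetic behind "already dead when you put them on the queue" can be checked directly; the numbers below are the ones from the example above:

```java
// Sketch of the dead-on-arrival queue arithmetic: in a FIFO queue, the
// newest request waits queueDepth * serviceTime before service even begins.
// If that wait exceeds the client timeout, the work is wasted.
public class QueueDeath {
    static boolean deadOnArrival(int queueDepth, double serviceTimeSec,
                                 double timeoutSec) {
        double waitSec = queueDepth * serviceTimeSec; // wait before service starts
        return waitSec >= timeoutSec;                 // client gives up first
    }

    public static void main(String[] args) {
        // 35 queued requests, 1 s service time, 30 s client timeout
        System.out.println(deadOnArrival(35, 1.0, 30.0)); // true: 35 s > 30 s
        // A shorter queue stays under the timeout
        System.out.println(deadOnArrival(20, 1.0, 30.0)); // false: 20 s < 30 s
    }
}
```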

I learned about this when I worked with John Nagle at Ford Aerospace. I recommend his note “On Packet Switches with Infinite Storage” (1985) for the full story. It is only eight pages long, but packed with goodness.

https://tools.ietf.org/html/rfc970

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Limiting the number of queries/updates to Solr

Posted by S G <sg...@gmail.com>.
I tried using Jetty's QoS filter for rate-limiting the queries.
It has a good option to apply different rates per URL pattern.

However, Solr does not seem to pick it up; the details are described at
https://stackoverflow.com/questions/45536986/why-is-this-qos-jetty-filter-not-working
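For reference, Jetty's QoSFilter is registered in web.xml roughly like this. The maxRequests value and URL pattern here are illustrative; whether Solr's bundled web.xml actually honors such a registration is exactly what the Stack Overflow question above asks:

```xml
<filter>
  <filter-name>QoSFilter</filter-name>
  <filter-class>org.eclipse.jetty.servlets.QoSFilter</filter-class>
  <init-param>
    <!-- maximum number of requests serviced concurrently;
         further requests are suspended rather than rejected -->
    <param-name>maxRequests</param-name>
    <param-value>50</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>QoSFilter</filter-name>
  <url-pattern>/select/*</url-pattern>
</filter-mapping>
```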

Has anyone worked on this before who can help?

Thanks
SG


On Fri, Aug 4, 2017 at 5:51 PM, S G <sg...@gmail.com> wrote:

> timeAllowed parameter is not a good choice for rate limiting and could
> crash the whole Solr cluster.
> In fact, timeAllowed parameter should increase the chances of crashing the
> whole cluster:
>
> When the timeAllowed for a query is over, its client will get a failure
> but the server handling the query itself will not kill the thread running
> that query. So Solr itself would still be working on that long-running
> query but the client has got a timeOut.
> These failure-receiving client-threads are now free to process other
> requests: retry failed ones or fire new queries to Solr.
> This should suffocate Solr even more, although client application's
> threads will not get blocked ever.
>
> With a rate limiter, we save both - clients' extra traffic gets
> rejected-responses and all Solr nodes breathe easy too.
> IMO, timeAllowed parameter will almost always kill the whole Solr cluster.
>
> -SG
>
>
>
>
> On Fri, Aug 4, 2017 at 3:30 PM, Varun Thacker <va...@vthacker.in> wrote:
>
>> Hi Hrishikesh,
>>
>> I think SOLR-7344 is probably an important addition to Solr. It could help
>> users isolate analytical queries ( streaming ) , search queries and
>> indexing requests and throttle requests
>>
>> Let's continue the discussion on the Jira
>>
>> On Thu, Aug 3, 2017 at 2:03 AM, Rick Leir <rl...@leirtech.com> wrote:
>>
>> >
>> >
>> > On 2017-08-02 11:33 PM, Shawn Heisey wrote:
>> >
>> >> On 8/2/2017 8:41 PM, S G wrote:
>> >>
>> >>> Problem is that peak load estimates are just estimates.
>> >>> It would be nice to enforce them from Solr side such that if a rate
>> >>> higher than that is seen at any core, the core will automatically
>> begin to
>> >>> reject the requests.
>> >>> Such a feature would contribute to cluster stability while making sure
>> >>> the customer gets an exception to remind them of a slower rate.
>> >>>
>> >> Solr doesn't have anything like this.  This is primarily because there
>> >> is no network server code in Solr.  The networking is provided by the
>> >> servlet container.  The container in modern Solr versions is nearly
>> >> guaranteed to be Jetty.  As long as I have been using Solr, it has
>> >> shipped with a Jetty container.
>> >>
>> >> https://wiki.apache.org/solr/WhyNoWar
>> >>
>> >> I have no idea whether Jetty is capable of the kind of rate limiting
>> >> you're after.  If it is, it would be up to you to figure out the
>> >> configuration.
>> >>
>> >> You could always put a proxy server like haproxy in front of Solr.  I'm
>> >> pretty sure that haproxy is capable of rejecting connections when the
>> >> request rate gets too high.  Other proxy servers (nginx, apache, F5
>> >> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
>> >> of this.
>> >>
>> >> IMHO, intentionally causing connections to fail when a limit is
>> exceeded
>> >> would not be a very good idea.  When the rate gets too high, the first
>> >> thing that happens is all the requests slow down.  The slowdown could
>> be
>> >> dramatic.  As the rate continues to increase, some of the requests
>> >> probably would begin to fail.
>> >>
>> >> What you're proposing would be guaranteed to cause requests to fail.
>> >> Failing requests are even more likely than slow requests to result in
>> >> users finding a new source for whatever service they are getting from
>> >> your organization.
>> >>
>> > Shawn,
>> > Agreed, a connection limit is not a good idea.  But there is the
>> > timeAllowed parameter <https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
>> > timeAllowed - This parameter specifies the amount of time, in
>> > milliseconds, allowed for a search to complete. If this time expires
>> before
>> > the search is complete, any partial results will be returned.
>> >
>> > https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr
>> >
>> > With timeAllowed, you need not estimate what connection rate is
>> > unbearable. Rather, you would set a max response time. If some queries
>> take
>> > much longer than other queries, then this would cause the long ones to
>> > fail, which might be a good strategy. However, if queries normally all
>> take
>> > about the same time, then this would cause all queries to return partial
>> > results until the server recovers, which might be a bad strategy. In
>> this
>> > case, Walter's post is sensible.
>> >
>> > A previous thread suggested that timeAllowed could cause bad performance
>> > on some cloud servers.
>> > cheers -- Rick
>> >
>> >
>> >
>> >
>> >
>>
>
>

Re: Limiting the number of queries/updates to Solr

Posted by S G <sg...@gmail.com>.
The timeAllowed parameter is not a good choice for rate limiting and could
crash the whole Solr cluster.
In fact, timeAllowed should increase the chances of crashing the
whole cluster:

When the timeAllowed for a query expires, its client gets a failure,
but the server handling the query does not kill the thread running
it. So Solr is still working on that long-running
query while the client has already received a timeout.
These failure-receiving client threads are now free to process other
requests: retry the failed ones or fire new queries at Solr.
This suffocates Solr even more, even though the client application's
threads never block.

With a rate limiter, both sides win: clients' excess traffic gets
rejection responses, and all the Solr nodes breathe easy too.
IMO, the timeAllowed parameter will almost always kill the whole Solr cluster.
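A rejecting rate limiter of the kind argued for here can be sketched as a plain token bucket. The class name and limits are illustrative; this is not a Solr API:

```java
// Illustrative token-bucket rate limiter: up to ratePerSec requests per
// second are admitted; anything beyond that is rejected immediately instead
// of queueing, so neither client threads nor Solr threads pile up.
public class TokenBucket {
    private final double ratePerSec;
    private final double capacity;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double ratePerSec) {
        this.ratePerSec = ratePerSec;
        this.capacity = ratePerSec; // allow bursts up to one second's budget
        this.tokens = ratePerSec;
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at the bucket size.
        tokens = Math.min(capacity,
                tokens + (now - lastRefillNanos) / 1e9 * ratePerSec);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;  // admit the request
        }
        return false;     // over the limit: return an error, don't block
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2.0); // 2 requests/second
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false: this second's budget spent
    }
}
```

The key design choice is the immediate `false` instead of a blocking wait: that is the "load shedding" behavior from earlier in the thread.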

-SG




On Fri, Aug 4, 2017 at 3:30 PM, Varun Thacker <va...@vthacker.in> wrote:

> Hi Hrishikesh,
>
> I think SOLR-7344 is probably an important addition to Solr. It could help
> users isolate analytical queries ( streaming ) , search queries and
> indexing requests and throttle requests
>
> Let's continue the discussion on the Jira
>
> On Thu, Aug 3, 2017 at 2:03 AM, Rick Leir <rl...@leirtech.com> wrote:
>
> >
> >
> > On 2017-08-02 11:33 PM, Shawn Heisey wrote:
> >
> >> On 8/2/2017 8:41 PM, S G wrote:
> >>
> >>> Problem is that peak load estimates are just estimates.
> >>> It would be nice to enforce them from Solr side such that if a rate
> >>> higher than that is seen at any core, the core will automatically
> begin to
> >>> reject the requests.
> >>> Such a feature would contribute to cluster stability while making sure
> >>> the customer gets an exception to remind them of a slower rate.
> >>>
> >> Solr doesn't have anything like this.  This is primarily because there
> >> is no network server code in Solr.  The networking is provided by the
> >> servlet container.  The container in modern Solr versions is nearly
> >> guaranteed to be Jetty.  As long as I have been using Solr, it has
> >> shipped with a Jetty container.
> >>
> >> https://wiki.apache.org/solr/WhyNoWar
> >>
> >> I have no idea whether Jetty is capable of the kind of rate limiting
> >> you're after.  If it is, it would be up to you to figure out the
> >> configuration.
> >>
> >> You could always put a proxy server like haproxy in front of Solr.  I'm
> >> pretty sure that haproxy is capable of rejecting connections when the
> >> request rate gets too high.  Other proxy servers (nginx, apache, F5
> >> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
> >> of this.
> >>
> >> IMHO, intentionally causing connections to fail when a limit is exceeded
> >> would not be a very good idea.  When the rate gets too high, the first
> >> thing that happens is all the requests slow down.  The slowdown could be
> >> dramatic.  As the rate continues to increase, some of the requests
> >> probably would begin to fail.
> >>
> >> What you're proposing would be guaranteed to cause requests to fail.
> >> Failing requests are even more likely than slow requests to result in
> >> users finding a new source for whatever service they are getting from
> >> your organization.
> >>
> > Shawn,
> > Agreed, a connection limit is not a good idea.  But there is the
> > timeAllowed parameter <https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
> > timeAllowed - This parameter specifies the amount of time, in
> > milliseconds, allowed for a search to complete. If this time expires
> before
> > the search is complete, any partial results will be returned.
> >
> > https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr
> >
> > With timeAllowed, you need not estimate what connection rate is
> > unbearable. Rather, you would set a max response time. If some queries
> take
> > much longer than other queries, then this would cause the long ones to
> > fail, which might be a good strategy. However, if queries normally all
> take
> > about the same time, then this would cause all queries to return partial
> > results until the server recovers, which might be a bad strategy. In this
> > case, Walter's post is sensible.
> >
> > A previous thread suggested that timeAllowed could cause bad performance
> > on some cloud servers.
> > cheers -- Rick
> >
> >
> >
> >
> >
>

Re: Limiting the number of queries/updates to Solr

Posted by Varun Thacker <va...@vthacker.in>.
Hi Hrishikesh,

I think SOLR-7344 is probably an important addition to Solr. It could help
users isolate analytical (streaming) queries, search queries, and indexing
requests, and throttle each class of request.

Let's continue the discussion on the Jira

On Thu, Aug 3, 2017 at 2:03 AM, Rick Leir <rl...@leirtech.com> wrote:

>
>
> On 2017-08-02 11:33 PM, Shawn Heisey wrote:
>
>> On 8/2/2017 8:41 PM, S G wrote:
>>
>>> Problem is that peak load estimates are just estimates.
>>> It would be nice to enforce them from Solr side such that if a rate
>>> higher than that is seen at any core, the core will automatically begin to
>>> reject the requests.
>>> Such a feature would contribute to cluster stability while making sure
>>> the customer gets an exception to remind them of a slower rate.
>>>
>> Solr doesn't have anything like this.  This is primarily because there
>> is no network server code in Solr.  The networking is provided by the
>> servlet container.  The container in modern Solr versions is nearly
>> guaranteed to be Jetty.  As long as I have been using Solr, it has
>> shipped with a Jetty container.
>>
>> https://wiki.apache.org/solr/WhyNoWar
>>
>> I have no idea whether Jetty is capable of the kind of rate limiting
>> you're after.  If it is, it would be up to you to figure out the
>> configuration.
>>
>> You could always put a proxy server like haproxy in front of Solr.  I'm
>> pretty sure that haproxy is capable of rejecting connections when the
>> request rate gets too high.  Other proxy servers (nginx, apache, F5
>> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
>> of this.
>>
>> IMHO, intentionally causing connections to fail when a limit is exceeded
>> would not be a very good idea.  When the rate gets too high, the first
>> thing that happens is all the requests slow down.  The slowdown could be
>> dramatic.  As the rate continues to increase, some of the requests
>> probably would begin to fail.
>>
>> What you're proposing would be guaranteed to cause requests to fail.
>> Failing requests are even more likely than slow requests to result in
>> users finding a new source for whatever service they are getting from
>> your organization.
>>
> Shawn,
> Agreed, a connection limit is not a good idea.  But there is the
> > timeAllowed parameter <https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
> timeAllowed - This parameter specifies the amount of time, in
> milliseconds, allowed for a search to complete. If this time expires before
> the search is complete, any partial results will be returned.
>
> https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr
>
> With timeAllowed, you need not estimate what connection rate is
> unbearable. Rather, you would set a max response time. If some queries take
> much longer than other queries, then this would cause the long ones to
> fail, which might be a good strategy. However, if queries normally all take
> about the same time, then this would cause all queries to return partial
> results until the server recovers, which might be a bad strategy. In this
> case, Walter's post is sensible.
>
> A previous thread suggested that timeAllowed could cause bad performance
> on some cloud servers.
> cheers -- Rick
>
>
>
>
>

Re: Limiting the number of queries/updates to Solr

Posted by Rick Leir <rl...@leirtech.com>.

On 2017-08-02 11:33 PM, Shawn Heisey wrote:
> On 8/2/2017 8:41 PM, S G wrote:
>> Problem is that peak load estimates are just estimates.
>> It would be nice to enforce them from Solr side such that if a rate higher than that is seen at any core, the core will automatically begin to reject the requests.
>> Such a feature would contribute to cluster stability while making sure the customer gets an exception to remind them of a slower rate.
> Solr doesn't have anything like this.  This is primarily because there
> is no network server code in Solr.  The networking is provided by the
> servlet container.  The container in modern Solr versions is nearly
> guaranteed to be Jetty.  As long as I have been using Solr, it has
> shipped with a Jetty container.
>
> https://wiki.apache.org/solr/WhyNoWar
>
> I have no idea whether Jetty is capable of the kind of rate limiting
> you're after.  If it is, it would be up to you to figure out the
> configuration.
>
> You could always put a proxy server like haproxy in front of Solr.  I'm
> pretty sure that haproxy is capable of rejecting connections when the
> request rate gets too high.  Other proxy servers (nginx, apache, F5
> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
> of this.
>
> IMHO, intentionally causing connections to fail when a limit is exceeded
> would not be a very good idea.  When the rate gets too high, the first
> thing that happens is all the requests slow down.  The slowdown could be
> dramatic.  As the rate continues to increase, some of the requests
> probably would begin to fail.
>
> What you're proposing would be guaranteed to cause requests to fail.
> Failing requests are even more likely than slow requests to result in
> users finding a new source for whatever service they are getting from
> your organization.
Shawn,
Agreed, a connection limit is not a good idea.  But there is the 
timeAllowed parameter 
<https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
timeAllowed - This parameter specifies the amount of time, in 
milliseconds, allowed for a search to complete. If this time expires 
before the search is complete, any partial results will be returned.

https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr

With timeAllowed, you need not estimate what connection rate is 
unbearable. Rather, you would set a max response time. If some queries 
take much longer than other queries, then this would cause the long ones 
to fail, which might be a good strategy. However, if queries normally 
all take about the same time, then this would cause all queries to 
return partial results until the server recovers, which might be a bad 
strategy. In this case, Walter's post is sensible.

A previous thread suggested that timeAllowed could cause bad performance 
on some cloud servers.
cheers -- Rick





Re: Limiting the number of queries/updates to Solr

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/2/2017 8:41 PM, S G wrote:
> Problem is that peak load estimates are just estimates.
> It would be nice to enforce them from Solr side such that if a rate higher than that is seen at any core, the core will automatically begin to reject the requests.
> Such a feature would contribute to cluster stability while making sure the customer gets an exception to remind them of a slower rate.

Solr doesn't have anything like this.  This is primarily because there
is no network server code in Solr.  The networking is provided by the
servlet container.  The container in modern Solr versions is nearly
guaranteed to be Jetty.  As long as I have been using Solr, it has
shipped with a Jetty container.

https://wiki.apache.org/solr/WhyNoWar

I have no idea whether Jetty is capable of the kind of rate limiting
you're after.  If it is, it would be up to you to figure out the
configuration.

You could always put a proxy server like haproxy in front of Solr.  I'm
pretty sure that haproxy is capable of rejecting connections when the
request rate gets too high.  Other proxy servers (nginx, apache, F5
BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
of this.

IMHO, intentionally causing connections to fail when a limit is exceeded
would not be a very good idea.  When the rate gets too high, the first
thing that happens is all the requests slow down.  The slowdown could be
dramatic.  As the rate continues to increase, some of the requests
probably would begin to fail.

What you're proposing would be guaranteed to cause requests to fail. 
Failing requests are even more likely than slow requests to result in
users finding a new source for whatever service they are getting from
your organization.

Your customer teams might not be able to control the request rate, as it
would probably be related to the number of users who connect to their
services.  It seems like a better option to inform a team that they have
exceeded their request estimates and that they will need to come up with
additional budget so more hardware can be deployed.  If that doesn't
happen, then their service may suffer, and it will not be your fault.

The RateLimiter class in Lucene that you mentioned is designed to limit
the I/O rate of disk or network data transfers, not a request rate.  One
of the most visible uses of this capability in Solr is the ability to
limit the transfer rate of the old-style index replication.  It is also
used in Lucene to slow down the disk I/O usage of segment merging.
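To make the distinction concrete: that RateLimiter paces bytes, not requests, and it slows transfers down rather than rejecting anything. A simplified, self-contained sketch of byte pacing (not the actual Lucene implementation) might look like:

```java
// Simplified sketch of byte-oriented I/O throttling in the spirit of
// Lucene's SimpleRateLimiter: given an MB/s budget, compute how long a
// writer should pause after transferring `bytes`. Nothing is ever rejected;
// transfers are merely stretched out in time.
public class IoThrottle {
    private final double mbPerSec;

    public IoThrottle(double mbPerSec) {
        this.mbPerSec = mbPerSec;
    }

    // Milliseconds to pause so that `bytes` amortizes to the configured rate.
    public long pauseMillisFor(long bytes) {
        double seconds = bytes / (mbPerSec * 1024.0 * 1024.0);
        return (long) (seconds * 1000.0);
    }

    public static void main(String[] args) {
        IoThrottle throttle = new IoThrottle(5.0); // cap replication at 5 MB/s
        long pause = throttle.pauseMillisFor(10L * 1024 * 1024); // a 10 MB chunk
        // In real code the writer would Thread.sleep(pause) between chunks.
        System.out.println(pause); // 2000 ms: 10 MB at 5 MB/s takes 2 s
    }
}
```

This is why the class is a poor fit for the request-rate enforcement proposed at the top of the thread: it has no notion of a request, only of bytes per second.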

A custom Solr component could be built that can be added to a request
handler that does what you're proposing.  If you wanted to write such a
component, you could donate it to the project and try to get it included
in Solr.  Even though I believe such a feature is a bad idea, I'm sure
it would be loved by some users.

Thanks,
Shawn