Posted to dev@zookeeper.apache.org by Thawan Kooburat <th...@fb.com> on 2013/02/27 01:41:42 UTC

RFC: Behavior of QuotaExceededException

Hi,
I am currently working on ZOOKEEPER-1383. One of the main features introduced in this change is to allow ZooKeeper to enforce a hard limit (e.g. transactions per second) per folder.

With hard limits, we need to introduce a new exception/error code (QuotaExceeded) for ZooKeeper operations that modify the DataTree. If a client gets this error, it means that the particular operation has definitely failed.

From our internal discussion, this may make it harder for a user to write an application. The concern is that this can introduce a hole in the sequence of operations that the client application performs, since some operations may succeed while others fail. One idea is to also trigger session expiration (or at least a disconnect) on the server side in addition to the QuotaExceeded error. This will cause all subsequent operations from that client to fail and allow the application to use its existing error-handling logic to recover from QuotaExceeded. Typically, an application that exceeds its quota is already doing something wrong from the administrator's perspective, but we also want it to fail gracefully and be able to recover when the problem is fixed or the quota is increased.
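
To make the "hole in the sequence" concern concrete, here is a minimal Java sketch. It is a toy model and not ZooKeeper code: the class QuotaSim, the Policy enum and the byte-based limit are hypothetical names chosen only for illustration. It compares rejecting just the offending write against rejecting it and also expiring the session.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a quota-enforcing server. Hypothetical; not ZooKeeper code. */
class QuotaSim {
    enum Policy { REJECT_ONLY, REJECT_AND_EXPIRE }

    private final long byteLimit;
    private final Policy policy;
    private long usedBytes = 0;
    private boolean sessionExpired = false;

    QuotaSim(long byteLimit, Policy policy) {
        this.byteLimit = byteLimit;
        this.policy = policy;
    }

    /** Returns true if the write was applied, false if it failed. */
    boolean write(int size) {
        if (sessionExpired) {
            return false; // every operation after expiration fails
        }
        if (usedBytes + size > byteLimit) {
            if (policy == Policy.REJECT_AND_EXPIRE) {
                sessionExpired = true;
            }
            return false; // the assumed QuotaExceeded case
        }
        usedBytes += size;
        return true;
    }

    /** Issues the writes in order and records which ones succeeded. */
    static List<Boolean> run(Policy policy, long limit, int... sizes) {
        QuotaSim sim = new QuotaSim(limit, policy);
        List<Boolean> results = new ArrayList<>();
        for (int size : sizes) {
            results.add(sim.write(size));
        }
        return results;
    }
}
```

With a limit of 30 bytes and writes of 10, 50 and 10 bytes, REJECT_ONLY gives success, failure, success (the hole), while REJECT_AND_EXPIRE gives success, failure, failure, so the application sees a clean cut-off point.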

Let me know if you have any suggestions.

--
Thawan Kooburat

Re: RFC: Behavior of QuotaExceededException

Posted by Thawan Kooburat <th...@fb.com>.
Here is how we plan to use quotas in our environment.

Currently we use ACLs to lock the root directory. Each team has to make a
request through us to set up its project folder. Eventually we will
enforce authentication, but for now we assume that everybody writes only
to their own project folder.

The existing quota feature only allows soft limits (it logs a warning
message) and does per-folder resource tracking. We expanded resource
tracking so that we can track reads/writes per folder and export that
information to our external monitoring system. This allows us to set up
per-folder alerts and easily keep track of each team's usage. We want to
expand this to allow hard-limit enforcement too.
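
The difference between soft-limit tracking and hard-limit enforcement can be sketched roughly as follows. This is an illustrative Java model, not the actual patch: FolderQuotaTracker, its Result enum, and the rule that the first path component is the project folder are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-folder write tracker with a soft and a hard limit. */
class FolderQuotaTracker {
    enum Result { OK, SOFT_LIMIT_WARNING, HARD_LIMIT_REJECTED }

    private final Map<String, Long> writeCounts = new HashMap<>();
    private final long softLimit;
    private final long hardLimit;

    FolderQuotaTracker(long softLimit, long hardLimit) {
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    /** Assumption: "/teamA/locks/x" belongs to project folder "/teamA". */
    static String projectFolder(String path) {
        int secondSlash = path.indexOf('/', 1);
        return secondSlash < 0 ? path : path.substring(0, secondSlash);
    }

    /** Counts a write against its folder, or rejects it past the hard limit. */
    Result recordWrite(String path) {
        String folder = projectFolder(path);
        long count = writeCounts.getOrDefault(folder, 0L);
        if (count + 1 > hardLimit) {
            return Result.HARD_LIMIT_REJECTED; // write is not applied or counted
        }
        count++;
        writeCounts.put(folder, count);
        // Past the soft limit the write still succeeds; a real server would log.
        return count > softLimit ? Result.SOFT_LIMIT_WARNING : Result.OK;
    }

    /** Current counter for a folder, e.g. for export to monitoring. */
    long writes(String folder) {
        return writeCounts.getOrDefault(folder, 0L);
    }
}
```

Because the counters are kept per folder, one team exhausting its quota does not affect writes to another team's folder.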

We get to review most of the clients' application logic to make sure that
they use ZooKeeper properly, so we are not trying to protect against
malicious or improper use. However, the problem we are trying to solve is
that client requests or usage may spike due to some other failure, e.g.
the application sees an unexpected workload spike, or a cleanup process
wasn't running so the data starts to grow too large. Some services are
more important than others, so we want to make sure that we have enough
capacity for these critical services under those scenarios.

So far I think expiring the offending session is not ideal, but it should
be able to do the job. Let me know if you have any suggestions.
   

-- 
Thawan Kooburat





On 3/1/13 9:15 AM, "Camille Fournier" <ca...@apache.org> wrote:

>This is fundamentally one of the huge problems with running "ZooKeeper as
>a service"; I would be very interested to see how you get around it. ZK is
>so sensitive to client (ill)behavior, as I'm sure you're experiencing,
>that it's really difficult to provide it as a service.
>Do you guys control the ZK clients? One thing that I did to get around
>problems like this was actually wrapping the client library in my own,
>more restrictive client. There's not a great way to ensure that the only
>people connecting to your ZK are actually running your client (we need an
>API key or something), but if you're actually wrapping the way people work
>with ZK you can solve a reasonable subset of problems. In fact, when I was
>running a "ZooKeeper as a Service" centralized system, the only clients
>that ever caused me problems were from a group of Perl developers who
>weren't using a client my team had provided and thus totally misused the
>system. This would at least make your second two problems (requests/sec
>and update bytes/sec) less likely to happen.
>
>For the issue of node count and used bytes, those are currently set on
>subtrees, aren't they? So it doesn't just affect the connected client but
>any client using that subtree? I don't know how anyone can sanely reason
>about a size/byte limit on a node/subtree that someone else is crapping
>all over, unless it's tied to ACLs or something. How are you getting
>around this?


Re: RFC: Behavior of QuotaExceededException

Posted by Camille Fournier <ca...@apache.org>.
This is fundamentally one of the huge problems with running "ZooKeeper as a
service"; I would be very interested to see how you get around it. ZK is so
sensitive to client (ill)behavior, as I'm sure you're experiencing, that
it's really difficult to provide it as a service.
Do you guys control the ZK clients? One thing that I did to get around
problems like this was actually wrapping the client library in my own,
more restrictive client. There's not a great way to ensure that the only
people connecting to your ZK are actually running your client (we need an
API key or something), but if you're actually wrapping the way people work
with ZK you can solve a reasonable subset of problems. In fact, when I was
running a "ZooKeeper as a Service" centralized system, the only clients
that ever caused me problems were from a group of Perl developers who
weren't using a client my team had provided and thus totally misused the
system. This would at least make your second two problems (requests/sec and
update bytes/sec) less likely to happen.
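
As a rough illustration of the wrapping idea, here is a minimal Java sketch. Everything in it is hypothetical: StoreClient stands in for whatever client library is being wrapped, and the prefix and payload-size checks are just two examples of restrictions a wrapper could impose. It does not use the real ZooKeeper client API.

```java
/** Hypothetical minimal client interface standing in for the wrapped library. */
interface StoreClient {
    void setData(String path, byte[] data);
}

/** Wrapper that only lets callers write small payloads under their own folder. */
class RestrictedClient implements StoreClient {
    private final StoreClient delegate;
    private final String allowedPrefix;
    private final int maxDataBytes;

    RestrictedClient(StoreClient delegate, String allowedPrefix, int maxDataBytes) {
        this.delegate = delegate;
        this.allowedPrefix = allowedPrefix;
        this.maxDataBytes = maxDataBytes;
    }

    @Override
    public void setData(String path, byte[] data) {
        // Reject writes outside the caller's own subtree.
        if (!path.equals(allowedPrefix) && !path.startsWith(allowedPrefix + "/")) {
            throw new IllegalArgumentException("path outside " + allowedPrefix + ": " + path);
        }
        // Reject oversized payloads before they ever reach the server.
        if (data.length > maxDataBytes) {
            throw new IllegalArgumentException("payload too large: " + data.length);
        }
        delegate.setData(path, data);
    }
}
```

Checks like these run in the client process, so they only help with clients that actually use the wrapper.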

For the issue of node count and used bytes, those are currently set on
subtrees, aren't they? So it doesn't just affect the connected client but
any client using that subtree? I don't know how anyone can sanely reason
about a size/byte limit on a node/subtree that someone else is crapping all
over, unless it's tied to ACLs or something. How are you getting around
this?



Re: RFC: Behavior of QuotaExceededException

Posted by Thawan Kooburat <th...@fb.com>.
On 2/27/13 12:10 AM, "Flavio Junqueira" <fp...@yahoo.com> wrote:


>It wouldn't be very nice to allow holes in the sequence of operations of
>a client; it would violate session semantics. I'm also wondering about a
>couple of things:
>
>- What does QuotaExceededException convey to the application? That the
>application client won't ever be able to send operations again with that
>session? That it won't be able to submit new operations for up to x
>amount of time, where x is computed somehow? Expiring the session will
>have the side effect that all the ephemeral nodes will be gone. I'm not
>sure that's desirable, but as a punishment it might work out fine. ;-)

My initial plan is to support four types of hard limits (node count, used
bytes, requests/sec and update bytes/sec). For the first two types of
limit, it is likely that the client won't be able to complete any
operation after the quota is exceeded. For the last two, after some amount
of time, the client should be able to make a successful request.
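
The distinction between the two groups of limits can be sketched in Java as follows. This is only an illustration: NodeCountLimit and WindowedRateLimit are hypothetical classes, and the fixed one-second window is an assumed way of measuring a per-second rate, not a description of the patch.

```java
/** Capacity-style limit: stays exceeded until something is deleted. */
class NodeCountLimit {
    private final long maxNodes;
    private long nodes = 0;

    NodeCountLimit(long maxNodes) {
        this.maxNodes = maxNodes;
    }

    boolean tryCreate() {
        if (nodes >= maxNodes) {
            return false; // waiting does not help here
        }
        nodes++;
        return true;
    }

    void delete() {
        if (nodes > 0) {
            nodes--;
        }
    }
}

/** Rate-style limit: recovers by itself once a new one-second window starts. */
class WindowedRateLimit {
    private final long limitPerSecond;
    private long windowStartMillis;
    private long countInWindow = 0;

    WindowedRateLimit(long limitPerSecond, long nowMillis) {
        this.limitPerSecond = limitPerSecond;
        this.windowStartMillis = nowMillis;
    }

    /** The caller passes the current time, which keeps the sketch deterministic. */
    boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStartMillis >= 1000) {
            windowStartMillis = nowMillis; // start a fresh window
            countInWindow = 0;
        }
        if (countInWindow >= limitPerSecond) {
            return false;
        }
        countInWindow++;
        return true;
    }
}
```

A client that hits the node-count limit keeps failing until nodes are removed or the quota is raised, while a client that hits the rate limit succeeds again in the next window.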

>- Have you considered limiting the rate of client operations instead of
>failing operations? Shaping the traffic of operations of a client might
>be way nicer from the client perspective, but perhaps a bit harder to
>implement.

We considered that as well; I already prototyped this feature a while
back. The main problem I saw is that the network layer (e.g. the NIO
subsystem) only knows about request size/rate, the client's IP/port and
the sessionId, so its ability to do throttling is limited. Additionally, a
client with a low session timeout will eventually time out and reconnect
to another server (or create a new session), which will allow it to make
successful requests on the other server until it exceeds the usage
threshold again.


Thanks for your response. I think I will go with the session-expiration route.




Re: RFC: Behavior of QuotaExceededException

Posted by Flavio Junqueira <fp...@yahoo.com>.
It wouldn't be very nice to allow holes in the sequence of operations of a client; it would violate session semantics. I'm also wondering about a couple of things:

- What does QuotaExceededException convey to the application? That the application client won't ever be able to send operations again with that session? That it won't be able to submit new operations for up to x amount of time, where x is computed somehow? Expiring the session will have the side effect that all the ephemeral nodes will be gone. I'm not sure that's desirable, but as a punishment it might work out fine. ;-)
- Have you considered limiting the rate of client operations instead of failing operations? Shaping a client's operation traffic might be way nicer from the client's perspective, but perhaps a bit harder to implement.
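
One common way to shape traffic instead of failing operations is a token bucket. The Java sketch below is a generic illustration of that technique, not something taken from ZooKeeper: the TokenBucket class and its reserve method are hypothetical, and the caller supplies the clock value so the behavior is easy to follow.

```java
/** Generic token bucket that returns a delay instead of rejecting requests. */
class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastMillis;

    TokenBucket(double capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full so short bursts pass undelayed
        this.lastMillis = nowMillis;
    }

    private void refill(long nowMillis) {
        long elapsed = nowMillis - lastMillis;
        if (elapsed > 0) {
            tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
            lastMillis = nowMillis;
        }
    }

    /**
     * Reserves one token. Returns 0 if the request may proceed immediately,
     * otherwise the number of milliseconds the server should hold it back.
     */
    long reserve(long nowMillis) {
        refill(nowMillis);
        tokens -= 1.0; // may go negative: that debt is the queued backlog
        if (tokens >= 0) {
            return 0;
        }
        return (long) Math.ceil(-tokens / refillPerMilli);
    }
}
```

Here every request is eventually served, so the client sees added latency rather than a gap in its sequence of operations. The cost is that delayed requests have to be held somewhere on the server.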

-Flavio 
