Posted to user@zookeeper.apache.org by Yue Shen <sh...@gmail.com> on 2019/09/27 17:01:23 UTC

How to scale ZooKeeper to support 10K concurrent connections?

Dear ZooKeeper users,

I have a special use case in which I use the AWS Lambda service.

Inside the Lambda function, the logic goes to ZooKeeper to look up the worker
assigned to the data. If one exists, it connects to the worker endpoint and
sends the data. If no worker is assigned, the logic posts a new assignment and
waits for it to be assigned to a worker. A coordinator watches for new
assignments and assigns tasks.
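
[Editor's sketch] The flow described above can be modeled roughly as follows.
This is only an illustration under assumptions: an in-memory object stands in
for ZooKeeper, and every name in it (AssignmentStore, route, coordinator_step,
the endpoints) is hypothetical and not taken from the original system.

```python
from typing import Dict, List, Optional


class AssignmentStore:
    """In-memory stand-in for the coordination service (ZooKeeper in the post)."""

    def __init__(self) -> None:
        self.assigned: Dict[str, str] = {}  # data key -> worker endpoint
        self.pending: List[str] = []        # data keys waiting for a worker

    def lookup(self, key: str) -> Optional[str]:
        return self.assigned.get(key)

    def post_assignment(self, key: str) -> None:
        # Avoid posting the same request twice.
        if key not in self.pending and key not in self.assigned:
            self.pending.append(key)


def coordinator_step(store: AssignmentStore, workers: List[str]) -> None:
    """Assign every pending key to a worker, round-robin (assumed policy)."""
    for i, key in enumerate(store.pending):
        store.assigned[key] = workers[i % len(workers)]
    store.pending.clear()


def route(store: AssignmentStore, key: str) -> Optional[str]:
    """What one Lambda call does: return the endpoint, or request one."""
    endpoint = store.lookup(key)
    if endpoint is None:
        # The real call would now wait for the assignment (not modeled here).
        store.post_assignment(key)
    return endpoint
```

Note that in this model every call performs at least one lookup against the
store, which is why the number of concurrent calls translates directly into
concurrent connections.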

My problem comes from the AWS Lambda service, which can launch tens of
thousands of calls. When this happens, many of the calls time out, and the
number of active connections to ZooKeeper plateaus at around 6500.

BTW, I run ZooKeeper as a 3-node ensemble in quorum mode.

How can I scale ZooKeeper to support more concurrent connections?

Thank you,
Yue

Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Yue Shen <sh...@gmail.com>.
Thank you, Enrico.

I will check out Majordodo and let you know if we can use it for a better
architecture.

Yue

Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Enrico Olivelli <eo...@gmail.com>.
Yue,
As Jörn said, you should introduce some kind of scalable middleware.
ZooKeeper shouldn't be used in the hot path.

If you want something simple and based on components from the ZooKeeper
ecosystem, maybe you can take a look at this simple task broker service,
Majordodo (full disclosure: I am one of the maintainers of the project):

https://github.com/diennea/majordodo

Enrico


Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Jörn Franke <jo...@gmail.com>.
Hi,

Sorry, yes: I mentioned Solr because I was thinking of another email.
I believe 2 months is enough time to create two SQS queues and the corresponding Lambda functions. Effectively running a denial-of-service attack on your ZooKeeper ensemble will not help.
If time allows, I would try Amazon DynamoDB instead of ZooKeeper, as it looks like you are using ZooKeeper in a scenario where it should not be used.
I would probably also not use Kafka if a managed service for the same purpose is available.
However, I don't know your exact business case, and these are just ideas.

Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Patrick Hunt <ph...@apache.org>.
Whether or not you can use local sessions is a critical aspect:
https://issues.apache.org/jira/browse/ZOOKEEPER-1147

Patrick
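
[Editor's sketch] For reference, local sessions (available since ZooKeeper
3.5.0) are switched on in the server configuration. The fragment below is a
minimal sketch and assumes the rest of zoo.cfg is left unchanged. Whether it
helps depends on the workload: local sessions cannot create ephemeral nodes
unless they are upgraded to global sessions.

```properties
# zoo.cfg fragment (ZooKeeper 3.5.0 or later); both settings default to false
localSessionsEnabled=true
# allow a local session to be upgraded to a global session when it needs to,
# for example when it creates an ephemeral node
localSessionsUpgradingEnabled=true
```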

Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Michael Han <ha...@apache.org>.
>> can launch tens of thousands of calls

Is it possible for you to quantify this in the form of (read and write)
requests per second, plus the average request payload, if it's OK to disclose?
This information is critical for shaping the best scaling solution.

Without knowing any ballpark numbers for your system and workload
characteristics, one immediate experiment you could do is to set up
Observer servers and stop the quorum servers from serving client traffic by
redirecting all of your client traffic to the observers. This will at
least scale your concurrent connections linearly with the number of
observers. It will also scale concurrent request processing capacity
for read requests (and, to a limited extent, for write requests as well),
but scaling request processing is harder and depends on your workload
characteristics.
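
[Editor's sketch] A minimal configuration fragment for the observer experiment
described above. The hostnames (zk1..zk3, zk-obs1) are placeholders, and 2888
and 3888 are only the conventionally used quorum and election ports; the
peerType setting and the ":observer" suffix are the ones documented in the
ZooKeeper Observers guide.

```properties
# zoo.cfg on the observer node only
peerType=observer

# server list, identical on every node (hostnames are placeholders)
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
server.4=zk-obs1:2888:3888:observer
```

Clients would then be given a connect string that lists only the observers
(for example "zk-obs1:2181", assuming the usual client port), so that the
three voting servers no longer hold client connections.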



Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Yue Shen <sh...@gmail.com>.
Thank you, Jörn.

We don't use Solr. We inherited this architecture from another team, and we
don't have time to design a new, scalable system within 2 months.

As you said, if I were to design it, I would definitely put a queue in
front of the Lambda service; our new design, with Kafka up front, is actually
on the way. However, we need to scale the current system for the coming
holiday season before we can roll out the new one, which was kicked off just
a couple of weeks ago.

At this point, we want to tune ZooKeeper so it can handle 10K concurrent
calls. Any suggestions?

Thank you,
Yue

Re: How to scale ZooKeeper to support 10K concurrent connections?

Posted by Jörn Franke <jo...@gmail.com>.
Put the Solr requests on an SQS queue from your 10k instances, and have 10 or so workers consume the queue and send them to Solr. Having 10k connections just because Lambda creates that many instances does not make sense for any database service.
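
[Editor's sketch] The idea above, many producers but only a handful of backend connections, can be illustrated with the Python standard library. This is a sketch under assumptions: queue.Queue stands in for the SQS queue, threads stand in for the workers, and a counter stands in for the backend connection; run_burst and its parameters are hypothetical names.

```python
import queue
import threading


def run_burst(num_requests, num_workers):
    """Enqueue num_requests items and drain them with num_workers threads.

    Returns (processed_count, connections_opened).
    """
    work = queue.Queue()  # stand-in for the SQS queue
    lock = threading.Lock()
    stats = {"connections": 0, "processed": 0}

    def worker():
        # Each worker opens ONE backend connection and reuses it.
        with lock:
            stats["connections"] += 1
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work
                break
            with lock:
                stats["processed"] += 1  # placeholder for the real backend call

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i in range(num_requests):  # producers (the Lambda calls) only enqueue
        work.put(i)
    for _ in threads:  # one sentinel per worker
        work.put(None)
    for t in threads:
        t.join()
    return stats["processed"], stats["connections"]
```

With this shape, the number of backend connections is fixed by the worker count and no longer grows with the size of the burst.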
