Posted to user@helix.apache.org by Varun Sharma <va...@pinterest.com> on 2015/02/03 01:23:28 UTC

Excessive ZooKeeper load

Hi,

We are serving a few different resources whose total # of partitions is
~30K. We just did a rolling restart of the cluster, and the clients which
use the RoutingTableProvider are stuck in a bad state where they are
constantly subscribing to changes in the external view of the cluster.
Here is the Helix log on the client after our rolling restart finished -
the client is constantly polling ZK. The ZooKeeper node is pushing 300 Mbps
right now and most of the traffic is being pulled by clients. Is this a
race condition? Also, is there an easy way to make the clients not poll so
aggressively? We restarted one of the clients and we don't see these same
messages anymore. Also, is it possible to just propagate external view
diffs instead of the whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
child-change. path: /main_a/EXTERNALVIEW, listener:
org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
child-change. path: /main_a/EXTERNALVIEW, listener:
org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
/main_a/EXTERNALVIEW
listener:org.apache.helix.spectator.RoutingTableProvider

Re: Excessive ZooKeeper load

Posted by kishore g <g....@gmail.com>.
Hi Varun,

It does not distinguish between adds, updates, and deletes. It simply reads
everything and discards the old information. In general, coding against
deltas is hard and error prone. It is also incorrect in the case of
ZooKeeper, because watches are one-time events. For example, let's say A
changed at T0; when we get the notification, we need to set the watch again
to get notified of further changes, but we will miss any change that occurs
between the two operations (receiving a notification and adding the watcher
back). So the right thing to do is: once you get a notification, set the
watcher again and then read the contents. This ensures that we don't miss
any changes, but it also means we can't figure out the exact changes and
their order. We can, however, compute the delta on the client side by
maintaining a cache.
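
As a concrete illustration of the pattern Kishore describes, here is a minimal
sketch against the raw ZooKeeper client (this is not Helix code; the path constant
and the cache-rebuild step are placeholders). Passing the watcher to the read call
re-arms the one-shot watch as part of the same request, so no change can slip in
between the notification and the re-read:

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ExternalViewWatcher implements Watcher {
      // Placeholder path; Helix keeps one child znode per resource under EXTERNALVIEW.
      private static final String EXTERNAL_VIEW_PATH = "/main_a/EXTERNALVIEW";
      private final ZooKeeper zk;

      public ExternalViewWatcher(ZooKeeper zk) throws Exception {
        this.zk = zk;
        readAndRewatch();
      }

      @Override
      public void process(WatchedEvent event) {
        try {
          // Watches are one-shot: re-arm and read again on every notification.
          readAndRewatch();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }

      private void readAndRewatch() throws Exception {
        // Passing 'this' as the watcher re-registers the watch as part of the read.
        List<String> resources = zk.getChildren(EXTERNAL_VIEW_PATH, this);
        for (String resource : resources) {
          byte[] data = zk.getData(EXTERNAL_VIEW_PATH + "/" + resource, this, null);
          // ... rebuild the local routing cache from 'data' ...
        }
      }
    }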




On Fri, Feb 6, 2015 at 10:27 AM, Varun Sharma <va...@pinterest.com> wrote:

> How does the original RoutingTableProvider distinguish deletions from
> add/updates ?
>
> On Fri, Feb 6, 2015 at 10:23 AM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Hi Varun, regarding the batching of updates: the Helix controller does not
>> update the external view on every update. Normally the Helix controller will
>> aggregate updates over a period of time. Say for 100 partitions, if they are
>> updated at roughly the same time, then the Helix controller will update the
>> external view only once. For the routing table, what do you mean by ignoring
>> delete events? The RoutingTable will always be updated by ZK callbacks and synced
>> up with the corresponding external views on ZK.
>>
>>  Thanks,
>> Jason
>>
>>  ------------------------------
>> *From:* Varun Sharma [varun@pinterest.com]
>> *Sent:* Thursday, February 05, 2015 9:17 PM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>   One more question for the routing table provider - is it possible to
>> distinguish between add/modify and delete? I essentially want to ignore the
>> delete events - can that be determined by looking at the list of ExternalView(s)
>> being passed?
>>
>>  Thanks
>> Varun
>>
>> On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <va...@pinterest.com> wrote:
>>
>>> I see - one more thing - there was talk of a batching mode where Helix
>>> can batch updates - can it batch multiple updates to the external view and
>>> write once into ZooKeeper instead of writing for every update? For example,
>>> consider the case when lots of partitions are being onlined - could we
>>> batch updates to the external view into batches of 100? Is that supported
>>> in Helix 0.6.4?
>>>
>>>  Thanks !
>>>  Varun
>>>
>>> On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>
>>>>  Yes. The listener will be notified on add/delete/modify. You can
>>>> distinguish them if you have a local cache and compare against it to get the
>>>> delta. Currently the API doesn't expose this.
>>>>
>>>>  ------------------------------
>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>> *Sent:* Thursday, February 05, 2015 1:53 PM
>>>>
>>>> *To:* user@helix.apache.org
>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>
>>>>    I assume that it also gets called when external views get modified?
>>>> How can I distinguish whether there was an add, a modify or a delete?
>>>>
>>>>  Thanks
>>>> Varun
>>>>
>>>> On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>>
>>>>>  Yes. It will get invoked when external views are added or deleted.
>>>>>  ------------------------------
>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>> *Sent:* Thursday, February 05, 2015 1:27 AM
>>>>>
>>>>> *To:* user@helix.apache.org
>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>
>>>>>    I had another question - does the RoutingTableProvider
>>>>> onExternalViewChange call get invoked when a resource gets deleted (and
>>>>> hence its external view znode) ?
>>>>>
>>>>> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>
>>>>> wrote:
>>>>>
>>>>>>  Yes. I think we did this in the incubating stage or even before.
>>>>>> It's probably in a separate branch for some performance evaluation.
>>>>>>
>>>>>>  ------------------------------
>>>>>> *From:* kishore g [g.kishore@gmail.com]
>>>>>> *Sent:* Wednesday, February 04, 2015 9:54 PM
>>>>>>
>>>>>> *To:* user@helix.apache.org
>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>
>>>>>>    Jason, I remember having the ability to compress/decompress, and
>>>>>> before we added the support to bucketize, compression was used to support a
>>>>>> large number of partitions. However, I don't see the code anywhere. Did we do
>>>>>> this on a separate branch?
>>>>>>
>>>>>>  thanks,
>>>>>> Kishore G
>>>>>>
>>>>>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  Hi Varun, we can certainly add compression and have a config for
>>>>>>> turning it on/off. We have implemented compression in our own zkclient
>>>>>>> before. The issues with compression might be:
>>>>>>> 1) CPU consumption on the controller will increase.
>>>>>>> 2) It is harder to debug.
>>>>>>>
>>>>>>>  Thanks,
>>>>>>> Jason
>>>>>>>  ------------------------------
>>>>>>> *From:* kishore g [g.kishore@gmail.com]
>>>>>>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>>>>>>
>>>>>>> *To:* user@helix.apache.org
>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>
>>>>>>>    We do have the ability to compress the data. I am not sure if
>>>>>>> there is an easy way to turn the compression on/off.
>>>>>>>
>>>>>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am wondering if it's possible to gzip the external view znode - a
>>>>>>>> simple gzip cut the data size down by 25x. Is it possible to plug in
>>>>>>>> compression/decompression as ZooKeeper nodes are read?
>>>>>>>>
>>>>>>>>  Varun
>>>>>>>>
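
For reference, the kind of pluggable gzip step Varun is asking about can be
sketched with plain java.util.zip (this is not Helix's built-in compression
support, just an illustration of compressing the serialized znode payload before
writing it and decompressing it after reading):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class GzipZnodeCodec {
      // Compress the serialized external view before writing it to ZK.
      public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
          gz.write(raw);
        }
        return bos.toByteArray();
      }

      // Decompress the payload after reading it back from ZK.
      public static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
            new GZIPInputStream(new ByteArrayInputStream(compressed))) {
          byte[] buf = new byte[4096];
          int n;
          while ((n = gz.read(buf)) > 0) {
            bos.write(buf, 0, n);
          }
        }
        return bos.toByteArray();
      }
    }
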
>>>>>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> There are multiple options we can try here.
>>>>>>>>> What if we used a cached data accessor for this use case? Clients will
>>>>>>>>> only read if the node has changed. This optimization can benefit all use cases.
>>>>>>>>>
>>>>>>>>> What about batching the watch triggers? Not sure which version of
>>>>>>>>> Helix has this option.
>>>>>>>>>
>>>>>>>>> Another option is to use a poll-based routing table instead of a watch-based
>>>>>>>>> one. This, coupled with the cached data accessor, can be very efficient.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kishore G
>>>>>>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> My total external view across all resources is roughly 3M in size
>>>>>>>>>> and there are 100 clients downloading it twice for every node restart -
>>>>>>>>>> that's 600M of data for every restart. So I guess that is causing this
>>>>>>>>>> issue. We are thinking of doing some tricks to limit the # of clients to 1
>>>>>>>>>> from 100. I guess that should help significantly.
>>>>>>>>>>
>>>>>>>>>>  Varun
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>  Hey Varun,
>>>>>>>>>>>
>>>>>>>>>>>  I guess your external view is pretty large, since each
>>>>>>>>>>> external view callback takes ~3s. The RoutingTableProvider is
>>>>>>>>>>> callback based, so only when there is a change in the external view,
>>>>>>>>>>> RoutingTableProvider will read the entire external view from ZK. During the
>>>>>>>>>>> rolling upgrade, there are lots of live instance changes, which may lead to
>>>>>>>>>>> a lot of changes in the external view. One possible way to mitigate the
>>>>>>>>>>> issue is to smooth the traffic by having some delays in between bouncing
>>>>>>>>>>> nodes. We can do a rough estimation on how many external view changes you
>>>>>>>>>>> might have during the upgrade, how many listeners you have, and how large
>>>>>>>>>>> the external views are. Once we have these numbers, we might know the ZK
>>>>>>>>>>> bandwidth requirement. ZK read bandwidth can be scaled by adding ZK
>>>>>>>>>>> observers.
>>>>>>>>>>>
>>>>>>>>>>>  ZK watcher is one time only, so every time a listener receives
>>>>>>>>>>> a callback, it will re-register its watcher again to ZK.
>>>>>>>>>>>
>>>>>>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>>>>>>> lose delta changes if you depend on that.
>>>>>>>>>>>
>>>>>>>>>>>  For the ZK connection issue, do you have any log on the ZK
>>>>>>>>>>> server side regarding this connection?
>>>>>>>>>>>
>>>>>>>>>>>  Thanks,
>>>>>>>>>>> Jason
>>>>>>>>>>>
>>>>>>>>>>>   ------------------------------
>>>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>>>>>
>>>>>>>>>>>    I believe there is a misbehaving client. Here is a stack
>>>>>>>>>>> trace - it probably lost its connection and is now stampeding ZK:
>>>>>>>>>>>
>>>>>>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>>>>>>
>>>>>>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>>>>>         - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
>>>>>>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
>>>>>>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <
>>>>>>>>>>> varun@pinterest.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am wondering what is causing the zk subscription to happen
>>>>>>>>>>>> every 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>>>>>>
>>>>>>>>>>>>  Thanks
>>>>>>>>>>>>  Varun

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Yes. I am on Helix IRC. Feel free to join.


Re: Excessive ZooKeeper load

Posted by kishore g <g....@gmail.com>.
Yes, that is correct. If you want to know the changes, simply maintain a
local map of <resourceId, version>; the version is the ZooKeeper znode
version, and every time there is a change in the external view for a
resource, the version gets incremented.
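
A minimal sketch of that bookkeeping, assuming direct ZooKeeper reads rather than
the Helix API (the class and path names here are placeholders): keep a map from
resource name to the znode version reported in Stat, and compare it on each
callback to classify adds, modifications, and deletes.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ExternalViewDeltaTracker {
      private final Map<String, Integer> lastSeenVersion = new HashMap<String, Integer>();

      // Call this from the external view change callback.
      public void onExternalViewChange(ZooKeeper zk, String evPath)
          throws KeeperException, InterruptedException {
        List<String> resources = zk.getChildren(evPath, false);
        Map<String, Integer> current = new HashMap<String, Integer>();
        for (String resource : resources) {
          Stat stat = zk.exists(evPath + "/" + resource, false);
          if (stat == null) {
            continue; // znode deleted between getChildren and exists
          }
          int version = stat.getVersion(); // increments on every setData
          current.put(resource, version);
          Integer previous = lastSeenVersion.get(resource);
          if (previous == null) {
            // resource added
          } else if (previous != version) {
            // resource modified
          }
        }
        for (String resource : lastSeenVersion.keySet()) {
          if (!current.containsKey(resource)) {
            // resource (and its external view znode) deleted
          }
        }
        lastSeenVersion.clear();
        lastSeenVersion.putAll(current);
      }
    }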


Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
So does the routing table provider receive updates for all the external
views - does the "List<ExternalView>" always contain all the external
views?

Varun


RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
It doesn't distinguish. RoutingTableProvider always tries to keep its content the same as what is on ZK.
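
As a rough sketch of the client-side delta idea mentioned earlier in the thread (illustrative only; the DeltaAwareRoutingTableProvider class and its comments below are hypothetical and not part of the Helix API), a spectator could cache the previous set of external views and compare each callback against that cache to classify adds, modifications, and deletions:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.helix.NotificationContext;
import org.apache.helix.model.ExternalView;
import org.apache.helix.spectator.RoutingTableProvider;

// Hypothetical listener: diffs each callback against a local cache to tell
// add/modify/delete apart, then delegates to the normal routing table logic.
public class DeltaAwareRoutingTableProvider extends RoutingTableProvider {
  // last seen external view, keyed by resource name
  private final Map<String, ExternalView> cache = new HashMap<String, ExternalView>();

  @Override
  public synchronized void onExternalViewChange(List<ExternalView> externalViewList,
      NotificationContext changeContext) {
    Set<String> seen = new HashSet<String>();
    for (ExternalView ev : externalViewList) {
      seen.add(ev.getResourceName());
      ExternalView previous = cache.put(ev.getResourceName(), ev);
      if (previous == null) {
        // resource was added since the last callback
      } else if (!previous.getRecord().getMapFields().equals(ev.getRecord().getMapFields())) {
        // resource was modified (partition-to-instance map changed)
      }
    }
    // resources in the cache but missing from this callback were deleted
    Iterator<String> it = cache.keySet().iterator();
    while (it.hasNext()) {
      if (!seen.contains(it.next())) {
        it.remove();
      }
    }
    // keep the standard routing table behavior
    super.onExternalViewChange(externalViewList, changeContext);
  }
}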

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Friday, February 06, 2015 10:27 AM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

How does the original RoutingTableProvider distinguish deletions from add/updates ?

On Fri, Feb 6, 2015 at 10:23 AM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, for the batching update. Helix controller is not updating external view on every update. Normally Helix controller will aggregate updates during a period of time. Say for 100 partitions, if they are updated roughly as the same time, then Helix controller will update external view only once. For routing table, what do you mean by ignoring delete events? RoutingTable will always be updated by ZK callbacks and sync up with the corresponding external views on ZK.

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 9:17 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

One more question for the routing table provider - is it possible to distinguish b/w add/modify and delete - I essentially want to ignore the delete events - can that be found by looking at the list of ExternalView(s) being passed ?

Thanks
Varun

On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <va...@pinterest.com>> wrote:
I see - one more thing - there was talk of a batching mode where Helix can batch updates - can it batch multiple updates  to the external view and write once into zookeeper instead of writing for every update. For example, consider the case when lots of partitions are being onlined - if we could batch updates to the external view into batches of 100 ? Is that supported in Helix 0.6.4

Thanks !
Varun

On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. the listener will be notified on add/delete/modify. You can distinguish if you have a local cache and compare to get the delta. Currently the API doesn't expose this.

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 1:53 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I assume that it also gets called when external views get modified ? How can i distinguish if there was an Add, a modify or a delete ?

Thanks
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. It will get invoked when external views are added or deleted.
________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 1:27 AM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode) ?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 9:54 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress and  before we added the support to bucketize, compression was used to support large number of partitions. However I dont see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, we can certainly add compression and have a config for turning it on/off. We do have implemented compression in our own zkclient before. The issue for compression might be:
1) cpu consumption on controller will increase.
2) hard to debug

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 3:08 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider














Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
How does the original RoutingTableProvider distinguish deletions from
add/updates?

On Fri, Feb 6, 2015 at 10:23 AM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Hi Varun, for the batching update. Helix controller is not updating
> external view on every update. Normally Helix controller will aggregate
> updates during a period of time. Say for 100 partitions, if they are
> updated roughly as the same time, then Helix controller will update
> external view only once. For routing table, what do you mean by ignoring
> delete events? RoutingTable will always be updated by ZK callbacks and sync
> up with the corresponding external views on ZK.
>
>  Thanks,
> Jason
>
>  ------------------------------
> *From:* Varun Sharma [varun@pinterest.com]
> *Sent:* Thursday, February 05, 2015 9:17 PM
>
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   One more question for the routing table provider - is it possible to
> distinguish b/w add/modify and delete - I essentially want to ignore the
> delete events - can that be found by looking at the list of ExternalView(s)
> being passed ?
>
>  Thanks
> Varun
>
> On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> I see - one more thing - there was talk of a batching mode where Helix
>> can batch updates - can it batch multiple updates  to the external view and
>> write once into zookeeper instead of writing for every update. For example,
>> consider the case when lots of partitions are being onlined - if we could
>> batch updates to the external view into batches of 100 ? Is that supported
>> in Helix 0.6.4
>>
>>  Thanks !
>>  Varun
>>
>> On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>
>>>  Yes. the listener will be notified on add/delete/modify. You can
>>> distinguish if you have a local cache and compare to get the delta.
>>> Currently the API doesn't expose this.
>>>
>>>  ------------------------------
>>> *From:* Varun Sharma [varun@pinterest.com]
>>> *Sent:* Thursday, February 05, 2015 1:53 PM
>>>
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: Excessive ZooKeeper load
>>>
>>>    I assume that it also gets called when external views get modified ?
>>> How can i distinguish if there was an Add, a modify or a delete ?
>>>
>>>  Thanks
>>> Varun
>>>
>>> On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>
>>>>  Yes. It will get invoked when external views are added or deleted.
>>>>  ------------------------------
>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>> *Sent:* Thursday, February 05, 2015 1:27 AM
>>>>
>>>> *To:* user@helix.apache.org
>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>
>>>>    I had another question - does the RoutingTableProvider
>>>> onExternalViewChange call get invoked when a resource gets deleted (and
>>>> hence its external view znode) ?
>>>>
>>>> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>
>>>> wrote:
>>>>
>>>>>  Yes. I think we did this in the incubating stage or even before.
>>>>> It's probably in a separate branch for some performance evaluation.
>>>>>
>>>>>  ------------------------------
>>>>> *From:* kishore g [g.kishore@gmail.com]
>>>>> *Sent:* Wednesday, February 04, 2015 9:54 PM
>>>>>
>>>>> *To:* user@helix.apache.org
>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>
>>>>>    Jason, I remember having the ability to compress/decompress and
>>>>> before we added the support to bucketize, compression was used to support
>>>>> large number of partitions. However I dont see the code anywhere. Did we do
>>>>> this on a separate branch?
>>>>>
>>>>>  thanks,
>>>>> Kishore G
>>>>>
>>>>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>
>>>>> wrote:
>>>>>
>>>>>>  Hi Varun, we can certainly add compression and have a config for
>>>>>> turning it on/off. We do have implemented compression in our own zkclient
>>>>>> before. The issue for compression might be:
>>>>>> 1) cpu consumption on controller will increase.
>>>>>> 2) hard to debug
>>>>>>
>>>>>>  Thanks,
>>>>>> Jason
>>>>>>  ------------------------------
>>>>>> *From:* kishore g [g.kishore@gmail.com]
>>>>>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>>>>>
>>>>>> *To:* user@helix.apache.org
>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>
>>>>>>    we do have the ability to compress the data. I am not sure if
>>>>>> there is a easy way to turn on/off the compression.
>>>>>>
>>>>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am wondering if its possible to gzip the external view znode - a
>>>>>>> simple gzip cut down the data size by 25X. Is it possible to plug in
>>>>>>> compression/decompression as zookeeper nodes are read ?
>>>>>>>
>>>>>>>  Varun
>>>>>>>
>>>>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> There are multiple options we can try here.
>>>>>>>> what if we used cacheddataaccessor for this use case?.clients will
>>>>>>>> only read if node has changed. This optimization can benefit all use cases.
>>>>>>>>
>>>>>>>> What about batching the watch triggers. Not sure which version of
>>>>>>>> helix has this option.
>>>>>>>>
>>>>>>>> Another option is to use a poll based roundtable instead of watch
>>>>>>>> based. This can coupled with cacheddataaccessor can be over efficient.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kishore G
>>>>>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My total external view across all resources is roughly 3M in size
>>>>>>>>> and there are 100 clients downloading it twice for every node restart -
>>>>>>>>> thats 600M of data for every restart. So I guess that is causing this
>>>>>>>>> issue. We are thinking of doing some tricks to limit the # of clients to 1
>>>>>>>>> from 100. I guess that should help significantly.
>>>>>>>>>
>>>>>>>>>  Varun
>>>>>>>>>
>>>>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>  Hey Varun,
>>>>>>>>>>
>>>>>>>>>>  I guess your external view is pretty large, since each external
>>>>>>>>>> view callback takes ~3s. The RoutingTableProvider is callback
>>>>>>>>>> based, so only when there is a change in the external view,
>>>>>>>>>> RoutingTableProvider will read the entire external view from ZK. During the
>>>>>>>>>> rolling upgrade, there are lots of live instance change, which may lead to
>>>>>>>>>> a lot of changes in the external view. One possible way to mitigate the
>>>>>>>>>> issue is to smooth the traffic by having some delays in between bouncing
>>>>>>>>>> nodes. We can do a rough estimation on how many external view changes you
>>>>>>>>>> might have during the upgrade, how many listeners you have, and how large
>>>>>>>>>> is the external views. Once we have these numbers, we might know the ZK
>>>>>>>>>> bandwidth requirement. ZK read bandwidth can be scaled by adding ZK
>>>>>>>>>> observers.
>>>>>>>>>>
>>>>>>>>>>  ZK watcher is one time only, so every time a listener receives
>>>>>>>>>> a callback, it will re-register its watcher again to ZK.
>>>>>>>>>>
>>>>>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>>>>>> lose delta changes if you depend on that.
>>>>>>>>>>
>>>>>>>>>>  For the ZK connection issue, do you have any log on the ZK
>>>>>>>>>> server side regarding this connection?
>>>>>>>>>>
>>>>>>>>>>  Thanks,
>>>>>>>>>> Jason
>>>>>>>>>>
>>>>>>>>>>   ------------------------------
>>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>>>>
>>>>>>>>>>    I believe there is a misbehaving client. Here is a stack
>>>>>>>>>> trace - it probably lost connection and is now stampeding it:
>>>>>>>>>>
>>>>>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>>>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10
>>>>>>>>>> tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>>>>>
>>>>>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>>>>
>>>>>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>>>>>
>>>>>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>>>>>
>>>>>>>>>>         at
>>>>>>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>>>>
>>>>>>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>>>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>>>>
>>>>>>>>>>         at
>>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>>>>
>>>>>>>>>>         at
>>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>>>>
>>>>>>>>>>         at org.I0Itec.zk
>>>>>>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>>>>
>>>>>>>>>>         at org.I0Itec.zk
>>>>>>>>>> client.ZkClient$11.call(ZkClient.java:823)
>>>>>>>>>>
>>>>>>>>>> *        at
>>>>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>>>>>>
>>>>>>>>>> *        at
>>>>>>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>>>>>>
>>>>>>>>>> *        at
>>>>>>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>>>>>>
>>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>>>>
>>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>>>>
>>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>>>>
>>>>>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.
>>>>>>>>>> zk.ZKHelixManager)
>>>>>>>>>>
>>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>>>>
>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>>>>
>>>>>>>>>>         at org.I0Itec.zk
>>>>>>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> I am wondering what is causing the zk subscription to happen
>>>>>>>>>>> every 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>>>>>
>>>>>>>>>>>  Thanks
>>>>>>>>>>>  Varun
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <
>>>>>>>>>>> varun@pinterest.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>  We are serving a few different resources whose total # of
>>>>>>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>>>>>>
>>>>>>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084
>>>>>>>>>>>> subscribes child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084
>>>>>>>>>>>> subscribes child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>>>>>
>>>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Hi Varun, regarding the batching of updates: the Helix controller does not write the external view on every single update. Normally the controller aggregates updates over a period of time, so for, say, 100 partitions that are updated at roughly the same time, the controller will update the external view only once. For the routing table, what do you mean by ignoring delete events? The RoutingTable will always be updated by ZK callbacks and kept in sync with the corresponding external views on ZK.
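
For context, the spectator side that receives these callbacks can be wired up roughly as follows (a minimal sketch; the SpectatorExample class, instance name, ZK address, resource, partition, and state names are placeholders, not taken from your setup):

import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class SpectatorExample {
  public static void main(String[] args) throws Exception {
    // Connect to the cluster as a spectator (names are illustrative).
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "main_a", "mySpectator", InstanceType.SPECTATOR, "zk1:2181,zk2:2181,zk3:2181");
    manager.connect();

    // RoutingTableProvider subscribes to EXTERNALVIEW changes and rebuilds
    // its routing table from ZK on every callback.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Look up the instances currently hosting a partition in a given state.
    List<InstanceConfig> instances =
        routingTable.getInstances("myResource", "myResource_0", "ONLINE");
    for (InstanceConfig instance : instances) {
      System.out.println(instance.getHostName() + ":" + instance.getPort());
    }

    manager.disconnect();
  }
}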

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, February 05, 2015 9:17 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

One more question for the routing table provider - is it possible to distinguish b/w add/modify and delete - I essentially want to ignore the delete events - can that be found by looking at the list of ExternalView(s) being passed ?

Thanks
Varun

On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <va...@pinterest.com>> wrote:
I see - one more thing - there was talk of a batching mode where Helix can batch updates - can it batch multiple updates  to the external view and write once into zookeeper instead of writing for every update. For example, consider the case when lots of partitions are being onlined - if we could batch updates to the external view into batches of 100 ? Is that supported in Helix 0.6.4

Thanks !
Varun

On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. the listener will be notified on add/delete/modify. You can distinguish if you have a local cache and compare to get the delta. Currently the API doesn't expose this.

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 1:53 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I assume that it also gets called when external views get modified ? How can i distinguish if there was an Add, a modify or a delete ?

Thanks
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. It will get invoked when external views are added or deleted.
________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 1:27 AM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode) ?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 9:54 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress and  before we added the support to bucketize, compression was used to support large number of partitions. However I dont see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, we can certainly add compression and have a config for turning it on/off. We do have implemented compression in our own zkclient before. The issue for compression might be:
1) cpu consumption on controller will increase.
2) hard to debug

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 3:08 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider













Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
One more question for the routing table provider - is it possible to
distinguish between add/modify and delete? I essentially want to ignore
the delete events - can that be determined by looking at the list of
ExternalView(s) being passed?

Thanks
Varun

On Thu, Feb 5, 2015 at 8:48 PM, Varun Sharma <va...@pinterest.com> wrote:

> I see - one more thing - there was talk of a batching mode where Helix can
> batch updates - can it batch multiple updates  to the external view and
> write once into zookeeper instead of writing for every update. For example,
> consider the case when lots of partitions are being onlined - if we could
> batch updates to the external view into batches of 100 ? Is that supported
> in Helix 0.6.4
>
> Thanks !
> Varun
>
> On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Yes. the listener will be notified on add/delete/modify. You can
>> distinguish if you have a local cache and compare to get the delta.
>> Currently the API doesn't expose this.
>>
>>  ------------------------------
>> *From:* Varun Sharma [varun@pinterest.com]
>> *Sent:* Thursday, February 05, 2015 1:53 PM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>   I assume that it also gets called when external views get modified ?
>> How can i distinguish if there was an Add, a modify or a delete ?
>>
>>  Thanks
>> Varun
>>
>> On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com> wrote:
>>
>>>  Yes. It will get invoked when external views are added or deleted.
>>>  ------------------------------
>>> *From:* Varun Sharma [varun@pinterest.com]
>>> *Sent:* Thursday, February 05, 2015 1:27 AM
>>>
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: Excessive ZooKeeper load
>>>
>>>    I had another question - does the RoutingTableProvider
>>> onExternalViewChange call get invoked when a resource gets deleted (and
>>> hence its external view znode) ?
>>>
>>> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>
>>>>  Yes. I think we did this in the incubating stage or even before. It's
>>>> probably in a separate branch for some performance evaluation.
>>>>
>>>>  ------------------------------
>>>> *From:* kishore g [g.kishore@gmail.com]
>>>> *Sent:* Wednesday, February 04, 2015 9:54 PM
>>>>
>>>> *To:* user@helix.apache.org
>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>
>>>>    Jason, I remember having the ability to compress/decompress and
>>>> before we added the support to bucketize, compression was used to support
>>>> large number of partitions. However I dont see the code anywhere. Did we do
>>>> this on a separate branch?
>>>>
>>>>  thanks,
>>>> Kishore G
>>>>
>>>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>>
>>>>>  Hi Varun, we can certainly add compression and have a config for
>>>>> turning it on/off. We do have implemented compression in our own zkclient
>>>>> before. The issue for compression might be:
>>>>> 1) cpu consumption on controller will increase.
>>>>> 2) hard to debug
>>>>>
>>>>>  Thanks,
>>>>> Jason
>>>>>  ------------------------------
>>>>> *From:* kishore g [g.kishore@gmail.com]
>>>>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>>>>
>>>>> *To:* user@helix.apache.org
>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>
>>>>>    we do have the ability to compress the data. I am not sure if
>>>>> there is a easy way to turn on/off the compression.
>>>>>
>>>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> I am wondering if its possible to gzip the external view znode - a
>>>>>> simple gzip cut down the data size by 25X. Is it possible to plug in
>>>>>> compression/decompression as zookeeper nodes are read ?
>>>>>>
>>>>>>  Varun
>>>>>>
>>>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> There are multiple options we can try here.
>>>>>>> what if we used cacheddataaccessor for this use case?.clients will
>>>>>>> only read if node has changed. This optimization can benefit all use cases.
>>>>>>>
>>>>>>> What about batching the watch triggers. Not sure which version of
>>>>>>> helix has this option.
>>>>>>>
>>>>>>> Another option is to use a poll based roundtable instead of watch
>>>>>>> based. This can coupled with cacheddataaccessor can be over efficient.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kishore G
>>>>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>>>>>>
>>>>>>>> My total external view across all resources is roughly 3M in size
>>>>>>>> and there are 100 clients downloading it twice for every node restart -
>>>>>>>> thats 600M of data for every restart. So I guess that is causing this
>>>>>>>> issue. We are thinking of doing some tricks to limit the # of clients to 1
>>>>>>>> from 100. I guess that should help significantly.
>>>>>>>>
>>>>>>>>  Varun
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>  Hey Varun,
>>>>>>>>>
>>>>>>>>>  I guess your external view is pretty large, since each external
>>>>>>>>> view callback takes ~3s. The RoutingTableProvider is callback
>>>>>>>>> based, so only when there is a change in the external view,
>>>>>>>>> RoutingTableProvider will read the entire external view from ZK. During the
>>>>>>>>> rolling upgrade, there are lots of live instance change, which may lead to
>>>>>>>>> a lot of changes in the external view. One possible way to mitigate the
>>>>>>>>> issue is to smooth the traffic by having some delays in between bouncing
>>>>>>>>> nodes. We can do a rough estimation on how many external view changes you
>>>>>>>>> might have during the upgrade, how many listeners you have, and how large
>>>>>>>>> is the external views. Once we have these numbers, we might know the ZK
>>>>>>>>> bandwidth requirement. ZK read bandwidth can be scaled by adding ZK
>>>>>>>>> observers.
>>>>>>>>>
>>>>>>>>>  ZK watcher is one time only, so every time a listener receives a
>>>>>>>>> callback, it will re-register its watcher again to ZK.
>>>>>>>>>
>>>>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>>>>> lose delta changes if you depend on that.
>>>>>>>>>
>>>>>>>>>  For the ZK connection issue, do you have any log on the ZK
>>>>>>>>> server side regarding this connection?
>>>>>>>>>
>>>>>>>>>  Thanks,
>>>>>>>>> Jason
>>>>>>>>>
>>>>>>>>>   ------------------------------
>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>>>
>>>>>>>>>    I believe there is a misbehaving client. Here is a stack trace
>>>>>>>>> - it probably lost connection and is now stampeding it:
>>>>>>>>>
>>>>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10
>>>>>>>>> tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>>>>
>>>>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>>>
>>>>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>>>>
>>>>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>>>
>>>>>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>>>
>>>>>>>>>         at org.I0Itec.zk
>>>>>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>>>
>>>>>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>>>>
>>>>>>>>> *        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>>>>>
>>>>>>>>> *        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>>>>>
>>>>>>>>> *        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>>>>>
>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>>>
>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>>>
>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>>>
>>>>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.
>>>>>>>>> zk.ZKHelixManager)
>>>>>>>>>
>>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>>>
>>>>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>>>
>>>>>>>>>         at org.I0Itec.zk
>>>>>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>>>
>>>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am wondering what is causing the zk subscription to happen
>>>>>>>>>> every 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>>>>
>>>>>>>>>>  Thanks
>>>>>>>>>>  Varun
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>  We are serving a few different resources whose total # of
>>>>>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>>>>>
>>>>>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084
>>>>>>>>>>> subscribes child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084
>>>>>>>>>>> subscribes child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>>>>
>>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I see - one more thing: there was talk of a batching mode where Helix can
batch updates. Can it batch multiple updates to the external view and
write once into ZooKeeper instead of writing for every update? For example,
consider the case when lots of partitions are being onlined - could we
batch updates to the external view into batches of 100? Is that supported
in Helix 0.6.4?

Thanks !
Varun

On Thu, Feb 5, 2015 at 5:23 PM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Yes. the listener will be notified on add/delete/modify. You can
> distinguish if you have a local cache and compare to get the delta.
> Currently the API doesn't expose this.
>
>  ------------------------------
> *From:* Varun Sharma [varun@pinterest.com]
> *Sent:* Thursday, February 05, 2015 1:53 PM
>
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   I assume that it also gets called when external views get modified ?
> How can i distinguish if there was an Add, a modify or a delete ?
>
>  Thanks
> Varun
>
> On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Yes. It will get invoked when external views are added or deleted.
>>  ------------------------------
>> *From:* Varun Sharma [varun@pinterest.com]
>> *Sent:* Thursday, February 05, 2015 1:27 AM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>    I had another question - does the RoutingTableProvider
>> onExternalViewChange call get invoked when a resource gets deleted (and
>> hence its external view znode) ?
>>
>> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>
>>>  Yes. I think we did this in the incubating stage or even before. It's
>>> probably in a separate branch for some performance evaluation.
>>>
>>>  ------------------------------
>>> *From:* kishore g [g.kishore@gmail.com]
>>> *Sent:* Wednesday, February 04, 2015 9:54 PM
>>>
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: Excessive ZooKeeper load
>>>
>>>    Jason, I remember having the ability to compress/decompress and
>>> before we added the support to bucketize, compression was used to support
>>> large number of partitions. However I dont see the code anywhere. Did we do
>>> this on a separate branch?
>>>
>>>  thanks,
>>> Kishore G
>>>
>>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>
>>>>  Hi Varun, we can certainly add compression and have a config for
>>>> turning it on/off. We do have implemented compression in our own zkclient
>>>> before. The issue for compression might be:
>>>> 1) cpu consumption on controller will increase.
>>>> 2) hard to debug
>>>>
>>>>  Thanks,
>>>> Jason
>>>>  ------------------------------
>>>> *From:* kishore g [g.kishore@gmail.com]
>>>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>>>
>>>> *To:* user@helix.apache.org
>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>
>>>>    we do have the ability to compress the data. I am not sure if there
>>>> is a easy way to turn on/off the compression.
>>>>
>>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>
>>>> wrote:
>>>>
>>>>> I am wondering if its possible to gzip the external view znode - a
>>>>> simple gzip cut down the data size by 25X. Is it possible to plug in
>>>>> compression/decompression as zookeeper nodes are read ?
>>>>>
>>>>>  Varun
>>>>>
>>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:
>>>>>
>>>>>> There are multiple options we can try here.
>>>>>> what if we used cacheddataaccessor for this use case?.clients will
>>>>>> only read if node has changed. This optimization can benefit all use cases.
>>>>>>
>>>>>> What about batching the watch triggers. Not sure which version of
>>>>>> helix has this option.
>>>>>>
>>>>>> Another option is to use a poll based roundtable instead of watch
>>>>>> based. This can coupled with cacheddataaccessor can be over efficient.
>>>>>>
>>>>>> Thanks,
>>>>>> Kishore G
>>>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>>>>>
>>>>>>> My total external view across all resources is roughly 3M in size
>>>>>>> and there are 100 clients downloading it twice for every node restart -
>>>>>>> thats 600M of data for every restart. So I guess that is causing this
>>>>>>> issue. We are thinking of doing some tricks to limit the # of clients to 1
>>>>>>> from 100. I guess that should help significantly.
>>>>>>>
>>>>>>>  Varun
>>>>>>>
>>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  Hey Varun,
>>>>>>>>
>>>>>>>>  I guess your external view is pretty large, since each external
>>>>>>>> view callback takes ~3s. The RoutingTableProvider is callback
>>>>>>>> based, so only when there is a change in the external view,
>>>>>>>> RoutingTableProvider will read the entire external view from ZK. During the
>>>>>>>> rolling upgrade, there are lots of live instance change, which may lead to
>>>>>>>> a lot of changes in the external view. One possible way to mitigate the
>>>>>>>> issue is to smooth the traffic by having some delays in between bouncing
>>>>>>>> nodes. We can do a rough estimation on how many external view changes you
>>>>>>>> might have during the upgrade, how many listeners you have, and how large
>>>>>>>> is the external views. Once we have these numbers, we might know the ZK
>>>>>>>> bandwidth requirement. ZK read bandwidth can be scaled by adding ZK
>>>>>>>> observers.
>>>>>>>>
>>>>>>>>  ZK watcher is one time only, so every time a listener receives a
>>>>>>>> callback, it will re-register its watcher again to ZK.
>>>>>>>>
>>>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>>>> lose delta changes if you depend on that.
>>>>>>>>
>>>>>>>>  For the ZK connection issue, do you have any log on the ZK server
>>>>>>>> side regarding this connection?
>>>>>>>>
>>>>>>>>  Thanks,
>>>>>>>> Jason
>>>>>>>>
>>>>>>>>   ------------------------------
>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>>>> *To:* user@helix.apache.org
>>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>>
>>>>>>>>    I believe there is a misbehaving client. Here is a stack trace
>>>>>>>> - it probably lost connection and is now stampeding it:
>>>>>>>>
>>>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10
>>>>>>>> tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>>>
>>>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>>
>>>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>>>
>>>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>>>
>>>>>>>>         at
>>>>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>>
>>>>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>>
>>>>>>>>         at
>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>>
>>>>>>>>         at
>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>>
>>>>>>>>         at org.I0Itec.zk
>>>>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>>
>>>>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>>>
>>>>>>>> *        at
>>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>>>>
>>>>>>>> *        at
>>>>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>>>>
>>>>>>>> *        at
>>>>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>>>>
>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>>
>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>>
>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>>
>>>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.
>>>>>>>> zk.ZKHelixManager)
>>>>>>>>
>>>>>>>>         at org.apache.helix.manager.zk
>>>>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>>
>>>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>>
>>>>>>>>         at org.I0Itec.zk
>>>>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am wondering what is causing the zk subscription to happen every
>>>>>>>>> 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>>>
>>>>>>>>>  Thanks
>>>>>>>>>  Varun
>>>>>>>>>
>>>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>  We are serving a few different resources whose total # of
>>>>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>>>>
>>>>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>>>
>>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Yes. The listener will be notified on add/delete/modify. You can distinguish between them if you keep a local cache and compare against it to compute the delta. Currently the API doesn't expose this.
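
Below is a minimal sketch of that client-side approach: keep the last snapshot in a local cache and classify each resource as added, modified, or deleted on the next callback. ExternalViewDiff, Change, and the String-valued snapshot are hypothetical names used only for illustration; this is not part of the Helix API.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Hypothetical helper (not part of the Helix API): keeps the previously seen
     * external views in a local cache and classifies each resource in the new
     * snapshot as added, modified, or deleted.
     */
    public class ExternalViewDiff {

      /** Last snapshot we processed, keyed by resource name. */
      private final Map<String, String> cache = new HashMap<>();

      public enum Change { ADDED, MODIFIED, DELETED }

      /**
       * Compare a new snapshot (resource name -> serialized view, e.g. the znode
       * payload) against the cached one and return only the differences.
       */
      public Map<String, Change> diff(Map<String, String> newSnapshot) {
        Map<String, Change> delta = new HashMap<>();

        // Added or modified resources.
        for (Map.Entry<String, String> e : newSnapshot.entrySet()) {
          String previous = cache.get(e.getKey());
          if (previous == null) {
            delta.put(e.getKey(), Change.ADDED);
          } else if (!previous.equals(e.getValue())) {
            delta.put(e.getKey(), Change.MODIFIED);
          }
        }

        // Deleted resources: present in the cache but missing from the new snapshot.
        for (String resource : cache.keySet()) {
          if (!newSnapshot.containsKey(resource)) {
            delta.put(resource, Change.DELETED);
          }
        }

        // Remember the new snapshot for the next callback.
        cache.clear();
        cache.putAll(newSnapshot);
        return delta;
      }
    }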

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, February 05, 2015 1:53 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I assume that it also gets called when external views get modified ? How can i distinguish if there was an Add, a modify or a delete ?

Thanks
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. It will get invoked when external views are added or deleted.
________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Thursday, February 05, 2015 1:27 AM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode) ?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 9:54 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress and  before we added the support to bucketize, compression was used to support large number of partitions. However I dont see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, we can certainly add compression and have a config for turning it on/off. We do have implemented compression in our own zkclient before. The issue for compression might be:
1) cpu consumption on controller will increase.
2) hard to debug

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 3:08 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider











Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I assume that it also gets called when external views get modified? How
can I distinguish whether there was an add, a modify, or a delete?

Thanks
Varun

On Thu, Feb 5, 2015 at 9:27 AM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Yes. It will get invoked when external views are added or deleted.
>  ------------------------------
> *From:* Varun Sharma [varun@pinterest.com]
> *Sent:* Thursday, February 05, 2015 1:27 AM
>
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   I had another question - does the RoutingTableProvider
> onExternalViewChange call get invoked when a resource gets deleted (and
> hence its external view znode) ?
>
> On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Yes. I think we did this in the incubating stage or even before. It's
>> probably in a separate branch for some performance evaluation.
>>
>>  ------------------------------
>> *From:* kishore g [g.kishore@gmail.com]
>> *Sent:* Wednesday, February 04, 2015 9:54 PM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>    Jason, I remember having the ability to compress/decompress and
>> before we added the support to bucketize, compression was used to support
>> large number of partitions. However I dont see the code anywhere. Did we do
>> this on a separate branch?
>>
>>  thanks,
>> Kishore G
>>
>> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>
>>>  Hi Varun, we can certainly add compression and have a config for
>>> turning it on/off. We do have implemented compression in our own zkclient
>>> before. The issue for compression might be:
>>> 1) cpu consumption on controller will increase.
>>> 2) hard to debug
>>>
>>>  Thanks,
>>> Jason
>>>  ------------------------------
>>> *From:* kishore g [g.kishore@gmail.com]
>>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>>
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: Excessive ZooKeeper load
>>>
>>>    we do have the ability to compress the data. I am not sure if there
>>> is a easy way to turn on/off the compression.
>>>
>>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>
>>> wrote:
>>>
>>>> I am wondering if its possible to gzip the external view znode - a
>>>> simple gzip cut down the data size by 25X. Is it possible to plug in
>>>> compression/decompression as zookeeper nodes are read ?
>>>>
>>>>  Varun
>>>>
>>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:
>>>>
>>>>> There are multiple options we can try here.
>>>>> what if we used cacheddataaccessor for this use case?.clients will
>>>>> only read if node has changed. This optimization can benefit all use cases.
>>>>>
>>>>> What about batching the watch triggers. Not sure which version of
>>>>> helix has this option.
>>>>>
>>>>> Another option is to use a poll based roundtable instead of watch
>>>>> based. This can coupled with cacheddataaccessor can be over efficient.
>>>>>
>>>>> Thanks,
>>>>> Kishore G
>>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>>>>
>>>>>> My total external view across all resources is roughly 3M in size and
>>>>>> there are 100 clients downloading it twice for every node restart - thats
>>>>>> 600M of data for every restart. So I guess that is causing this issue. We
>>>>>> are thinking of doing some tricks to limit the # of clients to 1 from 100.
>>>>>> I guess that should help significantly.
>>>>>>
>>>>>>  Varun
>>>>>>
>>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  Hey Varun,
>>>>>>>
>>>>>>>  I guess your external view is pretty large, since each external
>>>>>>> view callback takes ~3s. The RoutingTableProvider is callback
>>>>>>> based, so only when there is a change in the external view,
>>>>>>> RoutingTableProvider will read the entire external view from ZK. During the
>>>>>>> rolling upgrade, there are lots of live instance change, which may lead to
>>>>>>> a lot of changes in the external view. One possible way to mitigate the
>>>>>>> issue is to smooth the traffic by having some delays in between bouncing
>>>>>>> nodes. We can do a rough estimation on how many external view changes you
>>>>>>> might have during the upgrade, how many listeners you have, and how large
>>>>>>> is the external views. Once we have these numbers, we might know the ZK
>>>>>>> bandwidth requirement. ZK read bandwidth can be scaled by adding ZK
>>>>>>> observers.
>>>>>>>
>>>>>>>  ZK watcher is one time only, so every time a listener receives a
>>>>>>> callback, it will re-register its watcher again to ZK.
>>>>>>>
>>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>>> lose delta changes if you depend on that.
>>>>>>>
>>>>>>>  For the ZK connection issue, do you have any log on the ZK server
>>>>>>> side regarding this connection?
>>>>>>>
>>>>>>>  Thanks,
>>>>>>> Jason
>>>>>>>
>>>>>>>   ------------------------------
>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>>> *To:* user@helix.apache.org
>>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>>
>>>>>>>    I believe there is a misbehaving client. Here is a stack trace -
>>>>>>> it probably lost connection and is now stampeding it:
>>>>>>>
>>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10
>>>>>>> tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>>
>>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>>
>>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>>
>>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>>
>>>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>>
>>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>
>>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>>
>>>>>>>         at org.I0Itec.zk
>>>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>>>
>>>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>>
>>>>>>> *        at
>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>>>
>>>>>>> *        at
>>>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>>>
>>>>>>> *        at
>>>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>>>
>>>>>>>         at org.apache.helix.manager.zk
>>>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>>
>>>>>>>         at org.apache.helix.manager.zk
>>>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>>
>>>>>>>         at org.apache.helix.manager.zk
>>>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>
>>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>>>>>>> .ZKHelixManager)
>>>>>>>
>>>>>>>         at org.apache.helix.manager.zk
>>>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>>
>>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>>
>>>>>>>         at org.I0Itec.zk
>>>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>
>>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am wondering what is causing the zk subscription to happen every
>>>>>>>> 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>>
>>>>>>>>  Thanks
>>>>>>>>  Varun
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>  We are serving a few different resources whose total # of
>>>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>>>
>>>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>>
>>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Yes. It will get invoked when external views are added or deleted.
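
For context, a spectator typically wires the RoutingTableProvider up as an external view listener along these lines. This is only a sketch of the usual spectator setup; the cluster name, instance name, ZooKeeper address, and resource/partition names below are placeholders, and the lookup call may differ slightly between Helix versions.

    import java.util.List;

    import org.apache.helix.HelixManager;
    import org.apache.helix.HelixManagerFactory;
    import org.apache.helix.InstanceType;
    import org.apache.helix.model.InstanceConfig;
    import org.apache.helix.spectator.RoutingTableProvider;

    public class SpectatorExample {
      public static void main(String[] args) throws Exception {
        // Placeholders -- substitute your own cluster name, instance name and ZK address.
        String clusterName = "main_a";
        String instanceName = "client-1";
        String zkAddr = "zkhost:2181";

        HelixManager manager = HelixManagerFactory.getZKHelixManager(
            clusterName, instanceName, InstanceType.SPECTATOR, zkAddr);
        manager.connect();

        // RoutingTableProvider is an ExternalViewChangeListener; registering it is what
        // causes onExternalViewChange to fire whenever an external view is added,
        // modified or deleted.
        RoutingTableProvider routingTableProvider = new RoutingTableProvider();
        manager.addExternalViewChangeListener(routingTableProvider);

        // Later, route requests by asking which instances host a partition in a state.
        List<InstanceConfig> masters =
            routingTableProvider.getInstances("myResource", "myResource_0", "MASTER");
        System.out.println("Masters for myResource_0: " + masters);

        manager.disconnect();
      }
    }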
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, February 05, 2015 1:27 AM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode) ?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 9:54 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress and  before we added the support to bucketize, compression was used to support large number of partitions. However I dont see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, we can certainly add compression and have a config for turning it on/off. We do have implemented compression in our own zkclient before. The issue for compression might be:
1) cpu consumption on controller will increase.
2) hard to debug

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 3:08 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider










Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I had another question - does the RoutingTableProvider onExternalViewChange
call get invoked when a resource gets deleted (and hence its external view
znode)?

On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Yes. I think we did this in the incubating stage or even before. It's
> probably in a separate branch for some performance evaluation.
>
>  ------------------------------
> *From:* kishore g [g.kishore@gmail.com]
> *Sent:* Wednesday, February 04, 2015 9:54 PM
>
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   Jason, I remember having the ability to compress/decompress and  before
> we added the support to bucketize, compression was used to support large
> number of partitions. However I dont see the code anywhere. Did we do this
> on a separate branch?
>
>  thanks,
> Kishore G
>
> On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Hi Varun, we can certainly add compression and have a config for
>> turning it on/off. We do have implemented compression in our own zkclient
>> before. The issue for compression might be:
>> 1) cpu consumption on controller will increase.
>> 2) hard to debug
>>
>>  Thanks,
>> Jason
>>  ------------------------------
>> *From:* kishore g [g.kishore@gmail.com]
>> *Sent:* Wednesday, February 04, 2015 3:08 PM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>    we do have the ability to compress the data. I am not sure if there
>> is a easy way to turn on/off the compression.
>>
>> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com> wrote:
>>
>>> I am wondering if its possible to gzip the external view znode - a
>>> simple gzip cut down the data size by 25X. Is it possible to plug in
>>> compression/decompression as zookeeper nodes are read ?
>>>
>>>  Varun
>>>
>>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:
>>>
>>>> There are multiple options we can try here.
>>>> what if we used cacheddataaccessor for this use case?.clients will only
>>>> read if node has changed. This optimization can benefit all use cases.
>>>>
>>>> What about batching the watch triggers. Not sure which version of helix
>>>> has this option.
>>>>
>>>> Another option is to use a poll based roundtable instead of watch
>>>> based. This can coupled with cacheddataaccessor can be over efficient.
>>>>
>>>> Thanks,
>>>> Kishore G
>>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>>>
>>>>> My total external view across all resources is roughly 3M in size and
>>>>> there are 100 clients downloading it twice for every node restart - thats
>>>>> 600M of data for every restart. So I guess that is causing this issue. We
>>>>> are thinking of doing some tricks to limit the # of clients to 1 from 100.
>>>>> I guess that should help significantly.
>>>>>
>>>>>  Varun
>>>>>
>>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>
>>>>> wrote:
>>>>>
>>>>>>  Hey Varun,
>>>>>>
>>>>>>  I guess your external view is pretty large, since each external
>>>>>> view callback takes ~3s. The RoutingTableProvider is callback based,
>>>>>> so only when there is a change in the external view, RoutingTableProvider
>>>>>> will read the entire external view from ZK. During the rolling upgrade,
>>>>>> there are lots of live instance change, which may lead to a lot of changes
>>>>>> in the external view. One possible way to mitigate the issue is to smooth
>>>>>> the traffic by having some delays in between bouncing nodes. We can do a
>>>>>> rough estimation on how many external view changes you might have during
>>>>>> the upgrade, how many listeners you have, and how large is the external
>>>>>> views. Once we have these numbers, we might know the ZK bandwidth
>>>>>> requirement. ZK read bandwidth can be scaled by adding ZK observers.
>>>>>>
>>>>>>  ZK watcher is one time only, so every time a listener receives a
>>>>>> callback, it will re-register its watcher again to ZK.
>>>>>>
>>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>>> reading the entire znode. There might be some corner cases where you would
>>>>>> lose delta changes if you depend on that.
>>>>>>
>>>>>>  For the ZK connection issue, do you have any log on the ZK server
>>>>>> side regarding this connection?
>>>>>>
>>>>>>  Thanks,
>>>>>> Jason
>>>>>>
>>>>>>   ------------------------------
>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>>> *To:* user@helix.apache.org
>>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>>
>>>>>>    I believe there is a misbehaving client. Here is a stack trace -
>>>>>> it probably lost connection and is now stampeding it:
>>>>>>
>>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
>>>>>> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>>
>>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>>
>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>
>>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>>
>>>>>>         at
>>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>>
>>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>>
>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>
>>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>>
>>>>>>         at org.I0Itec.zk
>>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>>
>>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>>
>>>>>> *        at
>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>>
>>>>>> *        at
>>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>>
>>>>>> *        at
>>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>>
>>>>>>         at org.apache.helix.manager.zk
>>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>>
>>>>>>         at org.apache.helix.manager.zk
>>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>>
>>>>>>         at org.apache.helix.manager.zk
>>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>
>>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>>>>>> .ZKHelixManager)
>>>>>>
>>>>>>         at org.apache.helix.manager.zk
>>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>>
>>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>>
>>>>>>         at org.I0Itec.zk
>>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>
>>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am wondering what is causing the zk subscription to happen every
>>>>>>> 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>>
>>>>>>>  Thanks
>>>>>>>  Varun
>>>>>>>
>>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>  We are serving a few different resources whose total # of
>>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>>
>>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>>
>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>
>>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>
>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>>
>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>
>>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>>
>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>>
>>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>>> /main_a/EXTERNALVIEW
>>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 9:54 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress and  before we added the support to bucketize, compression was used to support large number of partitions. However I dont see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hi Varun, we can certainly add compression and have a config for turning it on/off. We do have implemented compression in our own zkclient before. The issue for compression might be:
1) cpu consumption on controller will increase.
2) hard to debug

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com<ma...@gmail.com>]
Sent: Wednesday, February 04, 2015 3:08 PM

To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider









Re: Excessive ZooKeeper load

Posted by kishore g <g....@gmail.com>.
Jason, I remember having the ability to compress/decompress; before we
added support for bucketizing, compression was used to support a large
number of partitions. However, I don't see the code anywhere. Did we do this
on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Hi Varun, we can certainly add compression and have a config for turning
> it on/off. We do have implemented compression in our own zkclient before.
> The issue for compression might be:
> 1) cpu consumption on controller will increase.
> 2) hard to debug
>
>  Thanks,
> Jason
>  ------------------------------
> *From:* kishore g [g.kishore@gmail.com]
> *Sent:* Wednesday, February 04, 2015 3:08 PM
>
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   we do have the ability to compress the data. I am not sure if there is
> a easy way to turn on/off the compression.
>
> On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> I am wondering if its possible to gzip the external view znode - a simple
>> gzip cut down the data size by 25X. Is it possible to plug in
>> compression/decompression as zookeeper nodes are read ?
>>
>>  Varun
>>
>> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:
>>
>>> There are multiple options we can try here.
>>> what if we used cacheddataaccessor for this use case?.clients will only
>>> read if node has changed. This optimization can benefit all use cases.
>>>
>>> What about batching the watch triggers. Not sure which version of helix
>>> has this option.
>>>
>>> Another option is to use a poll based roundtable instead of watch based.
>>> This can coupled with cacheddataaccessor can be over efficient.
>>>
>>> Thanks,
>>> Kishore G
>>>  On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>>
>>>> My total external view across all resources is roughly 3M in size and
>>>> there are 100 clients downloading it twice for every node restart - thats
>>>> 600M of data for every restart. So I guess that is causing this issue. We
>>>> are thinking of doing some tricks to limit the # of clients to 1 from 100.
>>>> I guess that should help significantly.
>>>>
>>>>  Varun
>>>>
>>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>>
>>>>>  Hey Varun,
>>>>>
>>>>>  I guess your external view is pretty large, since each external view
>>>>> callback takes ~3s. The RoutingTableProvider is callback based, so
>>>>> only when there is a change in the external view, RoutingTableProvider will
>>>>> read the entire external view from ZK. During the rolling upgrade, there
>>>>> are lots of live instance change, which may lead to a lot of changes in the
>>>>> external view. One possible way to mitigate the issue is to smooth the
>>>>> traffic by having some delays in between bouncing nodes. We can do a rough
>>>>> estimation on how many external view changes you might have during the
>>>>> upgrade, how many listeners you have, and how large is the external views.
>>>>> Once we have these numbers, we might know the ZK bandwidth requirement. ZK
>>>>> read bandwidth can be scaled by adding ZK observers.
>>>>>
>>>>>  ZK watcher is one time only, so every time a listener receives a
>>>>> callback, it will re-register its watcher again to ZK.
>>>>>
>>>>>  It's normally unreliable to depend on delta changes instead of
>>>>> reading the entire znode. There might be some corner cases where you would
>>>>> lose delta changes if you depend on that.
>>>>>
>>>>>  For the ZK connection issue, do you have any log on the ZK server
>>>>> side regarding this connection?
>>>>>
>>>>>  Thanks,
>>>>> Jason
>>>>>
>>>>>   ------------------------------
>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>>> *To:* user@helix.apache.org
>>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>>
>>>>>    I believe there is a misbehaving client. Here is a stack trace -
>>>>> it probably lost connection and is now stampeding it:
>>>>>
>>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
>>>>> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>>
>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>
>>>>>         at java.lang.Object.wait(Native Method)
>>>>>
>>>>>         at java.lang.Object.wait(Object.java:503)
>>>>>
>>>>>         at
>>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>>
>>>>>         - locked <0x00000004fb0d8c38> (a
>>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>>
>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>
>>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>>
>>>>>         at org.I0Itec.zk
>>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>>
>>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>>
>>>>> *        at
>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>>
>>>>> *        at
>>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>>
>>>>> *        at
>>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>>
>>>>>         at org.apache.helix.manager.zk
>>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>>
>>>>>         at org.apache.helix.manager.zk
>>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>>
>>>>>         at org.apache.helix.manager.zk
>>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>
>>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>>>>> .ZKHelixManager)
>>>>>
>>>>>         at org.apache.helix.manager.zk
>>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>>
>>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>>
>>>>>         at org.I0Itec.zk
>>>>> client.ZkEventThread.run(ZkEventThread.java:71)
>>>>>
>>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> I am wondering what is causing the zk subscription to happen every
>>>>>> 2-3 seconds - is this a new watch being established every 3 seconds ?
>>>>>>
>>>>>>  Thanks
>>>>>>  Varun
>>>>>>
>>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>  We are serving a few different resources whose total # of
>>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>>
>>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>>
>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>
>>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>
>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>>
>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>
>>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>>
>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>>
>>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>>> /main_a/EXTERNALVIEW
>>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Hi Varun, we can certainly add compression and have a config for turning it on/off. We have implemented compression in our own zkclient before (a rough sketch follows below). The issues with compression might be:
1) CPU consumption on the controller will increase.
2) It is harder to debug, since the compressed znode contents are no longer human-readable.
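
As a concrete illustration only - none of this is existing Helix code; the class name, the "compress" flag and the delegation to an inner serializer are assumptions - a toggleable gzip layer wrapping the zkclient ZkSerializer could look roughly like this:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.serialize.ZkSerializer;

// Hypothetical wrapper: gzips znode payloads on write and gunzips them on read.
// "compress" would come from client config; the inner serializer is whatever the
// cluster already uses.
public class GzipZkSerializer implements ZkSerializer {
  private final ZkSerializer inner;
  private final boolean compress;

  public GzipZkSerializer(ZkSerializer inner, boolean compress) {
    this.inner = inner;
    this.compress = compress;
  }

  @Override
  public byte[] serialize(Object data) throws ZkMarshallingError {
    byte[] raw = inner.serialize(data);
    if (!compress) {
      return raw;
    }
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      GZIPOutputStream gz = new GZIPOutputStream(bos);
      gz.write(raw);
      gz.close();
      return bos.toByteArray();
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }

  @Override
  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
    try {
      // GZIP magic header is 0x1f 0x8b; anything else is treated as an
      // uncompressed znode so old and new payloads can coexist.
      if (bytes != null && bytes.length > 2
          && bytes[0] == (byte) 0x1f && bytes[1] == (byte) 0x8b) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(bytes));
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
          bos.write(buf, 0, n);
        }
        return inner.deserialize(bos.toByteArray());
      }
      return inner.deserialize(bytes);
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }
}

The writer (controller) and every reader (spectator) would have to agree on the same serializer, which is part of why the two points above matter.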

Thanks,
Jason
________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 3:08 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

we do have the ability to compress the data. I am not sure if there is a easy way to turn on/off the compression.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering if its possible to gzip the external view znode - a simple gzip cut down the data size by 25X. Is it possible to plug in compression/decompression as zookeeper nodes are read ?

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com>> wrote:

There are multiple options we can try here.
what if we used cacheddataaccessor for this use case?.clients will only read if node has changed. This optimization can benefit all use cases.

What about batching the watch triggers. Not sure which version of helix has this option.

Another option is to use a poll based roundtable instead of watch based. This can coupled with cacheddataaccessor can be over efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com>> wrote:
My total external view across all resources is roughly 3M in size and there are 100 clients downloading it twice for every node restart - thats 600M of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com>> wrote:
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view, RoutingTableProvider will read the entire external view from ZK. During the rolling upgrade, there are lots of live instance change, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by having some delays in between bouncing nodes. We can do a rough estimation on how many external view changes you might have during the upgrade, how many listeners you have, and how large is the external views. Once we have these numbers, we might know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

ZK watcher is one time only, so every time a listener receives a callback, it will re-register its watcher again to ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com<ma...@pinterest.com>]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org<ma...@helix.apache.org>
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider








Re: Excessive ZooKeeper load

Posted by kishore g <g....@gmail.com>.
We do have the ability to compress the data. I am not sure if there is an
easy way to turn the compression on/off.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <va...@pinterest.com> wrote:

> I am wondering if its possible to gzip the external view znode - a simple
> gzip cut down the data size by 25X. Is it possible to plug in
> compression/decompression as zookeeper nodes are read ?
>
> Varun
>
> On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:
>
>> There are multiple options we can try here.
>> what if we used cacheddataaccessor for this use case?.clients will only
>> read if node has changed. This optimization can benefit all use cases.
>>
>> What about batching the watch triggers. Not sure which version of helix
>> has this option.
>>
>> Another option is to use a poll based roundtable instead of watch based.
>> This can coupled with cacheddataaccessor can be over efficient.
>>
>> Thanks,
>> Kishore G
>> On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>
>>> My total external view across all resources is roughly 3M in size and
>>> there are 100 clients downloading it twice for every node restart - thats
>>> 600M of data for every restart. So I guess that is causing this issue. We
>>> are thinking of doing some tricks to limit the # of clients to 1 from 100.
>>> I guess that should help significantly.
>>>
>>> Varun
>>>
>>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>>
>>>>  Hey Varun,
>>>>
>>>>  I guess your external view is pretty large, since each external view
>>>> callback takes ~3s. The RoutingTableProvider is callback based, so
>>>> only when there is a change in the external view, RoutingTableProvider will
>>>> read the entire external view from ZK. During the rolling upgrade, there
>>>> are lots of live instance change, which may lead to a lot of changes in the
>>>> external view. One possible way to mitigate the issue is to smooth the
>>>> traffic by having some delays in between bouncing nodes. We can do a rough
>>>> estimation on how many external view changes you might have during the
>>>> upgrade, how many listeners you have, and how large is the external views.
>>>> Once we have these numbers, we might know the ZK bandwidth requirement. ZK
>>>> read bandwidth can be scaled by adding ZK observers.
>>>>
>>>>  ZK watcher is one time only, so every time a listener receives a
>>>> callback, it will re-register its watcher again to ZK.
>>>>
>>>>  It's normally unreliable to depend on delta changes instead of
>>>> reading the entire znode. There might be some corner cases where you would
>>>> lose delta changes if you depend on that.
>>>>
>>>>  For the ZK connection issue, do you have any log on the ZK server
>>>> side regarding this connection?
>>>>
>>>>  Thanks,
>>>> Jason
>>>>
>>>>   ------------------------------
>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>>> *To:* user@helix.apache.org
>>>> *Subject:* Re: Excessive ZooKeeper load
>>>>
>>>>   I believe there is a misbehaving client. Here is a stack trace - it
>>>> probably lost connection and is now stampeding it:
>>>>
>>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>>> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
>>>> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>>
>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>
>>>>         at java.lang.Object.wait(Native Method)
>>>>
>>>>         at java.lang.Object.wait(Object.java:503)
>>>>
>>>>         at
>>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>>
>>>>         - locked <0x00000004fb0d8c38> (a
>>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>>
>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>
>>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>>
>>>>         at org.I0Itec.zk
>>>> client.ZkConnection.exists(ZkConnection.java:95)
>>>>
>>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>>
>>>> *        at
>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>>
>>>> *        at
>>>> org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>>
>>>> *        at
>>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>>
>>>>         at org.apache.helix.manager.zk
>>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>>
>>>>         at org.apache.helix.manager.zk
>>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>>
>>>>         at org.apache.helix.manager.zk
>>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>>
>>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>>>> .ZKHelixManager)
>>>>
>>>>         at org.apache.helix.manager.zk
>>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>>
>>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>>
>>>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>
>>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>>> wrote:
>>>>
>>>>> I am wondering what is causing the zk subscription to happen every 2-3
>>>>> seconds - is this a new watch being established every 3 seconds ?
>>>>>
>>>>>  Thanks
>>>>>  Varun
>>>>>
>>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>  We are serving a few different resources whose total # of
>>>>>> partitions is ~ 30K. We just did a rolling restart fo the cluster and the
>>>>>> clients which use the RoutingTableProvider are stuck in a bad state where
>>>>>> they are constantly subscribing to changes in the external view of a
>>>>>> cluster. Here is the helix log on the client after our rolling restart was
>>>>>> finished - the client is constantly polling ZK. The zookeeper node is
>>>>>> pushing 300mbps right now and most of the traffic is being pulled by
>>>>>> clients. Is this a race condition - also is there an easy way to make the
>>>>>> clients not poll so aggressively. We restarted one of the clients and we
>>>>>> don't see these same messages anymore. Also is it possible to just
>>>>>> propagate external view diffs instead of the whole big znode ?
>>>>>>
>>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>>
>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>
>>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>
>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>>
>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>
>>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>>
>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>>
>>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>>> /main_a/EXTERNALVIEW
>>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I am wondering if it's possible to gzip the external view znode - a simple
gzip cut the data size down by 25X. Is it possible to plug in
compression/decompression as zookeeper nodes are read?
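
For reference, a minimal sketch of the kind of offline check behind that number - it assumes the serialized external view has been dumped to a local file, and the file name is just a placeholder:

import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Quick, offline check of how well a dumped external view compresses.
// "externalview.json" is a placeholder for a znode payload saved locally.
public class CompressionCheck {
  public static void main(String[] args) throws Exception {
    byte[] raw = Files.readAllBytes(Paths.get("externalview.json"));

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    gz.write(raw);
    gz.close();
    byte[] compressed = bos.toByteArray();

    System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.1fx%n",
        raw.length, compressed.length, (double) raw.length / compressed.length);
  }
}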

Varun

On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g....@gmail.com> wrote:

> There are multiple options we can try here.
> what if we used cacheddataaccessor for this use case?.clients will only
> read if node has changed. This optimization can benefit all use cases.
>
> What about batching the watch triggers. Not sure which version of helix
> has this option.
>
> Another option is to use a poll based roundtable instead of watch based.
> This can coupled with cacheddataaccessor can be over efficient.
>
> Thanks,
> Kishore G
> On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>
>> My total external view across all resources is roughly 3M in size and
>> there are 100 clients downloading it twice for every node restart - thats
>> 600M of data for every restart. So I guess that is causing this issue. We
>> are thinking of doing some tricks to limit the # of clients to 1 from 100.
>> I guess that should help significantly.
>>
>> Varun
>>
>> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>>
>>>  Hey Varun,
>>>
>>>  I guess your external view is pretty large, since each external view
>>> callback takes ~3s. The RoutingTableProvider is callback based, so only
>>> when there is a change in the external view, RoutingTableProvider will read
>>> the entire external view from ZK. During the rolling upgrade, there are
>>> lots of live instance change, which may lead to a lot of changes in the
>>> external view. One possible way to mitigate the issue is to smooth the
>>> traffic by having some delays in between bouncing nodes. We can do a rough
>>> estimation on how many external view changes you might have during the
>>> upgrade, how many listeners you have, and how large is the external views.
>>> Once we have these numbers, we might know the ZK bandwidth requirement. ZK
>>> read bandwidth can be scaled by adding ZK observers.
>>>
>>>  ZK watcher is one time only, so every time a listener receives a
>>> callback, it will re-register its watcher again to ZK.
>>>
>>>  It's normally unreliable to depend on delta changes instead of reading
>>> the entire znode. There might be some corner cases where you would lose
>>> delta changes if you depend on that.
>>>
>>>  For the ZK connection issue, do you have any log on the ZK server side
>>> regarding this connection?
>>>
>>>  Thanks,
>>> Jason
>>>
>>>   ------------------------------
>>> *From:* Varun Sharma [varun@pinterest.com]
>>> *Sent:* Monday, February 02, 2015 4:41 PM
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: Excessive ZooKeeper load
>>>
>>>   I believe there is a misbehaving client. Here is a stack trace - it
>>> probably lost connection and is now stampeding it:
>>>
>>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>>> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
>>> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>>
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>
>>>         at java.lang.Object.wait(Native Method)
>>>
>>>         at java.lang.Object.wait(Object.java:503)
>>>
>>>         at
>>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>>
>>>         - locked <0x00000004fb0d8c38> (a
>>> org.apache.zookeeper.ClientCnxn$Packet)
>>>
>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>
>>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>>
>>>         at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>>>
>>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>>
>>> *        at
>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>>
>>> *        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>>
>>> *        at
>>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>>
>>>         at org.apache.helix.manager.zk
>>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>>
>>>         at org.apache.helix.manager.zk
>>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>>
>>>         at org.apache.helix.manager.zk
>>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>>
>>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>>> .ZKHelixManager)
>>>
>>>         at org.apache.helix.manager.zk
>>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>>
>>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>>
>>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>
>>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>
>>> wrote:
>>>
>>>> I am wondering what is causing the zk subscription to happen every 2-3
>>>> seconds - is this a new watch being established every 3 seconds ?
>>>>
>>>>  Thanks
>>>>  Varun
>>>>
>>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  We are serving a few different resources whose total # of partitions
>>>>> is ~ 30K. We just did a rolling restart fo the cluster and the clients
>>>>> which use the RoutingTableProvider are stuck in a bad state where they are
>>>>> constantly subscribing to changes in the external view of a cluster. Here
>>>>> is the helix log on the client after our rolling restart was finished - the
>>>>> client is constantly polling ZK. The zookeeper node is pushing 300mbps
>>>>> right now and most of the traffic is being pulled by clients. Is this a
>>>>> race condition - also is there an easy way to make the clients not poll so
>>>>> aggressively. We restarted one of the clients and we don't see these same
>>>>> messages anymore. Also is it possible to just propagate external view diffs
>>>>> instead of the whole big znode ?
>>>>>
>>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>>
>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>
>>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>
>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>>
>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>
>>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>>
>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>>
>>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>>> /main_a/EXTERNALVIEW
>>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Re: Excessive ZooKeeper load

Posted by kishore g <g....@gmail.com>.
There are multiple options we can try here.
What if we used a cached data accessor for this use case? Clients would only
read if the node has changed. This optimization can benefit all use cases.

What about batching the watch triggers? I am not sure which version of Helix
has this option.

Another option is to use a poll-based routing table instead of a watch-based one.
This, coupled with a cached data accessor, can be very efficient (rough sketch below).
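
A rough sketch of the poll-plus-cache idea against the raw ZooKeeper API - re-read the znode only when its data version has moved, so an idle poll costs one small exists() call instead of a multi-MB getData(). The class and field names are made up for illustration; this is not an existing Helix class:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Illustrative polling helper: fetch the znode payload only when its
// data version has changed since the last poll.
public class CachedExternalViewPoller {
  private final ZooKeeper zk;
  private final String path;        // e.g. one znode under /main_a/EXTERNALVIEW
  private int cachedVersion = -1;
  private byte[] cachedData;

  public CachedExternalViewPoller(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public byte[] poll() throws KeeperException, InterruptedException {
    Stat stat = zk.exists(path, false);        // metadata only, a few bytes on the wire
    if (stat == null) {                        // znode is gone
      cachedVersion = -1;
      cachedData = null;
      return null;
    }
    if (stat.getVersion() != cachedVersion) {  // content changed since last poll
      cachedData = zk.getData(path, false, stat);
      cachedVersion = stat.getVersion();       // version of the data actually read
    }
    return cachedData;
  }
}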

Thanks,
Kishore G
On Feb 2, 2015 8:17 PM, "Varun Sharma" <va...@pinterest.com> wrote:

> My total external view across all resources is roughly 3M in size and
> there are 100 clients downloading it twice for every node restart - thats
> 600M of data for every restart. So I guess that is causing this issue. We
> are thinking of doing some tricks to limit the # of clients to 1 from 100.
> I guess that should help significantly.
>
> Varun
>
> On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com> wrote:
>
>>  Hey Varun,
>>
>>  I guess your external view is pretty large, since each external view
>> callback takes ~3s. The RoutingTableProvider is callback based, so only
>> when there is a change in the external view, RoutingTableProvider will read
>> the entire external view from ZK. During the rolling upgrade, there are
>> lots of live instance change, which may lead to a lot of changes in the
>> external view. One possible way to mitigate the issue is to smooth the
>> traffic by having some delays in between bouncing nodes. We can do a rough
>> estimation on how many external view changes you might have during the
>> upgrade, how many listeners you have, and how large is the external views.
>> Once we have these numbers, we might know the ZK bandwidth requirement. ZK
>> read bandwidth can be scaled by adding ZK observers.
>>
>>  ZK watcher is one time only, so every time a listener receives a
>> callback, it will re-register its watcher again to ZK.
>>
>>  It's normally unreliable to depend on delta changes instead of reading
>> the entire znode. There might be some corner cases where you would lose
>> delta changes if you depend on that.
>>
>>  For the ZK connection issue, do you have any log on the ZK server side
>> regarding this connection?
>>
>>  Thanks,
>> Jason
>>
>>   ------------------------------
>> *From:* Varun Sharma [varun@pinterest.com]
>> *Sent:* Monday, February 02, 2015 4:41 PM
>> *To:* user@helix.apache.org
>> *Subject:* Re: Excessive ZooKeeper load
>>
>>   I believe there is a misbehaving client. Here is a stack trace - it
>> probably lost connection and is now stampeding it:
>>
>>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
>> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
>> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>>
>>    java.lang.Thread.State: WAITING (on object monitor)
>>
>>         at java.lang.Object.wait(Native Method)
>>
>>         at java.lang.Object.wait(Object.java:503)
>>
>>         at
>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>>
>>         - locked <0x00000004fb0d8c38> (a
>> org.apache.zookeeper.ClientCnxn$Packet)
>>
>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>
>>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>
>>         at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>>
>>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>>
>> *        at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>>
>> *        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>>
>> *        at
>> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>>
>>         at org.apache.helix.manager.zk
>> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>>
>>         at org.apache.helix.manager.zk
>> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>>
>>         at org.apache.helix.manager.zk
>> .CallbackHandler.invoke(CallbackHandler.java:202)
>>
>>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
>> .ZKHelixManager)
>>
>>         at org.apache.helix.manager.zk
>> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>>
>>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>>
>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>
>> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com> wrote:
>>
>>> I am wondering what is causing the zk subscription to happen every 2-3
>>> seconds - is this a new watch being established every 3 seconds ?
>>>
>>>  Thanks
>>>  Varun
>>>
>>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>  We are serving a few different resources whose total # of partitions
>>>> is ~ 30K. We just did a rolling restart fo the cluster and the clients
>>>> which use the RoutingTableProvider are stuck in a bad state where they are
>>>> constantly subscribing to changes in the external view of a cluster. Here
>>>> is the helix log on the client after our rolling restart was finished - the
>>>> client is constantly polling ZK. The zookeeper node is pushing 300mbps
>>>> right now and most of the traffic is being pulled by clients. Is this a
>>>> race condition - also is there an easy way to make the clients not poll so
>>>> aggressively. We restarted one of the clients and we don't see these same
>>>> messages anymore. Also is it possible to just propagate external view diffs
>>>> instead of the whole big znode ?
>>>>
>>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>>
>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>
>>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>
>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>>
>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>
>>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>>
>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>>
>>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>>> /main_a/EXTERNALVIEW
>>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>>
>>>>
>>>>
>>>
>>
>

Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
My total external view across all resources is roughly 3 MB in size and there
are 100 clients downloading it twice for every node restart - that's 600 MB of
data per restart. So I guess that is causing this issue. We are thinking of
doing some tricks to limit the # of clients from 100 down to 1. I guess that
should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zz...@linkedin.com> wrote:

>  Hey Varun,
>
>  I guess your external view is pretty large, since each external view
> callback takes ~3s. The RoutingTableProvider is callback based, so only
> when there is a change in the external view, RoutingTableProvider will read
> the entire external view from ZK. During the rolling upgrade, there are
> lots of live instance change, which may lead to a lot of changes in the
> external view. One possible way to mitigate the issue is to smooth the
> traffic by having some delays in between bouncing nodes. We can do a rough
> estimation on how many external view changes you might have during the
> upgrade, how many listeners you have, and how large is the external views.
> Once we have these numbers, we might know the ZK bandwidth requirement. ZK
> read bandwidth can be scaled by adding ZK observers.
>
>  ZK watcher is one time only, so every time a listener receives a
> callback, it will re-register its watcher again to ZK.
>
>  It's normally unreliable to depend on delta changes instead of reading
> the entire znode. There might be some corner cases where you would lose
> delta changes if you depend on that.
>
>  For the ZK connection issue, do you have any log on the ZK server side
> regarding this connection?
>
>  Thanks,
> Jason
>
>   ------------------------------
> *From:* Varun Sharma [varun@pinterest.com]
> *Sent:* Monday, February 02, 2015 4:41 PM
> *To:* user@helix.apache.org
> *Subject:* Re: Excessive ZooKeeper load
>
>   I believe there is a misbehaving client. Here is a stack trace - it
> probably lost connection and is now stampeding it:
>
>  "ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk
> 002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800
> nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
>
>    java.lang.Thread.State: WAITING (on object monitor)
>
>         at java.lang.Object.wait(Native Method)
>
>         at java.lang.Object.wait(Object.java:503)
>
>         at
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>
>         - locked <0x00000004fb0d8c38> (a
> org.apache.zookeeper.ClientCnxn$Packet)
>
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>
>         at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
>
>         at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
>
> *        at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
>
> *        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
>
> *        at
> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
>
>         at org.apache.helix.manager.zk
> .CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
>
>         at org.apache.helix.manager.zk
> .CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
>
>         at org.apache.helix.manager.zk
> .CallbackHandler.invoke(CallbackHandler.java:202)
>
>         - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
> .ZKHelixManager)
>
>         at org.apache.helix.manager.zk
> .CallbackHandler.handleDataChange(CallbackHandler.java:338)
>
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
> On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> I am wondering what is causing the zk subscription to happen every 2-3
>> seconds - is this a new watch being established every 3 seconds ?
>>
>>  Thanks
>>  Varun
>>
>> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com> wrote:
>>
>>> Hi,
>>>
>>>  We are serving a few different resources whose total # of partitions
>>> is ~ 30K. We just did a rolling restart fo the cluster and the clients
>>> which use the RoutingTableProvider are stuck in a bad state where they are
>>> constantly subscribing to changes in the external view of a cluster. Here
>>> is the helix log on the client after our rolling restart was finished - the
>>> client is constantly polling ZK. The zookeeper node is pushing 300mbps
>>> right now and most of the traffic is being pulled by clients. Is this a
>>> race condition - also is there an easy way to make the clients not poll so
>>> aggressively. We restarted one of the clients and we don't see these same
>>> messages anymore. Also is it possible to just propagate external view diffs
>>> instead of the whole big znode ?
>>>
>>>  15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>>
>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>
>>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>>
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>
>>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>>> child-change. path: /main_a/EXTERNALVIEW, listener:
>>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>>
>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>>
>>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>>> /main_a/EXTERNALVIEW
>>> listener:org.apache.helix.spectator.RoutingTableProvider
>>>
>>>
>>>
>>
>

RE: Excessive ZooKeeper load

Posted by Zhen Zhang <zz...@linkedin.com>.
Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view will the RoutingTableProvider read the entire external view from ZK. During the rolling upgrade there are lots of live instance changes, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by adding some delay between bouncing nodes. We can do a rough estimation of how many external view changes you might have during the upgrade, how many listeners you have, and how large the external views are. Once we have these numbers, we should know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.
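
As a purely illustrative back-of-the-envelope (all numbers assumed): with a 3 MB external view, 100 listeners, a 100-node rolling restart and roughly 2 external view changes per bounce, that is 100 x 2 x 100 x 3 MB ≈ 60 GB read from ZK over the upgrade - spread over half an hour, that alone is on the order of 250-300 mbps sustained.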

ZK watches are one-time only, so every time a listener receives a callback, it has to re-register its watch with ZK (see the sketch below).
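
For illustration, the bare-bones pattern against the raw ZooKeeper API looks roughly like this (the class is hypothetical; the point is that a single getData call both reads the latest contents and re-arms the watch, so nothing slips in between the notification and the re-subscription):

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Minimal illustration of the one-time-watch pattern.
public class ReRegisteringWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  public ReRegisteringWatcher(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void start() throws KeeperException, InterruptedException {
    handle(zk.getData(path, this, null));     // first read sets the first watch
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() != Watcher.Event.EventType.NodeDataChanged) {
      return;                                 // connection/session events handled elsewhere
    }
    try {
      handle(zk.getData(path, this, null));   // passing "this" re-registers the watch
    } catch (Exception e) {
      // retry/backoff on connection loss omitted for brevity
    }
  }

  private void handle(byte[] data) {
    // application-specific processing of the new znode contents
  }
}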

It's normally unreliable to depend on delta changes instead of reading the entire znode. There might be some corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost connection and is now stampeding it:


"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)

        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)

        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)

        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com>> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is this a new watch being established every 3 seconds ?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com>> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~ 30K. We just did a rolling restart fo the cluster and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of a cluster. Here is the helix log on the client after our rolling restart was finished - the client is constantly polling ZK. The zookeeper node is pushing 300mbps right now and most of the traffic is being pulled by clients. Is this a race condition - also is there an easy way to make the clients not poll so aggressively. We restarted one of the clients and we don't see these same messages anymore. Also is it possible to just propagate external view diffs instead of the whole big znode ?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms

15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider

15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms

15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider





Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I believe there is a misbehaving client. Here is a stack trace - it
probably lost its connection and is now stampeding ZK:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181"
daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait()
[0x00007f52ca9c3000]

   java.lang.Thread.State: WAITING (on object monitor)

        at java.lang.Object.wait(Native Method)

        at java.lang.Object.wait(Object.java:503)

        at
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)

        - locked <0x00000004fb0d8c38> (a
org.apache.zookeeper.ClientCnxn$Packet)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)

        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)

        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)

        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)

*        at
org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*

*        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*

*        at
org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*

        at org.apache.helix.manager.zk
.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)

        at org.apache.helix.manager.zk
.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)

        at org.apache.helix.manager.zk
.CallbackHandler.invoke(CallbackHandler.java:202)

        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk
.ZKHelixManager)

        at org.apache.helix.manager.zk
.CallbackHandler.handleDataChange(CallbackHandler.java:338)

        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)

        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <va...@pinterest.com> wrote:

> I am wondering what is causing the zk subscription to happen every 2-3
> seconds - is this a new watch being established every 3 seconds ?
>
> Thanks
> Varun
>
> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Hi,
>>
>> We are serving a few different resources whose total # of partitions is ~
>> 30K. We just did a rolling restart fo the cluster and the clients which use
>> the RoutingTableProvider are stuck in a bad state where they are constantly
>> subscribing to changes in the external view of a cluster. Here is the helix
>> log on the client after our rolling restart was finished - the client is
>> constantly polling ZK. The zookeeper node is pushing 300mbps right now and
>> most of the traffic is being pulled by clients. Is this a race condition -
>> also is there an easy way to make the clients not poll so aggressively. We
>> restarted one of the clients and we don't see these same messages anymore.
>> Also is it possible to just propagate external view diffs instead of the
>> whole big znode ?
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
>> child-change. path: /main_a/EXTERNALVIEW, listener:
>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
>> child-change. path: /main_a/EXTERNALVIEW, listener:
>> org.apache.helix.spectator.RoutingTableProvider@76984879
>>
>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>
>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
>> /main_a/EXTERNALVIEW
>> listener:org.apache.helix.spectator.RoutingTableProvider
>>
>>
>>
>

Re: Excessive ZooKeeper load

Posted by Varun Sharma <va...@pinterest.com>.
I am wondering what is causing the zk subscription to happen every 2-3
seconds - is this a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <va...@pinterest.com> wrote:

> Hi,
>
> We are serving a few different resources whose total # of partitions is ~
> 30K. We just did a rolling restart fo the cluster and the clients which use
> the RoutingTableProvider are stuck in a bad state where they are constantly
> subscribing to changes in the external view of a cluster. Here is the helix
> log on the client after our rolling restart was finished - the client is
> constantly polling ZK. The zookeeper node is pushing 300mbps right now and
> most of the traffic is being pulled by clients. Is this a race condition -
> also is there an easy way to make the clients not poll so aggressively. We
> restarted one of the clients and we don't see these same messages anymore.
> Also is it possible to just propagate external view diffs instead of the
> whole big znode ?
>
> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>
> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider
>
> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes
> child-change. path: /main_a/EXTERNALVIEW, listener:
> org.apache.helix.spectator.RoutingTableProvider@76984879
>
> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>
> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider
>
> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes
> child-change. path: /main_a/EXTERNALVIEW, listener:
> org.apache.helix.spectator.RoutingTableProvider@76984879
>
> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>
> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE
> /main_a/EXTERNALVIEW
> listener:org.apache.helix.spectator.RoutingTableProvider
>
>
>