You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2017/11/02 01:03:55 UTC

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8449


Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 65 insertions(+), 33 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: te;
> I was chatting with Michael and he mentioned that in general we don't recom
Discussed this a little more with Dimitris, leaving it as-is for now. We didn't want to extend the usage of shared/weak_ptrs for readability sake.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard<mutex> l(subscribers_lock_);
             :   lock_guard<mutex> t(topic_lock_);
> My general opinion on this is "if it ain't broke, don't fix it". Do we have
Fair point, I'll revert the spinlock change. Maybe we can address it again if it really turns out to be a bottleneck.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:55:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair<SubscriberId, RegistrationId>
> It will until doSubscriberUpdate is called which will remove the entry. You
I was chatting with Michael and he mentioned that in general we don't recommend the use of weak_ptrs. So, if keeping the entries for some time is a concern by using shared_ptrs, you may want to ignore my recommendation.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:44:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 2:

(7 comments)

Thanks for doing this patch. This will help reduce a lot of unnecessary network traffic.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165:   typedef TUniqueId RegistrationId;
You can move this typedef to statestore.h and use the same type in statestore.h/cc too.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
Do we even need to store the SubscriberId here? Can't we just store a unique registration ID? Where ever the 'SubscriberId' is required, we just get it from the Subscriber object anyway, and that object can be retrieved using the unique registration id.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
Could you clarify what "exists" means here exactly? It could be confused with a node just existing as a part of the cluster. I think we want to say that it exists in the subscribers_ map.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
nit: registration_id


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr<Subscriber>* subscriber
Add a comment about what is returned in this out parameter.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             :     const TUniqueId& registration_id
It looks like it just makes sense to pass 'const Subscriber&' here? Is there a case where we would not get a subscriber_id and a registration_id from the same Subscriber object while calling this function?


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634:   if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second,
I'm a little worried that we're contending for a mutex two more times in this function. Do you anticipate any performance regression due to increased context switching?

Consider using a spin lock if we won't have more than ~1000 entries in the map at one time. (unordered_map has a worst-case O(N) time complexity)



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 Nov 2017 03:19:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#5).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 76 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1478/


-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:32:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dimitris Tsirogiannis, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#4).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 81 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 8: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h@381
PS8, Line 381: in microseconds since epoch)
or just say "in Unix time"


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: strationId of the registered subscriber
> Discussed this a little more with Dimitris, leaving it as-is for now. We di
Thanks. yes, let's avoid shared_ptrs and especially weak_ptrs, and move toward single ownership when possible.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard<mutex> l(subscribers_lock_);
             :   lock_guard<mutex> t(topic_lock_);
> Fair point, I'll revert the spinlock change. Maybe we can address it again 
Note that SpinLock is not a traditional spin-lock -- it's adaptive and will block like a mutex after attempting to spin for a while. So, it's pretty general-purpose.


http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc@415
PS8, Line 415: RegisteredSubscriberExists
Seems like this should just be "FindSubscriber()" or "FindRegisteredSubscriber()" but okay to leave if you prefer the "exists" name.  The "exists" naming makes it a bit surprising that it also returns the pointer.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 19:12:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair<SubscriberId, RegistrationId>
> Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due 
It will until doSubscriberUpdate is called which will remove the entry. You can even use a weak_ptr here if keeping these entries for some time is a concern.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:28:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair<SubscriberId, RegistrationId>
I think the code would be much simpler if you stored a pointer (probably a shared_ptr is needed) to the Subscriber here and simply compared it to the registered Subscriber.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:08:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard<SpinLock> l(subscribers_lock_);
             :   lock_guard<mutex> t(topic_lock_);
> IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do 
My general opinion on this is "if it ain't broke, don't fix it". Do we have any evidence that these locks are a problem, or is this a case of premature optimization?

In any case, this is not the main focus of this patch, so let's avoid creeping in unrelated changes. I vote for reverting.

If we have evidence that a mutex is not good here, then let's change that in a separate patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 22:05:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165:   typedef TUniqueId RegistrationId;
> You can move this typedef to statestore.h and use the same type in statesto
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
>  Where ever the 'SubscriberId' is
 >  required, we just get it from the Subscriber object anyway, and
 >  that object can be retrieved using the unique registration id.

Not sure I follow. If you see OfferUpdate()/DoSubscriberUpdate(), we only keep track of ScheduledSubscriberUpdate objects and get the corresponding Subscriber based on ScheduledSubscriberUpdate.subscriber_Id. So we don't have a Subscriber object handy to get the subscriber_id in all cases.

Also, if we remove SubscriberId everywhere, this means that we need to change the subscribers_ map structure to map from RegistrationId -> Subscriber objects. Doing so we can't look up by subscriber_id, which is required in RegisterSubscriber() (At that point, we don't assign a RegistrationId yet to the new instance)

SubscriberMap::iterator subscriber_it = subscribers_.find(subscriber_id);
    if (subscriber_it != subscribers_.end()) {
      UnregisterSubscriber(subscriber_it->second.get());
    }

We can still figure out a way, but it seemed unnecessarily complex to me. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
> Could you clarify what "exists" means here exactly? It could be confused wi
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
> nit: registration_id
Done


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr<Subscriber>* subscriber
> Add a comment about what is returned in this out parameter.
clarified


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             :     const TUniqueId& registration_id
> It looks like it just makes sense to pass 'const Subscriber&' here? Is ther
Not sure I understand. We get the subscriber/registration_id from the ScheduledSubscriberUpdate object and not the Subscriber object. Do you mean we should pass directly pass ScheduledSubscriberUpdate instead? If thats the case, the signature seems kinda weird :)


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634:   if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second,
> I'm a little worried that we're contending for a mutex two more times in th
Good point. I too think (theoretically) spinlock is probably a better choice to avoid context switching. Also, ~1000 entries seem like a reasonable estimate for foreseeable future :). Also, I think for string based hashing of ~1000 entries, it is reasonable to assume a O(1) average case lookup (even though the worst case is O(N)). 

I created a microbenchmark to see if the lock type makes a difference. In the benchmark, I measured how long it takes to get 100 heartbeats for a given subscriber (with heart beating thread pool sizes of 10/100). 
I didn't see any noticeable difference for 100 subscribers, but beyond that, the test runs into flaky socket connection issues. I admit that this is not representative of the real world use case because in real clusters, the statestore CPU would be much busier and the context switching could be more expensive.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:22:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 23:53:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#9).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 86 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#8).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 88 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#3).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 78 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 6: Code-Review+1

Carrying +1 (Thanks Sailesh). Any volunteers for a +2 review, thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 21:08:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: d subscriber
> >  Where ever the 'SubscriberId' is
Thanks for the explanation.

Yea my point was if we're going to have a unique RegistrationId anyway, why have a SubscriberId. It seemed redundant.

But as you pointed out, it looks like the subscriber chooses the subscriber_id and not the statestore. So, it would be hard to enforce this.

Let's leave this for now.


http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             :     const TUniqueId& registration_id
> Not sure I understand. We get the subscriber/registration_id from the Sched
Nvm, my bad, I thought both were coming from the Subscriber object. Ignore this.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard<SpinLock> l(subscribers_lock_);
             :   lock_guard<mutex> t(topic_lock_);
I just noticed this. Getting a SpinLock before getting a mutex is an anti-pattern.

Even attempting to get a spinlock while already holding a spinlock is also not exactly a great idea. However, our SpinLock implementation sleeps after a few cycles of trying to obtain the lock anyway.

Do we know if we do a lot of work holding the topic_lock_? If not, let's change this to a SpinLock too. (The GatherTopicUpdates() holds topic_lock_ and iterates through a nested loop, but I'm not sure how many iterations that would be in the worst case).

If it looks like we will end up doing a lot of work holing the lock, we can be safe and just turn the 'subscribers_lock_' back to a mutex.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
const RegistrationId&



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:18:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#7).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 91 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 9: Code-Review+2

Carrying +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:31:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 00:29:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h@383
PS6, Line 383: std::pair<SubscriberId, RegistrationId>
> once we have two level pair, I think it's time to start naming the fields. 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:18:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug:

Without this patch, when a subscriber repeatedly reconnects to the
statestore, the latter queues the initial heartbeat message and a
bunch of topic updates to every instance of the registered subscriber.
These queued updates are eventually picked up by the heartbeating/topic
update threads and the corresponding RPCs are made to the subscribers.
The subscriber then rejects these updates since they were meant for an
earlier registration. This is usually possible if the subscriber has
some network problems leading to failing RPCs.

Such a node is eventually marked by the statestore as bad, but depending
on the configurations set, the issue can snowball into DDOS kind of
attack when the entire thread pool of heartbeating/topic updates is
filled with instances from the problematic host. This can result in
the statestore missing timely heartbeats to other subscribers making
them reconnect. This worsens the situation and the resulting topic
updates for the reconnects will fully saturate the network on the
statestore host, until the statestore daemon is restarted.

Fix:

This patch maps topic updates/heartbeats to a specific subscriber
registered instance rather to a subscriber id (that stays same across
reconnects). That way, when we encounter a topic update that was meant to
a stale subscriber, we can simply reject it.

Testing:

Tested this locally by adding relevant logging. I made the subscribers
to reconnect aggressively(a) and delaying heartbeats from the statestore
side (b,c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=60000

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Reviewed-on: http://gerrit.cloudera.org:8080/8449
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 86 insertions(+), 45 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 10
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h@394
PS7, Line 394:   typedef std::pair<int64_t, RegisteredSubscriberInstance> ScheduledSubscriberUpdate;
> I meant flatten both pairs -- i.e. turn ScheduledSubscriberUpdate into a st
oops sorry, got confused because your comment just highlighted the second part of the pair. Redid this, makes more sense to flatten the whole thing.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 02:27:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair<SubscriberId, RegistrationId>
> I think the code would be much simpler if you stored a pointer (probably a 
Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due to shared_ptr references? I agree the code might be simple though.


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard<SpinLock> l(subscribers_lock_);
             :   lock_guard<mutex> t(topic_lock_);
> I just noticed this. Getting a SpinLock before getting a mutex is an anti-p
IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do some heavy work in GatherTopicUpdates().  If this is an anti-pattern I'm ok reverting the change to a mutex. May we can ask others opinions on it? 

Dimitris/Dan/Alex do you have any opinion on this?


http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
> const RegistrationId&
Changed at other places too.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:38:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h@394
PS7, Line 394:   typedef std::pair<int64_t, RegisteredSubscriberInstance> ScheduledSubscriberUpdate;
I meant flatten both pairs -- i.e. turn ScheduledSubscriberUpdate into a struct (with three fields).



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 05:40:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h@381
PS8, Line 381:  both kinds of subscriber up
> or just say "in Unix time"
Done


http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc@415
PS8, Line 415: Id& registration_id, share
> Seems like this should just be "FindSubscriber()" or "FindRegisteredSubscri
FindSubscriber() sounds better.



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:28:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h@383
PS6, Line 383: std::pair<SubscriberId, RegistrationId>
once we have two level pair, I think it's time to start naming the fields. How about defining a struct for this thing instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Nov 2017 21:13:03 +0000
Gerrit-HasComments: Yes