You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2017/04/13 18:35:11 UTC

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6628

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
12 files changed, 168 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> Hue would like to be able to load balance across the set of coordinators.  
I understand the use case and I fully agree. What isn't clear to me is the mechanism that Hue is using to "look at the membership list". i.e. how do they actually get that information.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> I understand the use case and I fully agree. What isn't clear to me is the 
They aren't yet doing it, but the proposal is for them to hit the statestore webpage to get the json.  Alternatively, they could subscribe to the statestore topic.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/503/

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

Line 59:   cluster.AddHost(true, true, true);
> Agree, this isn't pretty. Using a builder may not work well with AddHost().
Cool, thanks!


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 204:       new_backend_config->AddBackend(be_desc);
> You want me to remove the punctuation?
I think it needs a comma after 'executor'. My first comment should have been more clear, sorry for that. :)


Line 205:       current_membership_.insert(make_pair(item.key, be_desc));
> Made the fields required, so I guess it's not needed now.
They still are marked "optional" in StatestoreService.thrift. Am I missing something? I agree that making them "required" is a good idea.


http://gerrit.cloudera.org:8080/#/c/6628/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 271:       rapidjson::Document* document);
I can't find the forward declaration. How does this compile now? Is this included transitively from somewhere else?


http://gerrit.cloudera.org:8080/#/c/6628/2/be/src/util/webserver.cc
File be/src/util/webserver.cc:

Line 200:   if (env->impala_server()->IsExecutor()) mode_str << " Executor";
nit: this used to have a comma between them in the previous PS. The other place where you output this has a "/" in between. Can we unify them?


http://gerrit.cloudera.org:8080/#/c/6628/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 65:   // The address of the debug HTTP server
> I added those as optional because they aren't always set (e.g. during sched
I agree, making them required looks better to me, too. It removes ambiguity while requiring more space.


http://gerrit.cloudera.org:8080/#/c/6628/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 69:   6: optional bool secure_webserver;
Maybe it'd be beneficial to add an optional flag "is_secured" to TNetworkAddress? This looks like other services should need the same annotation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/503/

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> They aren't yet doing it, but the proposal is for them to hit the statestor
Thanks for the clarification. This patch modified the /backends webpage to include all the subscribed backends and added the executor/coordinator flags. So, Hue could get that webpage out of any backend and retrieve the executor/coordinator information. I attach a screenshot so that you'll see what I mean (https://cloudera.box.com/s/rvy496vfk812esl3d17677i0m4wzgdmp). The current_membership_ doesn't affect that at all. Let me know if that is acceptable or if you prefer the statestore webpage to expose that information as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:   }
> Essentially the scheduler only manages/uses backends that are executors. Ho
What else would we want to rename backend->executor besides backend_config_ and associated functions/variables?  (The types themselves don't need to be updated in this change, but it's helpful to distinguish what backends the sets themselves include). That doesn't seem too bad.  Or are there others?

Given that we now have coord_only_backend_config_, it seems that renaming would be helpful to clarify what backend_config_ actually contains.


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 202:     // If this is not an executor, don't add it to the new backend config.
this comment just repeats what the code is doing, but doesn't explain why. it would be more useful if at the top of the function we just say at a high level what this routine does:

// Update executors_config_ and current_executors_ to match the set of executors given by the subscriber_topic_updates.

Or something like that.


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

PS4, Line 269: backend_config_
as mentioned earlier, this would be easier to follow if this were renamed to signify that it only contains executors.


PS4, Line 283:   /// Map from unique backend id to TBackendDescriptor. Used to track the known executors
             :   /// from the statestore. It's important to track both the backend ID as well as the
             :   /// TBackendDescriptor so we know what is being removed in a given update.
this comment was very unhelpful at understanding the code because it doesn't explicitly say what a "backend id" is, or what this map really is.  After reading the code you can see it's just a full representation of the IMPALA_MEMBERSHIP_TOPIC {key, value} pairs for executors (to which deltas can be applied, and it drives updates to state, e.g. backend_config_, that's derived from the topic). It would be helpful to be explicit about that.


http://gerrit.cloudera.org:8080/#/c/6628/4/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS4, Line 41: only_coordinate
when i read this name I thought that it would start a cluster of only coordinators (which was confusing).  how about --exclusive_coordinators? Though that name isn't great either.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> I am a little confused with the last part of your comment: "That way, e.g. 
Hue would like to be able to load balance across the set of coordinators.  Before the is-coordinator change, they could do that by looking at the membership list (since all impalads acted as coordinators).  Hue still needs a way to find the coordinators, and if you remove them from the membership set, they won't be able to use that.
Instead the proposal is to continue having all impalads be part of the "cluster membership", and instead, augment the values with info about whether the node is a coordinator and/or executor.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 4:

(3 comments)

Just some minor comments.

http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 388: void ImpalaServer::BackendsUrlCallback(
I think this should go in impala-http-handler.cc


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS4, Line 284: GetServerRole
Why deal with the presentation (i.e. string formatting) here, rather than let the template do the work? That is, just have the json store IsExecutor() and IsCoordinator(), and then present them how it likes.


http://gerrit.cloudera.org:8080/#/c/6628/4/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS4, Line 42: \
don't think you need these inside ( )


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 202:     // If this is not an executor, don't add it to the new backend config.
> this comment just repeats what the code is doing, but doesn't explain why. 
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

PS4, Line 269: backend_config_
> as mentioned earlier, this would be easier to follow if this were renamed t
Done


PS4, Line 283:   /// Map from unique backend id to TBackendDescriptor. Used to track the known executors
             :   /// from the statestore. It's important to track both the backend ID as well as the
             :   /// TBackendDescriptor so we know what is being removed in a given update.
> this comment was very unhelpful at understanding the code because it doesn'
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 388: void ImpalaServer::BackendsUrlCallback(
> I think this should go in impala-http-handler.cc
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS4, Line 284: GetServerRole
> Why deal with the presentation (i.e. string formatting) here, rather than l
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS4, Line 41: only_coordinate
> when i read this name I thought that it would start a cluster of only coord
Done


PS4, Line 42: \
> don't think you need these inside ( )
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
13 files changed, 181 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#6).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 433 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> Thanks for the clarification. This patch modified the /backends webpage to 
I see, so you're saying that the "impala-membership" topic does contain all nodes of the cluster and their is_executor/is_coordinator bits. Okay, I missed that this callback doesn't do the topic update -- that's handled by the one in impala-server.cc.

This code is even more confusing now because the terminology is not clear. I think we need to do some renaming to make it easier to understand this code. For example, "current_membership_" is misleading now. It'd be better to rename that as "current_executors_" or similar.

Also, the terminology "backend" seems confusing (or at least could use clarifications).  e.g. a coordinator but non-executor is still considered a "backend" which seems weird.  I guess in the code we are saying "backend" is synonymous with "impalad"? (I think it would be too much to rename TBackendDescriptor right now, but I think we should consider doing that independently. The comment in the thrift file refers to the "backend topic", but the topic is called "impala-membership", so maybe this is historical naming)


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
13 files changed, 176 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6628/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 271:       rapidjson::Document* document);
> I can't find the forward declaration. How does this compile now? Is this in
Exactly, it wasn't needed in the first place.


http://gerrit.cloudera.org:8080/#/c/6628/2/be/src/util/webserver.cc
File be/src/util/webserver.cc:

Line 200:   if (env->impala_server()->IsExecutor()) mode_str << " Executor";
> nit: this used to have a comma between them in the previous PS. The other p
Done


http://gerrit.cloudera.org:8080/#/c/6628/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 69:   6: optional bool secure_webserver;
> Maybe it'd be beneficial to add an optional flag "is_secured" to TNetworkAd
Is this related to this change? Should I add a TODO instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6628/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS5, Line 279:   /// A map from unique backend ID to the corresponding TBackendDescriptor of that backend.
             :   /// Used to track membership updates from the statestore so queries can be cancelled
             :   /// when a backend is removed. It's not enough to just cancel fragments that are running
             :   /// based on the deletions mentioned in the most recent statestore heartbeat; sometimes
             :   /// cancellations are skipped and the statestore, at its discretion, may send only
             :   /// a delta of the current membership so we need to compute any deletions.
             :   /// TODO: Currently there are multiple locations where cluster membership is tracked,
             :   /// here and in the scheduler. This should be consolidated so there is a single component
             :   /// (the scheduler?) that tracks this information and calls other interested components.
I think this comment is about the known_backends_ field, no?


PS5, Line 290: const BackendDescriptorMap& GetKnownBackends();
why does this need to be exposed? where is it used?


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> I see, so you're saying that the "impala-membership" topic does contain all
Essentially the scheduler only manages/uses backends that are executors. However,  the term 'backend' is embedded in so many places that we may not want to do the rename it in this patch. I did the rename you suggested and added a brief comment in StatestoreService.thrift. Let me know if you'd like me to do the backend->executor rename now.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
i thought the plan was to continue to have all nodes of the cluster still be in the "membership set", but that each node would have attributes indicating whether it is a coordinator and/or executor.
That way, e.g. clients can still use the membership topic to discover coordinators.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 7: Code-Review+2

Rebase and minor test fix. Keep Dan's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> What else would we want to rename backend->executor besides backend_config_
There is a number of inner classes in this file (e.g. AddressableAssignment, BackendAssignmentInfo, etc) that use the term backend in fields and functions. I believe these should be renamed as well. I just want to make sure we agree on the scope of this change before I start modifying this file even more.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Reviewed-on: http://gerrit.cloudera.org:8080/6628
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 432 insertions(+), 307 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1//COMMIT_MSG
Commit Message:

Line 16: - Added a customer cluster test.
nit: custom


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 433 insertions(+), 308 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/505/

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6628/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS5, Line 279:   /// A map from unique backend ID to the corresponding TBackendDescriptor of that backend.
             :   /// Used to track membership updates from the statestore so queries can be cancelled
             :   /// when a backend is removed. It's not enough to just cancel fragments that are running
             :   /// based on the deletions mentioned in the most recent statestore heartbeat; sometimes
             :   /// cancellations are skipped and the statestore, at its discretion, may send only
             :   /// a delta of the current membership so we need to compute any deletions.
             :   /// TODO: Currently there are multiple locations where cluster membership is tracked,
             :   /// here and in the scheduler. This should be consolidated so there is a single component
             :   /// (the scheduler?) that tracks this information and calls other interested components.
> I think this comment is about the known_backends_ field, no?
Oops, yes.


PS5, Line 290: const BackendDescriptorMap& GetKnownBackends();
> why does this need to be exposed? where is it used?
This is used in ImpalaHttpHandler::BackendsHandler().


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
14 files changed, 190 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 61: int Cluster::AddHost(bool has_backend, bool has_datanode, bool is_executor) {
> Can you DCHECK(!is_executor || has_backend)?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

Line 56: /// ranges.
> nit: single line
Done


Line 59:   cluster.AddHost(true, true, false);
> This looks like it can get hard to parse rather quickly - I think even havi
Agree, this isn't pretty. Using a builder may not work well with AddHost(). The latter is not merely a factory for hosts, it also populates several Cluster fields, like list of backends, etc. I'll leave a TODO for now and give it some thought later.


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 32: #include "rapidjson/rapidjson.h"
> still needed?
Done


Line 44: using namespace rapidjson;
> still needed?
Done


Line 204:     // If this is not an executor don't add it to the new backend config.
> nit: punctuation
You want me to remove the punctuation?


Line 205:     if (be_desc.is_executor) {
> Do you need to check __is_set?
Made the fields required, so I guess it's not needed now.


Line 207:       current_membership_.insert(make_pair(item.key, be_desc));
> i thought the plan was to continue to have all nodes of the cluster still b
I am a little confused with the last part of your comment: "That way, e.g. clients can still use the membership topic to discover coordinators." I couldn't find how that works exactly, can you plz elaborate or give some pointers?


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 2000:   vector<string> modes;
> Building this with a stringstream may be easier and would also not need ano
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 272:       rapidjson::Document* document);
> you could forward declare the class and move the include to the .cc file.
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/util/webserver.cc
File be/src/util/webserver.cc:

Line 199:   if (env->impala_server()->IsCoordinator()) modes.push_back("Coordinator");
> Same here, maybe use stringstream?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS1, Line 216:  
> missing word?
Done


Line 254:     else:
> elif?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 65:   // True if this is a coordinator node
> What does it mean if these are unset?
I added those as optional because they aren't always set (e.g. during scheduler tests). I think it may be better to make them required. What do you think?


http://gerrit.cloudera.org:8080/#/c/6628/1/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 94:           operators are executed on 'expected_num_of_executors' nodes."""
> nit: indendation
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6628

to look at the new patch set (#7).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 432 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 61: int Cluster::AddHost(bool has_backend, bool has_datanode, bool is_executor) {
Can you DCHECK(!is_executor || has_backend)?


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

Line 56: /// ranges.
nit: single line


Line 59:   cluster.AddHost(true, true, false);
This looks like it can get hard to parse rather quickly - I think even having two boolean flags is a suboptimal. Can we consider something more clear? For example chaining function calls: cluster.AddHost().SetBackend().SetDatanode();

I'm open to anything really, if you can't think of anything better, I'm also good with leaving it like it is.


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 32: #include "rapidjson/rapidjson.h"
still needed?


Line 44: using namespace rapidjson;
still needed?


Line 204:     // If this is not an executor don't add it to the new backend config.
nit: punctuation


Line 205:     if (be_desc.is_executor) {
Do you need to check __is_set?


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 2000:   vector<string> modes;
Building this with a stringstream may be easier and would also not need another boost dependency.


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 272:       rapidjson::Document* document);
you could forward declare the class and move the include to the .cc file.


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/util/webserver.cc
File be/src/util/webserver.cc:

Line 199:   if (env->impala_server()->IsCoordinator()) modes.push_back("Coordinator");
Same here, maybe use stringstream?


http://gerrit.cloudera.org:8080/#/c/6628/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS1, Line 216:  
missing word?


Line 254:     else:
elif?


http://gerrit.cloudera.org:8080/#/c/6628/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 65:   // True if this is a coordinator node
What does it mean if these are unset?


http://gerrit.cloudera.org:8080/#/c/6628/1/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 94:           operators are executed on 'expected_num_of_executors' nodes."""
nit: indendation


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-HasComments: Yes