You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Ashwin Murthy <as...@gmail.com> on 2016/04/01 05:19:09 UTC

Re: Review Request 45511: AURORA-1493: create ELB-friendly endpoint to detect leading scheduler


> On March 31, 2016, 6:33 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/http/LeaderHealth.java, line 49
> > <https://reviews.apache.org/r/45511/diff/4/?file=1320481#file1320481line49>
> >
> >     Can we switch on this enum, rather than if/else branching? If we also add an explicit case for each enum value and reserve the default block for an unexpected status it will allow this code to continue to work as expected even if new values are added to the enum in the future (i.e. as things stand today, if we added a hypothetical state like `LeaderStatus.ALSO_LEADING`, it would fall into the `SERVICE_UNAVAILABLE` branch).
> 
> Ashwin Murthy wrote:
>     @Bill, @Joshua, 
>     
>     Can I do the following?
>     
>     1. Return 200 is leader
>     2. Return 501 (Not implemented) if this is not the leader but a leader exists
>     3. Return 503 (Service unavailable) if there is no current leader
>     4. Return 500 (Internal server error) for the default case that Joshua points out
>     
>     Would you be ok with this? I
> 
> Joshua Cohen wrote:
>     In the [LeaderRedirectFilter](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/http/LeaderRedirectFilter.java#L70-L91) we do the following:
>     
>     Is Leading -> continue request (i.e. 200)
>     No Leader -> 503
>     Not Leading -> 307
>     Unknown State -> 500
>     
>     So that meshes pretty well with what you describe, you're just replacing the redirect when the instance is not the leader with a non 2xx. I'm not sure if I love 501 for that case, but I'm having a hard time finding a better alternative.
>     
>     That being said, I feel like I've asked this before, but you can clarify why the existing `LeaderRedirectFilter` is not sufficient for ELB? If I'm reading the [docs](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-healthchecks.html#health-check-configuration) right, it seems like any non-200 reply is sufficient to indicate the instance is not healthy (i.e. not the leader). So given that `LeaderRedirectFilter` already returns a 307 for this case, can we not just use that?
> 
> Ashwin Murthy wrote:
>     @Bill, @Joshua, 
>     
>     Actually, I changed my mind. The prev one is rather more complex for this simple scenario than needs to be.
>     
>     How about?:
>     Return 200 if instance is leader
>     Return 503 (Service unavailablE) for the case when this instance is not the leader but a leader exists
>     Return 500 (internal server error) for all other (default) cases since this is truly an error. 
>     
>     Would you be ok with this?

The reason the 307 does not work in all cases is many LBs like HAProxy default to using response code 307 to mean healthy. Some LBs might even case the redirect. Since strictly speakin per HTTP RFC, this seems wrong to do, having a separate endpoint that returns a 500 level code is being pursued.


- Ashwin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45511/#review126394
-----------------------------------------------------------


On March 31, 2016, 5:06 a.m., Ashwin Murthy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45511/
> -----------------------------------------------------------
> 
> (Updated March 31, 2016, 5:06 a.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> AURORA-1493: create ELB-friendly endpoint to detect leading scheduler. The fix is to add a new endpoint - "/leaderhealth" which returns http status code 200 (OK) if the instance is the leader. If the instance is not the leader but a leading exists, returns 500 (Internal server error). If there is no leader at all, returns 503 (Service unavailable)
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/LeaderHealth.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/http/LeaderHealthTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/45511/diff/
> 
> 
> Testing
> -------
> 
> Added new unit test
> 
> 
> Thanks,
> 
> Ashwin Murthy
> 
>