Posted to issues@aurora.apache.org by "Rogier Dikkes (JIRA)" <ji...@apache.org> on 2016/11/08 11:58:58 UTC

[jira] [Created] (AURORA-1811) sla_list_safe_domain no longer reports SLA usage

Rogier Dikkes created AURORA-1811:
-------------------------------------

             Summary: sla_list_safe_domain no longer reports SLA usage
                 Key: AURORA-1811
                 URL: https://issues.apache.org/jira/browse/AURORA-1811
             Project: Aurora
          Issue Type: Bug
          Components: Client, Maintenance, SLA
    Affects Versions: 0.16.0
         Environment: Vagrant image - Ubuntu, Centos 7.2
            Reporter: Rogier Dikkes
            Priority: Minor
             Fix For: 0.14.0


We recently had to patch hosts. In our situation we have a couple of services that run fewer than 2-5 instances with production = true and tier = preferred, as provided in the default example documentation.

As we understand it, host_drain does not let us configure the minimum job instance count; the default is 10. We tried to use aurora_admin sla_list_safe_domain to compile a list of the hosts running these services, so that we could feed host_drain with an unsafe_hosts_file.

When we ran aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m, the scheduler returned:
 INFO] Response from scheduler: OK (message: )

It is as if there are no hosts. We tried changing the percentage and duration to see if anything would be returned, but we never received a different response.

To rule out the client as the cause, we used the 0.16.0 client against a 0.14.0 cluster; that cluster does report hosts that are safe to kill without violating job SLAs.

To make sure it's not a faulty cluster setup on our part, we started the vagrant sandbox and launched a task with 3 instances with tier = preferred and production = True.

commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m

Adding -l or varying the time and percentage never changes the outcome.
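
For anyone who wants to reproduce the sweep over percentage and duration, the variations above can be scripted roughly as follows. This is only a sketch: the helper names build_commands and run_all are made up for illustration, the command line is exactly the one quoted above, and it assumes aurora_admin is on the PATH and a cluster named devcluster is configured.

```python
import itertools
import subprocess


def build_commands(cluster, percentages, durations, min_instances=2):
    """Return one aurora_admin argv list per (percentage, duration) pair.

    Hypothetical helper: it only assembles the command line quoted in
    this report and does not contact the scheduler.
    """
    return [
        [
            "aurora_admin",
            "sla_list_safe_domain",
            "--min_job_instance_count={}".format(min_instances),
            cluster,
            str(pct),
            dur,
        ]
        for pct, dur in itertools.product(percentages, durations)
    ]


def run_all(commands):
    """Run each command and return (argv, combined stdout/stderr) pairs.

    Assumes aurora_admin is installed; raises FileNotFoundError otherwise.
    """
    results = []
    for argv in commands:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        results.append((argv, proc.stdout))
    return results


# Example usage (left commented out because it needs a working client):
# for argv, output in run_all(build_commands("devcluster", [20, 90, 95], ["1m", "5m", "50m"])):
#     print(" ".join(argv))
#     print(output)
```

On an affected 0.16.0 cluster we would expect every combination to return the same empty OK response shown earlier.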




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)