You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Xiang Li (JIRA)" <ji...@apache.org> on 2018/02/04 13:39:00 UTC

[jira] [Comment Edited] (HBASE-19917) Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient

    [ https://issues.apache.org/jira/browse/HBASE-19917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351784#comment-16351784 ] 

Xiang Li edited comment on HBASE-19917 at 2/4/18 1:38 PM:
----------------------------------------------------------

Thanks for your comment [~yuzhihong@gmail.com]!
{{filterServers()}} is only called in {{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet actually, built by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, size of servers is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because TreeSet#contains() is logn and we loop for m.
# Turn TreeSet into HashSet, to pursue {noformat}O(1){noformat} for contains(). Time complexity is O(m + n), as the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O(n) for time complexity (if I get it correctly) as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * O(1).
I think #1 is good enough, although it is worse than #2 which is linear. What is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be used to replace TreeSet throughout the calling chain.




was (Author: water):
Thanks for your comment [~yuzhihong@gmail.com]!
{{filterServers()}} is only called in {{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet actually, built by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, size of servers is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because TreeSet#contains() is logn and we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is O(m + n), as the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O(n) for time complexity (if I get it correctly) as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * O(1).
I think #1 is good enough, although it is worse than #2 which is linear. What is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be used to replace TreeSet throughout the calling chain.



> Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient
> ---------------------------------------------------------------------
>
>                 Key: HBASE-19917
>                 URL: https://issues.apache.org/jira/browse/HBASE-19917
>             Project: HBase
>          Issue Type: Improvement
>          Components: rsgroup
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-19917.master.000.patch
>
>
> {code:title=hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java|borderStyle=solid}
> private List<ServerName> filterServers(Collection<Address> servers,
>     Collection<ServerName> onlineServers) {
>   ArrayList<ServerName> finalList = new ArrayList<ServerName>();
>   for (Address server : servers) {
>     for(ServerName curr: onlineServers) {
>       if(curr.getAddress().equals(server)) {
>         finalList.add(curr);
>       }
>     }
>   }
>   return finalList;
> }
> {code}
> filterServers is to return the union of servers and onlineServers. The current implementation has time complexity as O(m * n) (2 loops), could be in O(m + n) if HashSet is used. The trade-off is space complexity is increased.
> Another point which could be improved: filterServers() is only called in filterOfflineServers(). filterOfflineServers calls filterServers(Set, List). The current filterServers(Collection, Collection) seems could be improved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)