You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@helix.apache.org by brettKK <gi...@git.apache.org> on 2018/04/30 03:51:52 UTC

[GitHub] helix pull request #201: add null check in DeplayedAutoRebalancerz#computeNe...

GitHub user brettKK opened a pull request:

    https://github.com/apache/helix/pull/201

    add null check in DeplayedAutoRebalancerz#computeNewIdealState

    We have developed a static analysis tool [NPEDetector](https://github.com/lujiefsi/NPEDetector) to find some potential NPE. Our analysis shows that some callees may return null in corner case(e.g. node crash , IO exception), some of their callers have  !=null check but some do not have. In this issue we post a patch which can add  !=null  based on existed !=null  check. For example:
    
    Callee ClusterDataCache#getStateModelDef:
    ```
    public StateModelDefinition getStateModelDef(String stateModelDefRef) {
      if (stateModelDefRef == null) {
        return null;
      }
      return _stateModelDefMap.get(stateModelDefRef);
    }
    ```
    
    Caller AutoRebalancer#computeNewIdealState have !=null:
    ```
    StateModelDefinition stateModelDef = clusterData.getStateModelDef(stateModelName);
    if (stateModelDef == null) {
      LOG.error("State Model Definition null for resource: " + resourceName);
      throw new HelixException("State Model Definition null for resource: " + resourceName);
    }
    ```
    
    but another caller DelayedAutoRebalancer#computeNewIdealState does not have !=null check:
    ```
    StateModelDefinition stateModelDef =
        clusterData.getStateModelDef(currentIdealState.getStateModelDefRef());
    LinkedHashMap<String, Integer> stateCountMap =
        stateModelDef.getStateCountMap(activeNodes.size(), replicaCount);
    ```
    
    So we will add below code in non-(!=null) caller DelayedAutoRebalancer#computeNewIdealState
    ```
    if (stateModelDef == null) {
        throw new HelixException("State Model Definition null for resource:" + resourceName);
    }
    ```
    But due to we are not very  familiar with HELIX, hope some expert can review it.
    
    Thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brettKK/helix HELIX-702

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #201
    
----
commit 47ea7fbad82e4c2e4c7d846c0455a0aa502f3d97
Author: brettkk <10...@...>
Date:   2018-04-30T03:48:01Z

    add null check in DeplayedAutoRebalancerz#computeNewIdealState

----


---

[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...

Posted by zhan849 <gi...@git.apache.org>.
Github user zhan849 commented on the issue:

    https://github.com/apache/helix/pull/201
  
    1. IDEs are already doing NPE checking for us.
    2. You are just detecting null and throw another exception, how's it different than NPE?


---

[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...

Posted by zhan849 <gi...@git.apache.org>.
Github user zhan849 commented on the issue:

    https://github.com/apache/helix/pull/201
  
    @lujiefsi currently we have the assumption that resource will have state model def, which is registered by ParticipantManager. If you really want to fix the issue, then I'd suggest doing the following:
    
    - in computeNewIdealState, log error when we cannot find state model def, and mark all partitions as error, ResourceMonitor need to be updated accordingly. Don't throw exception as this will block rebalancing for all other valid resources
    
    - WorkflowConfig's start time is fetched from it's ScheduleConfig, which is enforced by builder (if no start time is provided, builder will fail the build). So we can assume it is always there. Similarly, if you really want to add check (assuming someone did not use our API to create object), don't throw exception, log error and record failure in workflow monitor


---

[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...

Posted by lujiefsi <gi...@git.apache.org>.
Github user lujiefsi commented on the issue:

    https://github.com/apache/helix/pull/201
  
    @zhan849 
    Great! I restudy the related code, these three are really false postive. I prefer to  close this issue without fixing. What's your opinion? @brettKK 


---

[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...

Posted by lujiefsi <gi...@git.apache.org>.
Github user lujiefsi commented on the issue:

    https://github.com/apache/helix/pull/201
  
    @zhan849 
    Q1: We use findbugs to check helix and found it only perform intra-procedural NPE analysis. Our analysis is  interprocedural.
    Q2: I think @brettKK want to use semantic exception replace the ugly NPEs


---