Posted to common-issues@hadoop.apache.org by "ConfX (Jira)" <ji...@apache.org> on 2023/07/19 16:47:00 UTC

[jira] [Updated] (HADOOP-18811) Buggy ZKFCRpcServer constructor creates null object and crashes the rpcServer

     [ https://issues.apache.org/jira/browse/HADOOP-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ConfX updated HADOOP-18811:
---------------------------
    Description: 
h2. What happened:

In ZKFailoverController.java, the initRPC() method obtains the ZKFC RpcServer binding address and creates a new ZKFCRpcServer object, rpcServer. However, when getPolicyProvider() returns null, the ZKFCRpcServer constructor fails partway through, rpcServer is left null, and any later use of rpcServer throws a NullPointerException.
h2. Buggy code:

In ZKFailoverController.java
{code:java}
protected void initRPC() throws IOException {
  InetSocketAddress bindAddr = getRpcAddressToBindTo();
  LOG.info("ZKFC RpcServer binding to {}", bindAddr);
  rpcServer = new ZKFCRpcServer(conf, bindAddr, this, getPolicyProvider());  // <-- getPolicyProvider() might return null here
}
{code}
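One defensive option is to fail fast at the call site, so a missing policy provider surfaces immediately with a clear message instead of as a bare NullPointerException deep inside the RPC setup. The sketch below is illustrative only, not the project's actual fix; ZkfcRpcSketch and its PolicyProvider are simplified stand-ins for the Hadoop types:
{code:java}
import java.util.Objects;

// Minimal stand-in for Hadoop's PolicyProvider, for illustration only.
class PolicyProvider {}

class ZkfcRpcSketch {
  private Object rpcServer;

  // Mirrors the shape of initRPC(): reject a null provider up front
  // instead of letting the constructor fail during RPC server setup.
  void initRpc(PolicyProvider provider) {
    Objects.requireNonNull(provider,
        "getPolicyProvider() returned null; check the security configuration");
    rpcServer = new Object(); // placeholder for new ZKFCRpcServer(...)
  }

  boolean isInitialized() {
    return rpcServer != null;
  }
}
{code}
With a guard like this, the failure happens at initRPC() time with an explanatory message, rather than later as an unexplained NPE when rpcServer is first used.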
ZKFCRpcServer() eventually calls the refreshWithLoadedConfiguration() method shown below. This method dereferences provider without a null check, which is what aborts the constructor and leaves the rpcServer above null.

In ServiceAuthorizationManager.java
{code:java}
  @Private
  public void refreshWithLoadedConfiguration(Configuration conf, PolicyProvider provider) {
    ...
    // Parse the config file
    Service[] services = provider.getServices();   // <--- provider might be null here
    ... {code}
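Alternatively, the refresh path itself could tolerate a null provider by treating it as "no services to authorize." The sketch below is a hypothetical illustration of that approach using simplified stand-in types (Service, PolicyProvider, AuthSketch), not Hadoop's real classes:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the Hadoop types, for illustration only.
class Service {
  private final String key;
  Service(String key) { this.key = key; }
  String getServiceKey() { return key; }
}

class PolicyProvider {
  private final Service[] services;
  PolicyProvider(Service... services) { this.services = services; }
  Service[] getServices() { return services; }
}

class AuthSketch {
  // Mirrors refreshWithLoadedConfiguration(): a null provider yields an
  // empty ACL map instead of a NullPointerException.
  Map<String, String> refresh(PolicyProvider provider) {
    if (provider == null) {
      return Collections.emptyMap();
    }
    Map<String, String> acls = new HashMap<>();
    for (Service s : provider.getServices()) {
      acls.put(s.getServiceKey(), "*"); // placeholder default-allow ACL
    }
    return acls;
  }
}
{code}
Whether a null provider should be skipped silently or rejected loudly is a design choice for the maintainers; the point is only that the null case is handled explicitly.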
h2. How to trigger this bug:

(1) Set {{hadoop.security.authorization}} to {{true}}

(2) Run the test {{org.apache.hadoop.ha.TestZKFailoverControllerStress#testRandomExpirations}}

(3) You will see the following stack trace:
{code:java}
java.lang.NullPointerException
        at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:258)
        at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:63)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:181)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:177)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:503)
        at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:177)
        at org.apache.hadoop.ha.MiniZKFCCluster$DummyZKFCThread.doWork(MiniZKFCCluster.java:301)
        at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189){code}
(4) The NullPointerException is thrown because {{rpcServer}} is null, caused by the bug described above.

You can use the attached reproduce.sh to reproduce the bug easily.

We are happy to provide a patch if this issue is confirmed. 

> Buggy ZKFCRpcServer constructor creates null object and crashes the rpcServer
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-18811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18811
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
