Posted to dev@felix.apache.org by "Sahoo (JIRA)" <ji...@apache.org> on 2011/01/19 17:12:43 UTC

[jira] Created: (FELIX-2791) NPE in getStartLevel

NPE in getStartLevel
--------------------

                 Key: FELIX-2791
                 URL: https://issues.apache.org/jira/browse/FELIX-2791
             Project: Felix
          Issue Type: Bug
          Components: File Install
    Affects Versions: fileinstall-3.1.4
            Reporter: Sahoo
            Priority: Blocker
             Fix For: fileinstall-3.1.6


We are using FileInstall 3.1.4 and one of our users is seeing the following exception repeated continuously in the directory watcher main loop:

java.lang.NullPointerException
        at org.apache.felix.fileinstall.internal.FileInstall.getStartLevel(FileInstall.java:283)
        at org.apache.felix.fileinstall.internal.FileInstall.getStartLevel(FileInstall.java:276)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:237)
In main loop, we have serious trouble: java.lang.NullPointerException

NPE comes from FileInstall.java:283, which looks like this:
            return (StartLevel) startLevel.waitForService(timeout);

It looks like a concurrency bug to me in FileInstall. The only explanation that I have so far is that in the target environment, code is getting reordered and hence the directory watcher thread is spawned before startLevel is initialized. Since startLevel is not declared volatile, nor is it accessed from a synchronized block, the DirectoryWatcher thread continues to see the stale value.

So, I suggest we change startLevel to a volatile field. Any comments?
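
For illustration, the volatile suggestion could look roughly like the sketch below. This is a minimal model, not the FileInstall source: the class name VolatileFieldSketch is hypothetical and a plain Object stands in for the service tracker held in startLevel.

```java
// Hypothetical stand-in for FileInstall; only the field handling is modelled.
class VolatileFieldSketch {
    // volatile guarantees that a write made by the starting thread is
    // visible to any thread that reads the field afterwards.
    private volatile Object startLevel;

    void start() {
        startLevel = new Object(); // stands in for creating the service tracker
    }

    Object getStartLevel() {
        Object tracker = startLevel; // read the field once, then use the local copy
        if (tracker == null) {
            throw new IllegalStateException("start() has not assigned startLevel yet");
        }
        return tracker;
    }
}
```

Note that volatile only addresses visibility. If a reader thread really does run before the assignment in start(), it would still observe null, so this sketch fails fast with a descriptive exception instead of an NPE.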


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (FELIX-2791) NPE in getStartLevel

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983773#action_12983773 ] 

Sahoo commented on FELIX-2791:
------------------------------

Even better would be to synchronize start() and stop() methods in FileInstall instead of always paying the cost of making the field volatile.



[jira] Commented: (FELIX-2791) NPE in getStartLevel

Posted by "Guillaume Nodet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983880#action_12983880 ] 

Guillaume Nodet commented on FELIX-2791:
----------------------------------------

I don't think the code is reordered, but the ConfigAdminSupport is actually registered before the startLevel service tracker, so if it is called synchronously during the registration, the DirectoryWatcher thread might be started before the tracker is created.

So I'd rather suggest that the trackers are created and started before the ConfigAdmin support, so maybe move the ConfigAdminSupport creation to the end of the FileInstall#start() method.

Could you give that a try and see if it avoids the exception?
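
The ordering problem described above can be modelled without any OSGi classes. The sketch below is an assumption-laden illustration: OrderingSketch, registerConfigListener and watcherNeedsTracker are hypothetical names, and the "registry" simply invokes the listener synchronously to mimic a configuration callback firing during registration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of start() ordering; not the FileInstall source.
class OrderingSketch {
    private Object startLevelTracker;            // stands in for the service tracker
    final List<String> log = new ArrayList<>();

    // Simulates a registration that calls the listener back immediately.
    private void registerConfigListener(Runnable onConfiguration) {
        onConfiguration.run();
    }

    // Simulates what a watcher does when it first needs the tracker.
    private void watcherNeedsTracker() {
        log.add(startLevelTracker == null ? "tracker missing" : "tracker ready");
    }

    // Problematic order: the listener is registered before the tracker exists.
    void startListenerFirst() {
        registerConfigListener(this::watcherNeedsTracker);
        startLevelTracker = new Object();
    }

    // Suggested order: trackers first, configuration support last.
    void startTrackersFirst() {
        startLevelTracker = new Object();
        registerConfigListener(this::watcherNeedsTracker);
    }
}
```

In this model startListenerFirst() records "tracker missing" while startTrackersFirst() records "tracker ready", which is the difference the reordering is meant to achieve.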



[jira] Updated: (FELIX-2791) NPE in getStartLevel

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo updated FELIX-2791:
-------------------------

    Comment: was deleted

(was: Even better would be to synchronize start() and stop() methods in FileInstall instead of always paying the cost of making the field volatile.)



[jira] Updated: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo updated FELIX-2791:
-------------------------

    Attachment:     (was: patch.txt)



[jira] Updated: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo updated FELIX-2791:
-------------------------

    Attachment: patch.txt

PFA a patch that solves the issue. It introduces a barrier and a flag to make sure watcher threads don't run until fileinstall has finished initialisation. Please review if you want to, as I plan to commit and make a new release of fileinstall ASAP. Thanks.



[jira] Commented: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984069#action_12984069 ] 

Sahoo commented on FELIX-2791:
------------------------------

The exception is about startLevel being null. Where is configAdmin coming into picture?



[jira] Updated: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo updated FELIX-2791:
-------------------------

    Attachment: patch.txt

PFA a patch that solves the issue. It introduces a barrier and a flag to make sure watcher threads don't run until fileinstall has finished initialisation. Unlike the previous patch, this one resets the boolean initialised field in stop(). Please review if you want to, as I plan to commit and make a new release of fileinstall ASAP. Thanks. 
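
As a rough idea of what a barrier plus flag can look like, here is a minimal sketch. It is not the contents of patch.txt: the class InitBarrierSketch, the method awaitInitialized and the timeout handling are all assumptions made for illustration.

```java
// Hypothetical sketch of a barrier and an initialised flag; not the attached patch.
class InitBarrierSketch {
    private final Object barrier = new Object();
    private boolean initialized;   // guarded by barrier

    void start() {
        // ... trackers and other shared state would be created here ...
        synchronized (barrier) {
            initialized = true;
            barrier.notifyAll();   // release any watcher blocked in awaitInitialized()
        }
    }

    void stop() {
        synchronized (barrier) {
            initialized = false;   // reset so a later start() has to open the barrier again
        }
    }

    // Called by a watcher thread before it touches shared state.
    // Returns false if the timeout elapsed or the thread was interrupted.
    boolean awaitInitialized(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (barrier) {
            while (!initialized) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    barrier.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return false;
                }
            }
            return true;
        }
    }
}
```

A watcher's run() method would call awaitInitialized(...) first and only enter its main loop once it returns true. Because the flag is read and written under the same lock, the watcher also gets the visibility guarantee that the volatile proposal was after.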



[jira] Resolved: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo resolved FELIX-2791.
--------------------------

    Resolution: Fixed

Fixed in 1061240



[jira] Issue Comment Edited: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984069#action_12984069 ] 

Sahoo edited comment on FELIX-2791 at 1/20/11 6:46 AM:
-------------------------------------------------------

I now see why the watcher thread can be started because of a configuration event. In any case, introducing a barrier is a better way to ensure that watcher threads are not active before fileinstall is initialized. So, I am committing the patch.

Sending        fileinstall/src/main/java/org/apache/felix/fileinstall/internal/DirectoryWatcher.java
Sending        fileinstall/src/main/java/org/apache/felix/fileinstall/internal/FileInstall.java
Transmitting file data ..
Committed revision 1061240.


      was (Author: sahoo):
    The exception is about startLevel being null. Where is configAdmin coming into picture?
  


[jira] Updated: (FELIX-2791) NPE in getStartLevel due to race condition

Posted by "Sahoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FELIX-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahoo updated FELIX-2791:
-------------------------

    Assignee: Sahoo
     Summary: NPE in getStartLevel due to race condition  (was: NPE in getStartLevel)

Assigning to myself
