You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@karaf.apache.org by Fabian Lange <fa...@codecentric.de> on 2015/08/11 13:07:07 UTC

2 Potential Bugs - Not figuring out how to reproduce

Hi guys,
I was torn between posting to users or dev, but I am pretty sure both my
observations are caused by some kind of race condition or bug, rather than
my misuse (still a possibility :-))

Problem 1)
data/log/karaf.log contains every log message twice

when i watch the log file on disk, all lines are duplicated, when I log
into karaf via client.sh and I do a log:display, I see the exact same
lines, but only once.
I have not yet been able to reproduce the problem willingly. however it
happens relatively frequently on my installations(about 30% i would say).

My closest guess is that somehow the logging subsystem reloads and leaves
appenders alive. Could this be related to using "private static final
LOGGER" ?

Problem 2)
sometimes karaf does not start at all, but hangs after some seemingly
duplicate  locking messages

This seems like a race condition to me:
Aug 11, 2015 9:47:59 AM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles
Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main launch
INFO: All initial bundles installed and set to start
Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /opt/karaf/lock
Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main$KarafLockCallback
lockAquired
INFO: Lock acquired. Setting startlevel to 100
Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /opt/karaf/lock
Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Aug 11, 2015 9:48:24 AM org.apache.karaf.main.Main$KarafLockCallback
lockAquired
INFO: Lock acquired. Setting startlevel to 100


I can open Jira issues if you think i found something here, but
unfortunately I have no idea how to troubleshoot it further, if you have
thoughts, please let me know. I am willing to experiment.

Fabian

Re: 2 Potential Bugs - Not figuring out how to reproduce

Posted by Fabian Lange <fa...@codecentric.de>.
Hi,
okay, Problem 1 is really embarrassing, but hey, better put it out there...
It was not 30% but 100%. And the reason for that is additional log4j config
which had "additivity" not set to false.

Great...

Fabian

On Tue, Aug 11, 2015 at 1:07 PM, Fabian Lange <fa...@codecentric.de>
wrote:

> Hi guys,
> I was torn between posting to users or dev, but I am pretty sure both my
> observations are caused by some kind of race condition or bug, rather than
> my misuse (still a possibility :-))
>
> Problem 1)
> data/log/karaf.log contains every log message twice
>
> when i watch the log file on disk, all lines are duplicated, when I log
> into karaf via client.sh and I do a log:display, I see the exact same
> lines, but only once.
> I have not yet been able to reproduce the problem willingly. however it
> happens relatively frequently on my installations(about 30% i would say).
>
> My closest guess is that somehow the logging subsystem reloads and leaves
> appenders alive. Could this be related to using "private static final
> LOGGER" ?
>
> Problem 2)
> sometimes karaf does not start at all, but hangs after some seemingly
> duplicate  locking messages
>
> This seems like a race condition to me:
> Aug 11, 2015 9:47:59 AM org.apache.karaf.main.Main launch
> INFO: Installing and starting initial bundles
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main launch
> INFO: All initial bundles installed and set to start
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /opt/karaf/lock
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main$KarafLockCallback
> lockAquired
> INFO: Lock acquired. Setting startlevel to 100
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /opt/karaf/lock
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.Main$KarafLockCallback
> lockAquired
> INFO: Lock acquired. Setting startlevel to 100
>
>
> I can open Jira issues if you think i found something here, but
> unfortunately I have no idea how to troubleshoot it further, if you have
> thoughts, please let me know. I am willing to experiment.
>
> Fabian
>

Re: 2 Potential Bugs - Not figuring out how to reproduce

Posted by Fabian Lange <fa...@codecentric.de>.
Hi Christian,
yes thats the purpose. However, why does it output the same stuff twice
after a couple of seconds?
and why doesnt the process terminate with "already in use" ? That process
hung there.

Additionally: this happens on a fresh amazon machine, where I just unzipped
karaf.
It also was happening after i deleted ata dir, instance dir and lock file
and ensured the process was dead.

What information could help debugging this? (Karaf 4.0.0 btw)

Fabian

On Tue, Aug 11, 2015 at 2:04 PM, Christian Schneider <
chris@die-schneider.net> wrote:

> Hi Fabian,
>
> problem 2 typically happens if there is still some other karaf process
> running. In this case the new karaf will detect the file lock and not fully
> come up until the other process is finished. This is part of the karaf
> clustering and normal behaviour.
> See https://karaf.apache.org/manual/latest/users-guide/failover.html
>
> Christian
>
>
> On 11.08.2015 13:07, Fabian Lange wrote:
>
>> Hi guys,
>> I was torn between posting to users or dev, but I am pretty sure both my
>> observations are caused by some kind of race condition or bug, rather than
>> my misuse (still a possibility :-))
>>
>> Problem 1)
>> data/log/karaf.log contains every log message twice
>>
>> when i watch the log file on disk, all lines are duplicated, when I log
>> into karaf via client.sh and I do a log:display, I see the exact same
>> lines, but only once.
>> I have not yet been able to reproduce the problem willingly. however it
>> happens relatively frequently on my installations(about 30% i would say).
>>
>> My closest guess is that somehow the logging subsystem reloads and leaves
>> appenders alive. Could this be related to using "private static final
>> LOGGER" ?
>>
>> Problem 2)
>> sometimes karaf does not start at all, but hangs after some seemingly
>> duplicate  locking messages
>>
>> This seems like a race condition to me:
>> Aug 11, 2015 9:47:59 AM org.apache.karaf.main.Main launch
>> INFO: Installing and starting initial bundles
>> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main launch
>> INFO: All initial bundles installed and set to start
>> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Trying to lock /opt/karaf/lock
>> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Lock acquired
>> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main$KarafLockCallback
>> lockAquired
>> INFO: Lock acquired. Setting startlevel to 100
>> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Trying to lock /opt/karaf/lock
>> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Lock acquired
>> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.Main$KarafLockCallback
>> lockAquired
>> INFO: Lock acquired. Setting startlevel to 100
>>
>>
>> I can open Jira issues if you think i found something here, but
>> unfortunately I have no idea how to troubleshoot it further, if you have
>> thoughts, please let me know. I am willing to experiment.
>>
>> Fabian
>>
>>
>
> --
> Christian Schneider
> http://www.liquid-reality.de
>
> Open Source Architect
> http://www.talend.com
>
>

Re: 2 Potential Bugs - Not figuring out how to reproduce

Posted by Christian Schneider <ch...@die-schneider.net>.
Hi Fabian,

problem 2 typically happens if there is still some other karaf process 
running. In this case the new karaf will detect the file lock and not 
fully come up until the other process is finished. This is part of the 
karaf clustering and normal behaviour.
See https://karaf.apache.org/manual/latest/users-guide/failover.html

Christian

On 11.08.2015 13:07, Fabian Lange wrote:
> Hi guys,
> I was torn between posting to users or dev, but I am pretty sure both my
> observations are caused by some kind of race condition or bug, rather than
> my misuse (still a possibility :-))
>
> Problem 1)
> data/log/karaf.log contains every log message twice
>
> when i watch the log file on disk, all lines are duplicated, when I log
> into karaf via client.sh and I do a log:display, I see the exact same
> lines, but only once.
> I have not yet been able to reproduce the problem willingly. however it
> happens relatively frequently on my installations(about 30% i would say).
>
> My closest guess is that somehow the logging subsystem reloads and leaves
> appenders alive. Could this be related to using "private static final
> LOGGER" ?
>
> Problem 2)
> sometimes karaf does not start at all, but hangs after some seemingly
> duplicate  locking messages
>
> This seems like a race condition to me:
> Aug 11, 2015 9:47:59 AM org.apache.karaf.main.Main launch
> INFO: Installing and starting initial bundles
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main launch
> INFO: All initial bundles installed and set to start
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /opt/karaf/lock
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Aug 11, 2015 9:48:00 AM org.apache.karaf.main.Main$KarafLockCallback
> lockAquired
> INFO: Lock acquired. Setting startlevel to 100
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /opt/karaf/lock
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Aug 11, 2015 9:48:24 AM org.apache.karaf.main.Main$KarafLockCallback
> lockAquired
> INFO: Lock acquired. Setting startlevel to 100
>
>
> I can open Jira issues if you think i found something here, but
> unfortunately I have no idea how to troubleshoot it further, if you have
> thoughts, please let me know. I am willing to experiment.
>
> Fabian
>


-- 
Christian Schneider
http://www.liquid-reality.de

Open Source Architect
http://www.talend.com