Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2010/04/27 17:50:24 UTC

LuceneJUnitResultFormatter sometimes fails to lock

Hi

I ran "ant test-core" today and hit this:

[junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
[junit] at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
[junit] at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
[junit] at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)

All the tests still pass, but Ant reports a failure at the end. This
happens rarely, but I've run into it several times already. Anyone got an
idea?

Shai

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
Right, but we work around other issues that look like bugs. If our
NativeFSLock fails because a file on Windows is briefly unavailable, I
think that's a bug. What's wrong with pausing and waiting a short bit for
a retry when Windows is detected and this issue is triggered?
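
A minimal sketch of the kind of pause-and-retry described above, assuming the failure is a transient inability to create the lock file. The class and method names, the attempt count and the pause length are illustrative and are not taken from NativeFSLockFactory:

```java
import java.io.File;
import java.io.IOException;

public class RetryCreate {
    // Hypothetical platform check based on the standard "os.name" property.
    static final boolean WINDOWS =
            System.getProperty("os.name").startsWith("Windows");

    /**
     * Tries to make sure the file exists. On Windows it retries up to
     * windowsAttempts times with a short pause; elsewhere it tries once.
     */
    static boolean createWithRetry(File f, int windowsAttempts, long pauseMillis) {
        int attempts = WINDOWS ? Math.max(1, windowsAttempts) : 1;
        for (int i = 1; i <= attempts; i++) {
            try {
                // createNewFile() returns false if the file already exists,
                // which is fine here: the file only has to be present.
                if (f.createNewFile() || f.exists()) {
                    return true;
                }
            } catch (IOException e) {
                // Possibly transient (e.g. a scanner briefly holding the directory).
            }
            if (i < attempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        File f = new File(System.getProperty("java.io.tmpdir"),
                "retry-sketch-" + System.nanoTime() + ".lock");
        System.out.println("created: " + createWithRetry(f, 3, 50L));
        f.delete();
    }
}
```

On other platforms the sketch makes a single attempt, so behavior there is unchanged.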

On 4/27/10 12:50 PM, Uwe Schindler wrote:
> No, the error is correct; the problem is a different one (outside of NativeFSLock):
> On Windows (Vista or 7), some services such as virus scanners (Microsoft Security Essentials, Avira AntiVir and others) and the Windows Search service sometimes lock the whole directory, or files in it, for a very short time. After disabling virus scanning and, most importantly, Windows Search for the Lucene development folders, the problems are gone. This behavior is also the source of some chkdsk runs after restart when using svn in parallel with Windows Search (this is a known bug in ntfs.sys of Windows 7, event id 55 in the syslogs).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Tuesday, April 27, 2010 6:31 PM
>> To: dev@lucene.apache.org
>> Subject: Re: LuceneJUnitResultFormatter sometimes fails to lock
>>
>> Ah - didn't look closely. This is while making the lock, not trying to
>> acquire it for stdout locking. So that seems like a bug in our native
>> lock impl we should try and fix.
>>
>> On 4/27/10 12:27 PM, Uwe Schindler wrote:
>>> When acquiring a test lock, it does not wait. It just is not able to
>>> produce the file there. This happens sometimes on Windows and has
>>> nothing to do with the tests; it is a problem of NativeLockF.
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Mark Miller [mailto:markrmiller@gmail.com]
>>>> Sent: Tuesday, April 27, 2010 6:20 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: LuceneJUnitResultFormatter sometimes fails to lock
>>>>
>>>> We might need a higher timeout. It's like 5 seconds now. Otherwise we
>>>> should try and isolate the problem.
>>>>
>>>> - Mark
>>>>
>>>> On 4/27/10 11:52 AM, Uwe Schindler wrote:
>>>>> Windows?


-- 
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
> After disabling virus scanning and most importantly windows search for
> the lucene development folders, the problems are gone. This behavior is
> also the source of some chkdsks running after restart when using svn in
> parallel to windows search (this is a known bug in ntfs.sys of windows
> 7, event id 55 in syslogs).

See also: http://social.technet.microsoft.com/Forums/en/w7itprogeneral/thread/df935a52-a0a9-4f67-ac82-bc39e0585148



RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
No, the error is correct; the problem is a different one (outside of NativeFSLock):
On Windows (Vista or 7), some services such as virus scanners (Microsoft Security Essentials, Avira AntiVir and others) and the Windows Search service sometimes lock the whole directory, or files in it, for a very short time. After disabling virus scanning and, most importantly, Windows Search for the Lucene development folders, the problems are gone. This behavior is also the source of some chkdsk runs after restart when using svn in parallel with Windows Search (this is a known bug in ntfs.sys of Windows 7, event id 55 in the syslogs).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
I think that's a great idea, Uwe! It will reduce the chance of a collision
even further.

After we discussed this on IRC, Uwe raised an important point about why
the lock file must be deleted. It is possible that two JVMs will attempt to
lock the same Directory, one w/ Native and the other w/ Simple. If we don't
check in obtain() whether the file exists, one JVM might obtain a native lock
while the Directory is actually locked by another JVM using Simple. Uwe also
mentioned Native was fixed in 2.9? Is that right Uwe - did I get that part
correct?

So, I believe the changes that should be made are:
1) Use System.nanoTime() as a seed to Random.
2) Use ManagementFactory.getRuntimeMXBean().getName() as part of the test
lock name.
3) In release(), if delete() fails, check whether the file still exists. If it
does, attempt another delete() a few ms later.
4) If (3) still fails, I think we should throw an exception, or attempt a
deleteOnExit.
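
Items (3) and (4) could look roughly like the following. This is only a sketch under assumptions: the class name, method name and pause length are made up for illustration, and it takes the deleteOnExit route rather than throwing.

```java
import java.io.File;

public class LockFileCleanup {
    /**
     * Best-effort removal of a lock file.
     * Returns true if the file is gone when this method returns,
     * false if removal had to be deferred to JVM exit.
     */
    static boolean deleteWithRetry(File lockFile, long pauseMillis) {
        // First attempt; a failed delete() is harmless if the file is already gone.
        if (lockFile.delete() || !lockFile.exists()) {
            return true;
        }
        // The file is still there: wait a little and try once more.
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (lockFile.delete() || !lockFile.exists()) {
            return true;
        }
        // Still present: ask the JVM to remove it on normal exit.
        lockFile.deleteOnExit();
        return false;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("cleanup-sketch", ".lock");
        System.out.println("removed: " + deleteWithRetry(f, 20L));
    }
}
```

Checking exists() after a failed delete() matters because delete() also returns false when the file was already removed by someone else.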

What do you think?

Shai

On Thu, Apr 29, 2010 at 12:17 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> A possibility to get something that does not depend on date/time and so on:
> ManagementFactory.getRuntimeMXBean().getName()
>
> This returns a unique identifier of the running Java VM. On most platforms,
> this contains the process ID. Maybe we should use this as additional
> information for the test file name. As it's a String and may contain
> characters that are invalid in a filename, maybe use the hashCode() of the
> returned String.
>

RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
A possibility to get something that does not depend on date/time and so on:
ManagementFactory.getRuntimeMXBean().getName()

This returns a unique identifier of the running Java VM. On most platforms, this contains the process ID. Maybe we should use this as additional information for the test file name. As it's a String and may contain characters that are invalid in a filename, maybe use the hashCode() of the returned String.
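
A small sketch of how such a name could be built, combining the JVM name's hashCode() with a Random seeded from System.nanoTime(). The "lucene-...-test.lock" pattern and the class and method names are assumptions made for illustration only:

```java
import java.lang.management.ManagementFactory;
import java.util.Random;

public class TestLockName {
    /** Builds a lock file name that differs per JVM and per call (with high probability). */
    static String testLockName() {
        // Commonly of the form "pid@hostname", but the format is not guaranteed.
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        // hashCode() sidesteps characters that may be invalid in a filename.
        String jvmPart = Integer.toHexString(jvmName.hashCode());
        // Explicit seed instead of relying on how new Random() seeds itself.
        int r = new Random(System.nanoTime()).nextInt(Integer.MAX_VALUE);
        String randomPart = Long.toString(r, Character.MAX_RADIX);
        return "lucene-" + jvmPart + "-" + randomPart + "-test.lock";
    }

    public static void main(String[] args) {
        System.out.println(testLockName());
    }
}
```

Two JVMs started at the same moment would still differ in the JVM part wherever the name includes the process ID.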

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Apr 28, 2010 at 1:52 PM, Shai Erera <se...@gmail.com> wrote:
> I use 1.6.0_18.
>
> Maybe seeding w/ System.nanoTime() will help, but don't you think
> NativeFSLock should be more robust anyway? Even for the regular lock
> file this can happen if at the same time delete() is attempted the
> file is held by another process ...

Oh definitely -- I think we should do both (fix the random seeding &
the robustness fixes).

Mike



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
I use 1.6.0_18.

Maybe seeding w/ System.nanoTime() will help, but don't you think
NativeFSLock should be more robust anyway? Even for the regular lock
file this can happen if at the same time delete() is attempted the
file is held by another process ...

Shai



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK...

But this must mean that "new Random()" selects a bad (easily
conflicting) seed?  Which JRE are you using?

Maybe it's just using System.currentTimeMillis()?  This is what the
javadocs state for JDK 1.4, but for JDK 1.5 it says it tries to pick
something unique :)

Maybe we should seed w/ System.nanoTime(), when we create the test lock name?
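
To illustrate why the seed matters: two Random instances created with the same seed produce the same sequence, so two JVMs that happened to seed from the same clock value would generate the same lock name. A small sketch (the class and helper names are made up for illustration):

```java
import java.util.Random;

public class SeedCollision {
    /** Derives a name suffix from the first int of a Random with the given seed. */
    static String lockSuffix(long seed) {
        return Long.toString(new Random(seed).nextInt(), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        long sameMillisecond = System.currentTimeMillis();
        // Same seed in both "JVMs": the suffixes are identical.
        System.out.println(lockSuffix(sameMillisecond).equals(lockSuffix(sameMillisecond)));
        // nanoTime() has finer resolution, so equal seeds are much less likely.
        System.out.println(lockSuffix(System.nanoTime()));
    }
}
```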

Mike



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
Mike - I'm pretty sure that's what happened. The reason is
that even w/ the reported failure, the lock dir is empty when the
tests finish and the lock file isn't there. I believe that if a
collision were not the cause, I would have seen the test lock file
in there?

So overall these changes will delete the lock file 99.9% of the time.
It's in those (I agree) super rare cases that the rest of the code
will be invoked.

In addition, I don't have any other explanation for why this sometimes
happens; it all started after the tests were parallelized. And since then
it has happened too many times ...

Shai

On Wednesday, April 28, 2010, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Nice!
>
> Mike
>
> On Wed, Apr 28, 2010 at 12:54 PM, Robert Muir <rc...@gmail.com> wrote:
>> As far as the build system goes, I implemented the two ideas mentioned
>> earlier in this message (not creating a new Formatter for each test, and not
>> spawning 26 jvms for each batch)
>>
>> Jira is down, but if you want to help test you can try a patch here:
>> http://pastebin.com/iqwb73H2 (click Raw/Download)
>>
>> Additionally this cuts 1:20 off the total Solrcene 'ant clean test' for me.
>>
>> before:
>> BUILD SUCCESSFUL
>> Total time: 7 minutes 42 seconds
>>
>> after:
>> BUILD SUCCESSFUL
>> Total time: 6 minutes 23 seconds
>>
>> On Wed, Apr 28, 2010 at 12:25 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>>
>>> I think these are good changes to NativeFSLockFactory.
>>>
>>> But: the chances that N JVMs launched at once would conflict on the
>>> randomly generated lock file name should be minuscule... though it
>>> does depend on how good new Random() is at seeding itself.  Do we
>>> really think this explains your exceptions, Shai?  (And, if so, even w/
>>> these changes, the conflict could still happen?)  Maybe we should
>>> explicitly seed it?
>>>
>>> Mike
>>>
>>> On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <se...@gmail.com> wrote:
>>> > I'd like to summarize the IRC discussion Mark and I had:
>>> >
>>> > The lock file's existence in the directory should not prevent obtain()
>>> > from obtaining a lock. That's the whole difference between Simple and
>>> > Native. So we should make a best effort to delete it. If the delete
>>> > fails on release(), then ok. On obtain(), we won't return false if the
>>> > lock file exists, but attempt to really obtain it and fail appropriately.
>>> >
>>> > While the previously proposed fix (add "&& path.exists()" to release())
>>> > might work most of the time, it will only work "most of the time".
>>> > I.e., between release() and delete(), an external process, like an
>>> > antivirus scanner, might lock the file, and the delete will fail, but the
>>> > file will still be there, and we'll still throw an exception.
>>> >
>>> > So, the proposed changes are:
>>> > * release() is allowed to fail to delete the lock file.
>>> > * obtain() should not return false if the lock file exists - it should
>>> > really attempt to obtain it.
>>> > * in acquireTestLock(), if the lock file still exists after release() is
>>> > called, we'll retry the delete a few ms later, and if that fails, call
>>> > deleteOnExit.
>>> >
>>> > How's that sound?
>>> >
>>> > Shai
>>> >
>>> > On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <ma...@gmail.com>
>>> > wrote:
>>> >>
>>> >> I don't follow. The simple lock impl must delete the file, but the
>>> >> native
>>> >> impl should not have to. The file has nothing to do with the lock - its
>>> >> just
>>> >> the medium to ask for and release the lock. If it already exists, you
>>> >> don't
>>> >> have to create it - you can just use it to try and get a native lock.
>>> >> Likewise, it doesn't need to be removed to release a native lock - you
>>> >> simply call unlock on it.
>>> >>
>>> >> On 4/28/10 10:34 AM, Shai Erera wrote:
>>> >>>
>>> >>> But this method is called also for the regular lock file - if
>>> >>> release()
>>> >>> won't delete the file, then the next l.obtain() will return false.
>>> >>>
>>> >>> Shai
>>> >>>
>>> >>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <markrmiller@gmail.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Michael McCandless <lu...@mikemccandless.com>.
Nice!

Mike

On Wed, Apr 28, 2010 at 12:54 PM, Robert Muir <rc...@gmail.com> wrote:
> As far as the build system goes, I implemented the two ideas mentioned
> earlier in this message (not creating a new Formatter for each test, and not
> spawning 26 jvms for each batch)
>
> Jira is down, but if you want to help test you can try a patch here:
> http://pastebin.com/iqwb73H2 (click Raw/Download)
>
> Additionally this cuts 1:20 off the total Solrcene 'ant clean test' for me.
>
> before:
> BUILD SUCCESSFUL
> Total time: 7 minutes 42 seconds
>
> after:
> BUILD SUCCESSFUL
> Total time: 6 minutes 23 seconds



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Robert Muir <rc...@gmail.com>.
As far as the build system goes, I implemented the two ideas mentioned
earlier in this thread (not creating a new Formatter for each test, and not
spawning 26 JVMs for each batch).

Jira is down, but if you want to help test you can try a patch here:
http://pastebin.com/iqwb73H2 (click Raw/Download)

Additionally, this cuts about 1:20 off the total Solrcene 'ant clean test' time for me.

before:
BUILD SUCCESSFUL
Total time: 7 minutes 42 seconds

after:
BUILD SUCCESSFUL
Total time: 6 minutes 23 seconds

On Wed, Apr 28, 2010 at 12:25 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> I think these are good changes to NativeFSLockFactory.
>
> But: the chances that N JVMs launched at once would conflict on the
> randomly generated lock file name should be minuscule... though it
> does depend on how good new Random() is at seeding itself.  Do we
> really think this explains your exceptions, Shai?  (And, if so, even w/
> these changes, the conflict could still happen?)  Maybe we should
> explicitly seed it?
>
> Mike
>
> On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <se...@gmail.com> wrote:
> > I'd like to summarize the IRC discussion Mark and I had:
> >
> > The lock file's existence in the directory should not prevent obtain() from
> > obtaining a lock. That's the whole difference between Simple and
> > Native. So we should make a best-effort attempt to delete it. If the delete
> > fails in release(), that's OK. In obtain(), we won't return false if the lock
> > file exists, but will really attempt to obtain it and fail appropriately.
> >
> > While the previously proposed fix (add "&& path.exists()" to release())
> > might work most of the time, it will only work "most of the time". I.e.,
> > between release() and delete(), an external process, like an antivirus, might
> > lock the file, and the delete will fail, but the file will still be there, and
> > we'll still throw an exception.
> >
> > So, the proposed changes are:
> > * release() is allowed to fail to delete the lock file.
> > * obtain() should not return false if the lock file exists - it should
> > really attempt to obtain it.
> > * in acquireTestLock(), if the lock file still exists after release() is
> > called, we'll retry the delete a few ms later, and if that fails, call
> > deleteOnExit.
> >
> > How's that sound?
> >
> > Shai
> >
> > On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <ma...@gmail.com> wrote:
> >>
> >> I don't follow. The simple lock impl must delete the file, but the native
> >> impl should not have to. The file has nothing to do with the lock - it's just
> >> the medium to ask for and release the lock. If it already exists, you don't
> >> have to create it - you can just use it to try and get a native lock.
> >> Likewise, it doesn't need to be removed to release a native lock - you
> >> simply call unlock on it.
> >>
> >> On 4/28/10 10:34 AM, Shai Erera wrote:
> >>>
> >>> But this method is called also for the regular lock file - if release()
> >>> won't delete the file, then the next l.obtain() will return false.
> >>>
> >>> Shai
> >>>
> >>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>
> >>>    It shouldn't need to though - the native lock file is simply a
> >>>    dummy file to apply the lock to - shouldn't matter if it already
> >>>    exists or not (though it seems to in the current code).
> >>>
> >>>
> >>>    On 4/28/10 10:22 AM, Shai Erera wrote:
> >>>
> >>>        If you won't delete the file, the next obtain will fail?
> >>>
> >>>        On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>
> >>>            I wonder if not being able to delete the file should throw a release
> >>>            failed exception at all. You have actually released the native lock
> >>>            - you were just not able to clean up - but that seems more like a
> >>>            warning situation than a failure.
> >>>
> >>>
> >>>            --
> >>>            - Mark
> >>>
> >>>        http://www.lucidimagination.com
> >>>
> >>>            On 4/28/10 9:53 AM, Shai Erera wrote:
> >>>
> >>>                I've hit it again and here's the full stacktrace (at least
> >>>                what's printed):
> >>>
> >>>                     [junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
> >>>                     [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
> >>>                     [junit]     at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
> >>>                     [junit]     at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
> >>>                     [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
> >>>                     [junit]     at java.lang.Class.newInstance(Class.java:1325)
> >>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
> >>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
> >>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
> >>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
> >>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
> >>>                     [junit] Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
> >>>                     [junit]     at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
> >>>                     [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
> >>>                     [junit]     ... 9 more
> >>>
> >>>                The exception is thrown from NativeFSLock.release() b/c it fails to
> >>>                delete the lock file. I think I know what the problem is - and it must
> >>>                be related to the large number of JVMs that are created w/ the parallel
> >>>                tests:
> >>>                * Suppose that JVM1 draws the number '1' for the test lock file - it
> >>>                thus creates lock1.
> >>>                * Now suppose that JVM2 draws the same number, magically somehow - it
> >>>                thus creates lock1 as well.
> >>>                * The code of acquireTestLock in NativeFSLockFactory looks like this:
> >>>                     Lock l = makeLock(randomLockName);
> >>>                     try {
> >>>                       l.obtain();
> >>>                       l.release();
> >>>                --> both will create the same test Lock file. Then l.obtain() probably
> >>>                returns false for one of them, but it's not checked.
> >>>                * Then in release there are a couple of things to note:
> >>>                1) the method is synced on the instance, which does not affect
> >>>                the two JVMs.
> >>>                2) suppose that both JVMs pass through the if (exists()) check. Then
> >>>                JVM1 releases the lock, and deletes the file.
> >>>                3) Now JVM2 kicks in, calls lock.release() which has no effect (from the
> >>>                jdoc: "If this lock object is invalid then invoking this method has no
> >>>                effect." ). Then when it comes to path.delete(), the file isn't there,
> >>>                the method returns false and thus an exception is thrown ...
> >>>
> >>>                This situation is extremely unlikely to happen, but still, it happens on
> >>>                my machine quite frequently since the parallel tests. I'm thinking that
> >>>                acquireTestLock should be less strict, but perhaps we can fix it if we
> >>>                replace the line:
> >>>                      if (!path.delete()) (line 310)
> >>>                with this
> >>>                      if (!path.delete() && path.exists())
> >>>
> >>>                I.e., if the lock file fails to delete but is still there, throw the
> >>>                exception ...
> >>>
> >>>                What do you think?
> >>>
> >>>                Shai
> >>>
> >>>                On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >>>
> >>>                    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <vajda@osafoundation.org> wrote:
> >>>
> >>>                        I've had similar random failures on Mac OS X 10.6. They started
> >>>                        happening recently, about two weeks ago.
> >>>
> >>>                    That's just too randomly close to when I last worked on this build
> >>>                    system stuff for LUCENE-1709... perhaps I made it worse instead of
> >>>                    better.
> >>>
> >>>                    --
> >>>                    Robert Muir


-- 
Robert Muir
rcmuir@gmail.com

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think these are good changes to NativeFSLockFactory.

But: the chances that N JVMs launched at once would conflict on the
randomly generated lock file name should be minuscule... though it
does depend on how good new Random() is at seeding itself.  Do we
really think this explains your exceptions, Shai?  (And, if so, even w/
these changes, the conflict could still happen?)  Maybe we should
explicitly seed it?
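
As a rough illustration of what explicit seeding could look like (a sketch under assumptions, not the NativeFSLockFactory code): mix a per-JVM identifier into the seed so that JVMs started in the same instant do not draw the same lock name. The class TestLockNames and its methods are made up for this example. The name shape "lucene-<base36>-test.lock" follows the file name seen in the stack trace in this thread.

```java
import java.lang.management.ManagementFactory;
import java.util.Random;

// Hypothetical sketch; names and seed recipe are assumptions for illustration.
class TestLockNames {

    /** Mixes a per-JVM identifier with the clock so concurrently started JVMs get different seeds. */
    static long explicitSeed() {
        // getName() is typically "pid@hostname", but the format is JVM-specific.
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        return System.nanoTime() ^ ((long) jvmName.hashCode() << 32) ^ System.currentTimeMillis();
    }

    /** Builds a name shaped like "lucene-<base36>-test.lock" from the given seed. */
    static String randomLockName(long seed) {
        long n = new Random(seed).nextLong();
        // Mask the sign bit (Math.abs would stay negative for Long.MIN_VALUE).
        return "lucene-" + Long.toString(n & Long.MAX_VALUE, Character.MAX_RADIX) + "-test.lock";
    }

    public static void main(String[] args) {
        System.out.println(randomLockName(explicitSeed()));
    }
}
```

Drawing from 63 bits instead of a small range also shrinks the chance of a name collision, independently of how the seed is chosen.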

Mike

On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <se...@gmail.com> wrote:
> I'd like to summarize the IRC discussion Mark and I had:
>
> The lock file's existence in the directory should not prevent obtain() from
> obtaining a lock. That's the whole difference between Simple and
> Native. So we should make a best-effort attempt to delete it. If the delete
> fails in release(), that's OK. In obtain(), we won't return false if the lock
> file exists, but will really attempt to obtain it and fail appropriately.
>
> While the previously proposed fix (add "&& path.exists()" to release())
> might work most of the time, it will only work "most of the time". I.e.,
> between release() and delete(), an external process, like an antivirus, might
> lock the file, and the delete will fail, but the file will still be there, and
> we'll still throw an exception.
>
> So, the proposed changes are:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should
> really attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is
> called, we'll retry the delete a few ms later, and if that fails, call
> deleteOnExit.
>
> How's that sound?
>
> Shai
>
> On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <ma...@gmail.com> wrote:
>>
>> I don't follow. The simple lock impl must delete the file, but the native
>> impl should not have to. The file has nothing to do with the lock - it's just
>> the medium to ask for and release the lock. If it already exists, you don't
>> have to create it - you can just use it to try and get a native lock.
>> Likewise, it doesn't need to be removed to release a native lock - you
>> simply call unlock on it.
>>
>> On 4/28/10 10:34 AM, Shai Erera wrote:
>>>
>>> But this method is called also for the regular lock file - if release()
>>> won't delete the file, then the next l.obtain() will return false.
>>>
>>> Shai
>>>
>>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>
>>>    It shouldn't need to though - the native lock file is simply a
>>>    dummy file to apply the lock to - shouldn't matter if it already
>>>    exists or not (though it seems to in the current code).
>>>
>>>
>>>    On 4/28/10 10:22 AM, Shai Erera wrote:
>>>
>>>        If you won't delete the file, the next obtain will fail?
>>>
>>>        On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>
>>>            I wonder if not being able to delete the file should throw a release
>>>            failed exception at all. You have actually released the native lock
>>>            - you were just not able to clean up - but that seems more like a
>>>            warning situation than a failure.
>>>
>>>
>>>            --
>>>            - Mark
>>>
>>>        http://www.lucidimagination.com
>>>
>>>            On 4/28/10 9:53 AM, Shai Erera wrote:
>>>
>>>                I've hit it again and here's the full stacktrace (at least
>>>                what's printed):
>>>
>>>                     [junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
>>>                     [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>>                     [junit]     at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>>                     [junit]     at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>>                     [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>>                     [junit]     at java.lang.Class.newInstance(Class.java:1325)
>>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>>                     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>>                     [junit] Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>>                     [junit]     at
>>>
>>>
>>>  org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>>                     [junit]     at
>>>
>>>
>>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>>                     [junit]     ... 9 more
>>>
>>>                The exception is thrown from NativeFSLock.release() b/c
>>>        it fails to
>>>                delete the lock file. I think I know what the problem is
>>>        - and
>>>                it must
>>>                be related to the large number of JVMs that are created
>>>        w/ the
>>>                parallel
>>>                tests:
>>>                * Suppose that JVM1 draws the number '1' for the test
>>>        lock file - it
>>>                thus creates lock1.
>>>                * Now suppose that JVM2 draws the same number, magically
>>>        somehow
>>>                - it
>>>                thus creates lock1 as well.
>>>                * The code of acquireTestLock in NativeFSLockFactory
>>>        looks like
>>>                this:
>>>                     Lock l = makeLock(randomLockName);
>>>                     try {
>>>                       l.obtain();
>>>                       l.release();
>>>                --> both will create the same test Lock file. Then
>>>        l.obtain()
>>>                probably
>>>                returns false for one of them, but it's not checked.
>>>                * Then in release there are a couple of things to note:
>>>                1) the method is synced on the instance, which does not
>>>        affect
>>>                the two JVMs.
>>>                2) suppose that both JVMs pass through the if (exists())
>>>        check. Then
>>>                JVM1 releases the lock, and deletes the file.
>>>                3) Now JVM2 kicks in, calls lock.release() which has no
>>>        effect
>>>                (from the
>>>                jdoc: "If this lock object is invalid then invoking this
>>>        method
>>>                has no
>>>                effect." ). Then when it comes to path.delete(), the
>>>        file isn't
>>>                there,
>>>                the method returns false and thus an exception is thrown
>>> ...
>>>
>>>                This situation is extremely unlikely to happen, but
>>>        still, it
>>>                happens on
>>>                my machine quite frequently since the parallel tests. I'm
>>>                thinking that
>>>                acquireTestLock should be less strict, but perhaps we
>>>        can fix it
>>>                if we
>>>                replace the line:
>>>                      if (!path.delete()) (line 310)
>>>                with this
>>>                      if (!path.delete() && path.exists())
>>>
>>>                I.e., if the lock file fails to delete but is still
>>>        there, throw the
>>>                exception ...
>>>
>>>                What do you think?
>>>
>>>                Shai
>>>
>>>                On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir
>>>        <rcmuir@gmail.com <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>>> wrote:
>>>
>>>
>>>
>>>                    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>>>        <vajda@osafoundation.org <ma...@osafoundation.org>
>>>        <mailto:vajda@osafoundation.org <ma...@osafoundation.org>>
>>>        <mailto:vajda@osafoundation.org <ma...@osafoundation.org>
>>>
>>>        <mailto:vajda@osafoundation.org
>>>        <ma...@osafoundation.org>>>> wrote:
>>>
>>>
>>>                        I've had similar random failures on Mac OS X
>>>        10.6. They
>>>                started
>>>                        happening recently, about two weeks ago.
>>>
>>>
>>>                    Thats just too randomly close to when i last worked
>>>        on this
>>>                build
>>>                    system stuff for LUCENE-1709... perhaps I made it
>>> worse
>>>                instead of
>>>                    better.
>>>
>>>                    --
>>>                    Robert Muir
>>>        rcmuir@gmail.com <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>    --
>>>    - Mark
>>>
>>>    http://www.lucidimagination.com
>>>
>>>
>>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
I'd like to summarize the IRC discussion Mark and I had:

The lock file's existence in the directory should not prevent obtain() from
obtaining the lock. That's the whole difference between Simple and Native. So
we should make a best effort to delete it. If the delete fails on release(),
that's OK. In obtain(), we won't return false just because the lock file
exists; we'll attempt to really obtain the lock and fail appropriately.

While the previously proposed fix (adding "&& path.exists()" to release())
might work most of the time, it will work only most of the time. I.e.,
between lock.release() and path.delete(), an external process, such as an
antivirus scanner, might lock the file. The delete will then fail while the
file is still there, and we'll still throw an exception.

So, the proposed changes are:
* release() is allowed to fail to delete the lock file.
* obtain() should not return false if the lock file exists - it should
really attempt to obtain it.
* in acquireTestLock(), if the lock file still exists after release() is
called, we'll retry the delete a few ms later, and if that fails, call
deleteOnExit().
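
A rough sketch of the third item, for illustration only. LockFileCleanup,
deleteBestEffort and the retries/pauseMillis parameters are made-up names,
not existing Lucene code; only File.delete(), File.exists() and
File.deleteOnExit() are real JDK calls:

```java
import java.io.File;

/** Hypothetical helper, not part of Lucene. */
final class LockFileCleanup {
  private LockFileCleanup() {}

  /**
   * Best-effort delete: try, pause, retry, and finally fall back to
   * deleteOnExit(). Returns true if the file is gone on return.
   */
  static boolean deleteBestEffort(File path, int retries, long pauseMillis) {
    for (int attempt = 0; attempt <= retries; attempt++) {
      // delete() also returns false when the file is already gone,
      // so a missing file counts as success.
      if (path.delete() || !path.exists()) {
        return true;
      }
      if (attempt < retries) {
        try {
          Thread.sleep(pauseMillis); // give a scanner time to let go
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    // Still there: do not fail, just ask the JVM to clean up at exit.
    path.deleteOnExit();
    return false;
  }
}
```

Something like deleteBestEffort(lockFile, 1, 10) would match "retry a few ms
later"; the exact pause is a guess.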

How's that sound?

Shai

On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <ma...@gmail.com> wrote:

> I don't follow. The simple lock impl must delete the file, but the native
> impl should not have to. The file has nothing to do with the lock - its just
> the medium to ask for and release the lock. If it already exists, you don't
> have to create it - you can just use it to try and get a native lock.
> Likewise, it doesn't need to be removed to release a native lock - you
> simply call unlock on it.
>
>
> On 4/28/10 10:34 AM, Shai Erera wrote:
>
>> But this method is called also for the regular lock file - if release()
>> won't delete the file, then the next l.obtain() will return false.
>>
>> Shai
>>
>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <markrmiller@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>    It shouldn't need too though - the native lock file is simply a
>>    dummy file to apply the lock too - shouldn't matter if it already
>>    exists or not (though it seems to in the current code).
>>
>>
>>    On 4/28/10 10:22 AM, Shai Erera wrote:
>>
>>        If you won't delete the file, the next obtain will fail?
>>
>>        On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller
>>        <markrmiller@gmail.com <ma...@gmail.com>
>>        <mailto:markrmiller@gmail.com <ma...@gmail.com>>>
>>
>>        wrote:
>>
>>            I wonder if not being able to delete the file should throw a
>>        release
>>            failed exception at all. You have actually released the
>>        native lock
>>            - you where just not able to clean up - but that seems more
>>        like a
>>            warning situation than a failure.
>>
>>
>>            --
>>            - Mark
>>
>>        http://www.lucidimagination.com
>>
>>            On 4/28/10 9:53 AM, Shai Erera wrote:
>>
>>                I've hit it again and here's the full stacktrace (at least
>>                what's printed):
>>
>>                     [junit] Exception in thread "main"
>>        java.lang.RuntimeException:
>>                Failed to acquire random test lock; please verify
>>        filesystem for
>>                lock
>>                directory
>>        'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>>                supports
>>                locking
>>                     [junit]     at
>>
>>
>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>                     [junit]     at
>>
>>
>>  org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>                     [junit]     at
>>
>>
>>  org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>                     [junit]     at
>>                java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>                     [junit]     at
>>        java.lang.Class.newInstance(Class.java:1325)
>>                     [junit]     at
>>
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>                     [junit]     at
>>
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>                     [junit]     at
>>
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>                     [junit]     at
>>
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>                     [junit]     at
>>
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>                     [junit] Caused by:
>>                org.apache.lucene.store.LockReleaseFailedException:
>>        failed to delete
>>
>>
>>  C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>                     [junit]     at
>>
>>
>>  org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>                     [junit]     at
>>
>>
>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>                     [junit]     ... 9 more
>>
>>                The exception is thrown from NativeFSLock.release() b/c
>>        it fails to
>>                delete the lock file. I think I know what the problem is
>>        - and
>>                it must
>>                be related to the large number of JVMs that are created
>>        w/ the
>>                parallel
>>                tests:
>>                * Suppose that JVM1 draws the number '1' for the test
>>        lock file - it
>>                thus creates lock1.
>>                * Now suppose that JVM2 draws the same number, magically
>>        somehow
>>                - it
>>                thus creates lock1 as well.
>>                * The code of acquireTestLock in NativeFSLockFactory
>>        looks like
>>                this:
>>                     Lock l = makeLock(randomLockName);
>>                     try {
>>                       l.obtain();
>>                       l.release();
>>                --> both will create the same test Lock file. Then
>>        l.obtain()
>>                probably
>>                returns false for one of them, but it's not checked.
>>                * Then in release there are a couple of things to note:
>>                1) the method is synced on the instance, which does not
>>        affect
>>                the two JVMs.
>>                2) suppose that both JVMs pass through the if (exists())
>>        check. Then
>>                JVM1 releases the lock, and deletes the file.
>>                3) Now JVM2 kicks in, calls lock.release() which has no
>>        effect
>>                (from the
>>                jdoc: "If this lock object is invalid then invoking this
>>        method
>>                has no
>>                effect." ). Then when it comes to path.delete(), the
>>        file isn't
>>                there,
>>                the method returns false and thus an exception is thrown
>> ...
>>
>>                This situation is extremely unlikely to happen, but
>>        still, it
>>                happens on
>>                my machine quite frequently since the parallel tests. I'm
>>                thinking that
>>                acquireTestLock should be less strict, but perhaps we
>>        can fix it
>>                if we
>>                replace the line:
>>                      if (!path.delete()) (line 310)
>>                with this
>>                      if (!path.delete() && path.exists())
>>
>>                I.e., if the lock file fails to delete but is still
>>        there, throw the
>>                exception ...
>>
>>                What do you think?
>>
>>                Shai
>>
>>                On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir
>>        <rcmuir@gmail.com <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>>> wrote:
>>
>>
>>
>>                    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>>        <vajda@osafoundation.org <ma...@osafoundation.org>
>>        <mailto:vajda@osafoundation.org <ma...@osafoundation.org>>
>>        <mailto:vajda@osafoundation.org <ma...@osafoundation.org>
>>
>>        <mailto:vajda@osafoundation.org
>>        <ma...@osafoundation.org>>>> wrote:
>>
>>
>>                        I've had similar random failures on Mac OS X
>>        10.6. They
>>                started
>>                        happening recently, about two weeks ago.
>>
>>
>>                    Thats just too randomly close to when i last worked
>>        on this
>>                build
>>                    system stuff for LUCENE-1709... perhaps I made it worse
>>                instead of
>>                    better.
>>
>>                    --
>>                    Robert Muir
>>        rcmuir@gmail.com <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>    --
>>    - Mark
>>
>>    http://www.lucidimagination.com
>>
>>
>>
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
I don't follow. The simple lock impl must delete the file, but the
native impl should not have to. The file has nothing to do with the lock
- it's just the medium to ask for and release the lock. If it already
exists, you don't have to create it - you can just use it to try to get
a native lock. Likewise, it doesn't need to be removed to release a
native lock - you simply call unlock on it.
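
To illustrate the point, here is a small sketch using plain java.nio rather
than the actual NativeFSLock code. NativeLockSketch and its method names are
made up, and the try-with-resources syntax assumes a newer JDK than we
currently build with. The same file is locked and unlocked twice and never
deleted:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/** Hypothetical demo class, not Lucene code. */
final class NativeLockSketch {
  private NativeLockSketch() {}

  /** Takes and releases a native lock on the file; never deletes it. */
  static boolean lockOnce(File lockFile) throws IOException {
    // "rw" creates the file if it is missing and just opens it otherwise,
    // so a leftover lock file is not a problem.
    try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
         FileChannel channel = raf.getChannel()) {
      FileLock lock = channel.tryLock(); // null if another process holds it
      if (lock == null) {
        return false;
      }
      lock.release(); // releasing the lock does not touch the file itself
      return true;
    }
  }

  /** Locks the same, already existing file twice in a row. */
  static boolean lockTwice(File lockFile) throws IOException {
    boolean first = lockOnce(lockFile);
    boolean second = lockOnce(lockFile);
    return first && second;
  }
}
```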

On 4/28/10 10:34 AM, Shai Erera wrote:
> But this method is called also for the regular lock file - if release()
> won't delete the file, then the next l.obtain() will return false.
>
> Shai
>
> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <markrmiller@gmail.com
> <ma...@gmail.com>> wrote:
>
>     It shouldn't need too though - the native lock file is simply a
>     dummy file to apply the lock too - shouldn't matter if it already
>     exists or not (though it seems to in the current code).
>
>
>     On 4/28/10 10:22 AM, Shai Erera wrote:
>
>         If you won't delete the file, the next obtain will fail?
>
>         On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller
>         <markrmiller@gmail.com <ma...@gmail.com>
>         <mailto:markrmiller@gmail.com <ma...@gmail.com>>>
>         wrote:
>
>             I wonder if not being able to delete the file should throw a
>         release
>             failed exception at all. You have actually released the
>         native lock
>             - you where just not able to clean up - but that seems more
>         like a
>             warning situation than a failure.
>
>
>             --
>             - Mark
>
>         http://www.lucidimagination.com
>
>             On 4/28/10 9:53 AM, Shai Erera wrote:
>
>                 I've hit it again and here's the full stacktrace (at least
>                 what's printed):
>
>                      [junit] Exception in thread "main"
>         java.lang.RuntimeException:
>                 Failed to acquire random test lock; please verify
>         filesystem for
>                 lock
>                 directory
>         'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>                 supports
>                 locking
>                      [junit]     at
>
>           org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>                      [junit]     at
>
>           org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>                      [junit]     at
>
>           org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>                      [junit]     at
>                 java.lang.J9VMInternals.newInstanceImpl(Native Method)
>                      [junit]     at
>         java.lang.Class.newInstance(Class.java:1325)
>                      [junit]     at
>
>           org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>                      [junit]     at
>
>           org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>                      [junit]     at
>
>           org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>                      [junit]     at
>
>           org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>                      [junit]     at
>
>           org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>                      [junit] Caused by:
>                 org.apache.lucene.store.LockReleaseFailedException:
>         failed to delete
>
>           C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>                      [junit]     at
>
>           org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>                      [junit]     at
>
>           org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>                      [junit]     ... 9 more
>
>                 The exception is thrown from NativeFSLock.release() b/c
>         it fails to
>                 delete the lock file. I think I know what the problem is
>         - and
>                 it must
>                 be related to the large number of JVMs that are created
>         w/ the
>                 parallel
>                 tests:
>                 * Suppose that JVM1 draws the number '1' for the test
>         lock file - it
>                 thus creates lock1.
>                 * Now suppose that JVM2 draws the same number, magically
>         somehow
>                 - it
>                 thus creates lock1 as well.
>                 * The code of acquireTestLock in NativeFSLockFactory
>         looks like
>                 this:
>                      Lock l = makeLock(randomLockName);
>                      try {
>                        l.obtain();
>                        l.release();
>                 --> both will create the same test Lock file. Then
>         l.obtain()
>                 probably
>                 returns false for one of them, but it's not checked.
>                 * Then in release there are a couple of things to note:
>                 1) the method is synced on the instance, which does not
>         affect
>                 the two JVMs.
>                 2) suppose that both JVMs pass through the if (exists())
>         check. Then
>                 JVM1 releases the lock, and deletes the file.
>                 3) Now JVM2 kicks in, calls lock.release() which has no
>         effect
>                 (from the
>                 jdoc: "If this lock object is invalid then invoking this
>         method
>                 has no
>                 effect." ). Then when it comes to path.delete(), the
>         file isn't
>                 there,
>                 the method returns false and thus an exception is thrown ...
>
>                 This situation is extremely unlikely to happen, but
>         still, it
>                 happens on
>                 my machine quite frequently since the parallel tests. I'm
>                 thinking that
>                 acquireTestLock should be less strict, but perhaps we
>         can fix it
>                 if we
>                 replace the line:
>                       if (!path.delete()) (line 310)
>                 with this
>                       if (!path.delete() && path.exists())
>
>                 I.e., if the lock file fails to delete but is still
>         there, throw the
>                 exception ...
>
>                 What do you think?
>
>                 Shai
>
>                 On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir
>         <rcmuir@gmail.com <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>>> wrote:
>
>
>
>                     On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>         <vajda@osafoundation.org <ma...@osafoundation.org>
>         <mailto:vajda@osafoundation.org <ma...@osafoundation.org>>
>         <mailto:vajda@osafoundation.org <ma...@osafoundation.org>
>
>         <mailto:vajda@osafoundation.org
>         <ma...@osafoundation.org>>>> wrote:
>
>
>                         I've had similar random failures on Mac OS X
>         10.6. They
>                 started
>                         happening recently, about two weeks ago.
>
>
>                     Thats just too randomly close to when i last worked
>         on this
>                 build
>                     system stuff for LUCENE-1709... perhaps I made it worse
>                 instead of
>                     better.
>
>                     --
>                     Robert Muir
>         rcmuir@gmail.com <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>>
>
>
>
>
>
>
>
>
>
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>


-- 
- Mark

http://www.lucidimagination.com



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Robert Muir <rc...@gmail.com>.
Hello, a possibly related thing: I ran some tests and a new Formatter is
created for each Test Suite.

This is pretty wasteful since we make a lot of lock factories and such, so
one thing to explore is using a singleton here so we aren't making thousands
of test locks.
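
Roughly what I mean by a singleton, as a sketch only. SharedLockResource is
a made-up stand-in for whatever the formatter would share (e.g. its lock
factory), and CREATED is there just to show the constructor runs once. Note
this only helps for formatters created inside the same JVM; each forked JVM
would still build its own instance:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the object each formatter creates today. */
final class SharedLockResource {
  /** Counts constructor calls, for demonstration only. */
  static final AtomicInteger CREATED = new AtomicInteger();

  private SharedLockResource() {
    CREATED.incrementAndGet(); // the expensive setup would happen here
  }

  /** Initialization-on-demand holder: built lazily, once, thread-safely. */
  private static final class Holder {
    static final SharedLockResource INSTANCE = new SharedLockResource();
  }

  static SharedLockResource get() {
    return Holder.INSTANCE;
  }
}
```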

Additionally Shai pointed me at this really interesting post:
http://blog.code-cop.org/2009/09/parallel-junit.html

They first tried the same primitive letter-based parallelism we are using,
but found that using a custom divisor (instead of blindly, wastefully
spawning 26 unbalanced JVMs, one for each letter in each batch) cut the
overall test time in half again.
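
As a sketch of the divisor idea (my own guess at the approach, not the code
from that post; TestBatcher and split are made-up names): sort the test
class names and deal them round-robin into a fixed number of batches, so
batch sizes differ by at most one no matter which letters the names start
with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of our build. */
final class TestBatcher {
  private TestBatcher() {}

  /** Splits the test class names into 'divisor' near-equal batches. */
  static List<List<String>> split(List<String> testClasses, int divisor) {
    if (divisor < 1) {
      throw new IllegalArgumentException("divisor must be >= 1");
    }
    List<String> sorted = new ArrayList<String>(testClasses);
    Collections.sort(sorted); // stable assignment from run to run
    List<List<String>> batches = new ArrayList<List<String>>();
    for (int i = 0; i < divisor; i++) {
      batches.add(new ArrayList<String>());
    }
    for (int i = 0; i < sorted.size(); i++) {
      batches.get(i % divisor).add(sorted.get(i)); // round-robin deal
    }
    return batches;
  }
}
```

This balances test counts, not run time, so a batch with a few slow suites
could still dominate.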

So some improvements like this might be promising for the parallel tests.

On Wed, Apr 28, 2010 at 10:34 AM, Shai Erera <se...@gmail.com> wrote:

> But this method is called also for the regular lock file - if release()
> won't delete the file, then the next l.obtain() will return false.
>
> Shai
>
>
> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <ma...@gmail.com>wrote:
>
>> It shouldn't need too though - the native lock file is simply a dummy file
>> to apply the lock too - shouldn't matter if it already exists or not (though
>> it seems to in the current code).
>>
>>
>> On 4/28/10 10:22 AM, Shai Erera wrote:
>>
>>> If you won't delete the file, the next obtain will fail?
>>>
>>> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <markrmiller@gmail.com
>>> <ma...@gmail.com>> wrote:
>>>
>>>    I wonder if not being able to delete the file should throw a release
>>>    failed exception at all. You have actually released the native lock
>>>    - you where just not able to clean up - but that seems more like a
>>>    warning situation than a failure.
>>>
>>>
>>>    --
>>>    - Mark
>>>
>>>    http://www.lucidimagination.com
>>>
>>>    On 4/28/10 9:53 AM, Shai Erera wrote:
>>>
>>>        I've hit it again and here's the full stacktrace (at least
>>>        what's printed):
>>>
>>>             [junit] Exception in thread "main"
>>> java.lang.RuntimeException:
>>>        Failed to acquire random test lock; please verify filesystem for
>>>        lock
>>>        directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>>>        supports
>>>        locking
>>>             [junit]     at
>>>
>>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>>             [junit]     at
>>>
>>>  org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>>             [junit]     at
>>>
>>>  org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>>             [junit]     at
>>>        java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>>             [junit]     at java.lang.Class.newInstance(Class.java:1325)
>>>             [junit]     at
>>>
>>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>>             [junit]     at
>>>
>>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>>             [junit]     at
>>>
>>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>>             [junit]     at
>>>
>>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>>             [junit]     at
>>>
>>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>>             [junit] Caused by:
>>>        org.apache.lucene.store.LockReleaseFailedException: failed to
>>> delete
>>>
>>>  C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>>             [junit]     at
>>>
>>>  org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>>             [junit]     at
>>>
>>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>>             [junit]     ... 9 more
>>>
>>>        The exception is thrown from NativeFSLock.release() b/c it fails
>>> to
>>>        delete the lock file. I think I know what the problem is - and
>>>        it must
>>>        be related to the large number of JVMs that are created w/ the
>>>        parallel
>>>        tests:
>>>        * Suppose that JVM1 draws the number '1' for the test lock file -
>>> it
>>>        thus creates lock1.
>>>        * Now suppose that JVM2 draws the same number, magically somehow
>>>        - it
>>>        thus creates lock1 as well.
>>>        * The code of acquireTestLock in NativeFSLockFactory looks like
>>>        this:
>>>             Lock l = makeLock(randomLockName);
>>>             try {
>>>               l.obtain();
>>>               l.release();
>>>        --> both will create the same test Lock file. Then l.obtain()
>>>        probably
>>>        returns false for one of them, but it's not checked.
>>>        * Then in release there are a couple of things to note:
>>>        1) the method is synced on the instance, which does not affect
>>>        the two JVMs.
>>>        2) suppose that both JVMs pass through the if (exists()) check.
>>> Then
>>>        JVM1 releases the lock, and deletes the file.
>>>        3) Now JVM2 kicks in, calls lock.release() which has no effect
>>>        (from the
>>>        jdoc: "If this lock object is invalid then invoking this method
>>>        has no
>>>        effect." ). Then when it comes to path.delete(), the file isn't
>>>        there,
>>>        the method returns false and thus an exception is thrown ...
>>>
>>>        This situation is extremely unlikely to happen, but still, it
>>>        happens on
>>>        my machine quite frequently since the parallel tests. I'm
>>>        thinking that
>>>        acquireTestLock should be less strict, but perhaps we can fix it
>>>        if we
>>>        replace the line:
>>>              if (!path.delete()) (line 310)
>>>        with this
>>>              if (!path.delete() && path.exists())
>>>
>>>        I.e., if the lock file fails to delete but is still there, throw
>>> the
>>>        exception ...
>>>
>>>        What do you think?
>>>
>>>        Shai
>>>
>>>        On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com
>>>        <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>> wrote:
>>>
>>>
>>>
>>>            On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>>>        <vajda@osafoundation.org <ma...@osafoundation.org>
>>>        <mailto:vajda@osafoundation.org
>>>
>>>        <ma...@osafoundation.org>>> wrote:
>>>
>>>
>>>                I've had similar random failures on Mac OS X 10.6. They
>>>        started
>>>                happening recently, about two weeks ago.
>>>
>>>
>>>            Thats just too randomly close to when i last worked on this
>>>        build
>>>            system stuff for LUCENE-1709... perhaps I made it worse
>>>        instead of
>>>            better.
>>>
>>>            --
>>>            Robert Muir
>>>        rcmuir@gmail.com <ma...@gmail.com>
>>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>    ---------------------------------------------------------------------
>>>    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>    <ma...@lucene.apache.org>
>>>
>>>    For additional commands, e-mail: dev-help@lucene.apache.org
>>>    <ma...@lucene.apache.org>
>>>
>>>
>>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
But this method is also called for the regular lock file - if release()
doesn't delete the file, then the next l.obtain() will return false.

Shai

On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <ma...@gmail.com> wrote:

> It shouldn't need too though - the native lock file is simply a dummy file
> to apply the lock too - shouldn't matter if it already exists or not (though
> it seems to in the current code).
>
>
> On 4/28/10 10:22 AM, Shai Erera wrote:
>
>> If you won't delete the file, the next obtain will fail?
>>
>> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <markrmiller@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>    I wonder if not being able to delete the file should throw a release
>>    failed exception at all. You have actually released the native lock
>>    - you where just not able to clean up - but that seems more like a
>>    warning situation than a failure.
>>
>>
>>    --
>>    - Mark
>>
>>    http://www.lucidimagination.com
>>
>>    On 4/28/10 9:53 AM, Shai Erera wrote:
>>
>>        I've hit it again and here's the full stacktrace (at least
>>        what's printed):
>>
>>             [junit] Exception in thread "main" java.lang.RuntimeException:
>>        Failed to acquire random test lock; please verify filesystem for
>>        lock
>>        directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>>        supports
>>        locking
>>             [junit]     at
>>
>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>             [junit]     at
>>
>>  org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>             [junit]     at
>>
>>  org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>             [junit]     at
>>        java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>             [junit]     at java.lang.Class.newInstance(Class.java:1325)
>>             [junit]     at
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>             [junit]     at
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>             [junit]     at
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>             [junit]     at
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>             [junit]     at
>>
>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>             [junit] Caused by:
>>        org.apache.lucene.store.LockReleaseFailedException: failed to
>> delete
>>
>>  C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>             [junit]     at
>>
>>  org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>             [junit]     at
>>
>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>             [junit]     ... 9 more
>>
>>        The exception is thrown from NativeFSLock.release() b/c it fails to
>>        delete the lock file. I think I know what the problem is - and
>>        it must
>>        be related to the large number of JVMs that are created w/ the
>>        parallel
>>        tests:
>>        * Suppose that JVM1 draws the number '1' for the test lock file -
>> it
>>        thus creates lock1.
>>        * Now suppose that JVM2 draws the same number, magically somehow
>>        - it
>>        thus creates lock1 as well.
>>        * The code of acquireTestLock in NativeFSLockFactory looks like
>>        this:
>>             Lock l = makeLock(randomLockName);
>>             try {
>>               l.obtain();
>>               l.release();
>>        --> both will create the same test Lock file. Then l.obtain()
>>        probably
>>        returns false for one of them, but it's not checked.
>>        * Then in release there are a couple of things to note:
>>        1) the method is synced on the instance, which does not affect
>>        the two JVMs.
>>        2) suppose that both JVMs pass through the if (exists()) check.
>> Then
>>        JVM1 releases the lock, and deletes the file.
>>        3) Now JVM2 kicks in, calls lock.release() which has no effect
>>        (from the
>>        jdoc: "If this lock object is invalid then invoking this method
>>        has no
>>        effect." ). Then when it comes to path.delete(), the file isn't
>>        there,
>>        the method returns false and thus an exception is thrown ...
>>
>>        This situation is extremely unlikely to happen, but still, it
>>        happens on
>>        my machine quite frequently since the parallel tests. I'm
>>        thinking that
>>        acquireTestLock should be less strict, but perhaps we can fix it
>>        if we
>>        replace the line:
>>              if (!path.delete()) (line 310)
>>        with this
>>              if (!path.delete() && path.exists())
>>
>>        I.e., if the lock file fails to delete but is still there, throw
>> the
>>        exception ...
>>
>>        What do you think?
>>
>>        Shai
>>
>>        On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com
>>        <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>> wrote:
>>
>>
>>
>>            On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>>        <vajda@osafoundation.org <ma...@osafoundation.org>
>>        <mailto:vajda@osafoundation.org
>>
>>        <ma...@osafoundation.org>>> wrote:
>>
>>
>>                I've had similar random failures on Mac OS X 10.6. They
>>        started
>>                happening recently, about two weeks ago.
>>
>>
>>            Thats just too randomly close to when i last worked on this
>>        build
>>            system stuff for LUCENE-1709... perhaps I made it worse
>>        instead of
>>            better.
>>
>>            --
>>            Robert Muir
>>        rcmuir@gmail.com <ma...@gmail.com>
>>        <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>>
>>
>>
>>
>>
>>
>>    ---------------------------------------------------------------------
>>    To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>    <ma...@lucene.apache.org>
>>
>>    For additional commands, e-mail: dev-help@lucene.apache.org
>>    <ma...@lucene.apache.org>
>>
>>
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
It shouldn't need to, though - the native lock file is simply a dummy 
file to apply the lock to - it shouldn't matter whether it already exists 
or not (though it seems to in the current code).
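
To illustrate the point, here is a minimal sketch (not the Lucene code) that
takes and releases a native lock twice on a file that is already there. It
uses java.nio directly; ReusableLockFile, lockAndUnlock and demo are
hypothetical names, and FileChannel.open assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReusableLockFile {
    /** Opens (or creates) the file, takes the native lock, then releases it. */
    static boolean lockAndUnlock(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(
                path, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // held by another process
            }
            lock.release();
            return true; // the file is intentionally left in place
        }
    }

    /** Locks the same pre-existing file twice; hypothetical demo helper. */
    static boolean demo() {
        try {
            Path path = Files.createTempFile("reusable-lock-sketch", ".lock");
            boolean first = lockAndUnlock(path);
            boolean second = lockAndUnlock(path); // leftover file is reused
            Files.deleteIfExists(path);
            return first && second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the leftover file does not get in the way of the second
attempt, because the lock lives on the open channel and not on the file's
existence.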

On 4/28/10 10:22 AM, Shai Erera wrote:
> If you won't delete the file, the next obtain will fail?
>
> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <markrmiller@gmail.com
> <ma...@gmail.com>> wrote:
>
>     I wonder if not being able to delete the file should throw a release
>     failed exception at all. You have actually released the native lock
>     - you where just not able to clean up - but that seems more like a
>     warning situation than a failure.
>
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>     On 4/28/10 9:53 AM, Shai Erera wrote:
>
>         I've hit it again and here's the full stacktrace (at least
>         what's printed):
>
>              [junit] Exception in thread "main" java.lang.RuntimeException:
>         Failed to acquire random test lock; please verify filesystem for
>         lock
>         directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>         supports
>         locking
>              [junit]     at
>         org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>              [junit]     at
>         org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>              [junit]     at
>         org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>              [junit]     at
>         java.lang.J9VMInternals.newInstanceImpl(Native Method)
>              [junit]     at java.lang.Class.newInstance(Class.java:1325)
>              [junit]     at
>         org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>              [junit]     at
>         org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>              [junit]     at
>         org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>              [junit]     at
>         org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>              [junit]     at
>         org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>              [junit] Caused by:
>         org.apache.lucene.store.LockReleaseFailedException: failed to delete
>         C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>              [junit]     at
>         org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>              [junit]     at
>         org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>              [junit]     ... 9 more
>
>         The exception is thrown from NativeFSLock.release() b/c it fails to
>         delete the lock file. I think I know what the problem is - and
>         it must
>         be related to the large number of JVMs that are created w/ the
>         parallel
>         tests:
>         * Suppose that JVM1 draws the number '1' for the test lock file - it
>         thus creates lock1.
>         * Now suppose that JVM2 draws the same number, magically somehow
>         - it
>         thus creates lock1 as well.
>         * The code of acquireTestLock in NativeFSLockFactory looks like
>         this:
>              Lock l = makeLock(randomLockName);
>              try {
>                l.obtain();
>                l.release();
>         --> both will create the same test Lock file. Then l.obtain()
>         probably
>         returns false for one of them, but it's not checked.
>         * Then in release there are a couple of things to note:
>         1) the method is synced on the instance, which does not affect
>         the two JVMs.
>         2) suppose that both JVMs pass through the if (exists()) check. Then
>         JVM1 releases the lock, and deletes the file.
>         3) Now JVM2 kicks in, calls lock.release() which has no effect
>         (from the
>         jdoc: "If this lock object is invalid then invoking this method
>         has no
>         effect." ). Then when it comes to path.delete(), the file isn't
>         there,
>         the method returns false and thus an exception is thrown ...
>
>         This situation is extremely unlikely to happen, but still, it
>         happens on
>         my machine quite frequently since the parallel tests. I'm
>         thinking that
>         acquireTestLock should be less strict, but perhaps we can fix it
>         if we
>         replace the line:
>               if (!path.delete()) (line 310)
>         with this
>               if (!path.delete() && path.exists())
>
>         I.e., if the lock file fails to delete but is still there, throw the
>         exception ...
>
>         What do you think?
>
>         Shai
>
>         On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com
>         <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>> wrote:
>
>
>
>             On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>         <vajda@osafoundation.org <ma...@osafoundation.org>
>         <mailto:vajda@osafoundation.org
>         <ma...@osafoundation.org>>> wrote:
>
>
>                 I've had similar random failures on Mac OS X 10.6. They
>         started
>                 happening recently, about two weeks ago.
>
>
>             Thats just too randomly close to when i last worked on this
>         build
>             system stuff for LUCENE-1709... perhaps I made it worse
>         instead of
>             better.
>
>             --
>             Robert Muir
>         rcmuir@gmail.com <ma...@gmail.com>
>         <mailto:rcmuir@gmail.com <ma...@gmail.com>>
>
>
>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     For additional commands, e-mail: dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>
>


-- 
- Mark

http://www.lucidimagination.com



Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
If you don't delete the file, won't the next obtain fail?

On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <ma...@gmail.com> wrote:

> I wonder if not being able to delete the file should throw a release failed
> exception at all. You have actually released the native lock - you where
> just not able to clean up - but that seems more like a warning situation
> than a failure.
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> On 4/28/10 9:53 AM, Shai Erera wrote:
>
>> I've hit it again and here's the full stacktrace (at least what's
>> printed):
>>
>>     [junit] Exception in thread "main" java.lang.RuntimeException:
>> Failed to acquire random test lock; please verify filesystem for lock
>> directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports
>> locking
>>     [junit]     at
>>
>> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>     [junit]     at
>>
>> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>     [junit]     at
>>
>> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>     [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>     [junit]     at java.lang.Class.newInstance(Class.java:1325)
>>     [junit]     at
>>
>> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>     [junit]     at
>>
>> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>     [junit]     at
>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>     [junit]     at
>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>     [junit]     at
>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>     [junit] Caused by:
>> org.apache.lucene.store.LockReleaseFailedException: failed to delete
>> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>     [junit]     at
>> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>     [junit]     at
>>
>> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>     [junit]     ... 9 more
>>
>> The exception is thrown from NativeFSLock.release() b/c it fails to
>> delete the lock file. I think I know what the problem is - and it must
>> be related to the large number of JVMs that are created w/ the parallel
>> tests:
>> * Suppose that JVM1 draws the number '1' for the test lock file - it
>> thus creates lock1.
>> * Now suppose that JVM2 draws the same number, magically somehow - it
>> thus creates lock1 as well.
>> * The code of acquireTestLock in NativeFSLockFactory looks like this:
>>     Lock l = makeLock(randomLockName);
>>     try {
>>       l.obtain();
>>       l.release();
>> --> both will create the same test Lock file. Then l.obtain() probably
>> returns false for one of them, but it's not checked.
>> * Then in release there are a couple of things to note:
>> 1) the method is synced on the instance, which does not affect the two
>> JVMs.
>> 2) suppose that both JVMs pass through the if (exists()) check. Then
>> JVM1 releases the lock, and deletes the file.
>> 3) Now JVM2 kicks in, calls lock.release() which has no effect (from the
>> jdoc: "If this lock object is invalid then invoking this method has no
>> effect." ). Then when it comes to path.delete(), the file isn't there,
>> the method returns false and thus an exception is thrown ...
>>
>> This situation is extremely unlikely to happen, but still, it happens on
>> my machine quite frequently since the parallel tests. I'm thinking that
>> acquireTestLock should be less strict, but perhaps we can fix it if we
>> replace the line:
>>      if (!path.delete()) (line 310)
>> with this
>>      if (!path.delete() && path.exists())
>>
>> I.e., if the lock file fails to delete but is still there, throw the
>> exception ...
>>
>> What do you think?
>>
>> Shai
>>
>> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>
>>
>>    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <vajda@osafoundation.org
>>    <ma...@osafoundation.org>> wrote:
>>
>>
>>        I've had similar random failures on Mac OS X 10.6. They started
>>        happening recently, about two weeks ago.
>>
>>
>>    Thats just too randomly close to when i last worked on this build
>>    system stuff for LUCENE-1709... perhaps I made it worse instead of
>>    better.
>>
>>    --
>>    Robert Muir
>>    rcmuir@gmail.com <ma...@gmail.com>
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
I wonder if not being able to delete the file should throw a release-failed 
exception at all. You have actually released the native lock - you were 
just not able to clean up - and that seems more like a warning situation 
than a failure.
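
Roughly what I have in mind, as a sketch only (this is not the current
NativeFSLock code; LenientRelease, release and demo are made-up names, and
the warning goes to System.err purely for illustration):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

final class LenientRelease {
    /**
     * Releases the native lock first; a failed cleanup of the lock file is
     * reported as a warning and a false return value, not as an exception.
     */
    static boolean release(FileLock lock, FileChannel channel, File path)
            throws IOException {
        lock.release();   // the lock itself is released at this point
        channel.close();
        boolean cleaned = path.delete() || !path.exists();
        if (!cleaned) {
            System.err.println("WARN: could not delete lock file " + path);
        }
        return cleaned;
    }

    /** Takes a lock on a temp file and releases it; hypothetical demo helper. */
    static boolean demo() {
        try {
            File path = File.createTempFile("lenient-release-sketch", ".lock");
            FileChannel channel = new RandomAccessFile(path, "rw").getChannel();
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
                return false;
            }
            return release(lock, channel, path) && !path.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real failure to release the native lock would still surface as an
IOException here; only the file cleanup is downgraded.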


-- 
- Mark

http://www.lucidimagination.com

On 4/28/10 9:53 AM, Shai Erera wrote:
> I've hit it again and here's the full stacktrace (at least what's printed):
>
>      [junit] Exception in thread "main" java.lang.RuntimeException:
> Failed to acquire random test lock; please verify filesystem for lock
> directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports
> locking
>      [junit]     at
> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>      [junit]     at
> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>      [junit]     at
> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>      [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
>      [junit]     at java.lang.Class.newInstance(Class.java:1325)
>      [junit]     at
> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>      [junit]     at
> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>      [junit]     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>      [junit]     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>      [junit]     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>      [junit] Caused by:
> org.apache.lucene.store.LockReleaseFailedException: failed to delete
> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>      [junit]     at
> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>      [junit]     at
> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>      [junit]     ... 9 more
>
> The exception is thrown from NativeFSLock.release() b/c it fails to
> delete the lock file. I think I know what the problem is - and it must
> be related to the large number of JVMs that are created w/ the parallel
> tests:
> * Suppose that JVM1 draws the number '1' for the test lock file - it
> thus creates lock1.
> * Now suppose that JVM2 draws the same number, magically somehow - it
> thus creates lock1 as well.
> * The code of acquireTestLock in NativeFSLockFactory looks like this:
>      Lock l = makeLock(randomLockName);
>      try {
>        l.obtain();
>        l.release();
> --> both will create the same test Lock file. Then l.obtain() probably
> returns false for one of them, but it's not checked.
> * Then in release there are a couple of things to note:
> 1) the method is synced on the instance, which does not affect the two JVMs.
> 2) suppose that both JVMs pass through the if (exists()) check. Then
> JVM1 releases the lock, and deletes the file.
> 3) Now JVM2 kicks in, calls lock.release() which has no effect (from the
> jdoc: "If this lock object is invalid then invoking this method has no
> effect." ). Then when it comes to path.delete(), the file isn't there,
> the method returns false and thus an exception is thrown ...
>
> This situation is extremely unlikely to happen, but still, it happens on
> my machine quite frequently since the parallel tests. I'm thinking that
> acquireTestLock should be less strict, but perhaps we can fix it if we
> replace the line:
>       if (!path.delete()) (line 310)
> with this
>       if (!path.delete() && path.exists())
>
> I.e., if the lock file fails to delete but is still there, throw the
> exception ...
>
> What do you think?
>
> Shai
>
> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rcmuir@gmail.com
> <ma...@gmail.com>> wrote:
>
>
>
>     On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <vajda@osafoundation.org
>     <ma...@osafoundation.org>> wrote:
>
>
>         I've had similar random failures on Mac OS X 10.6. They started
>         happening recently, about two weeks ago.
>
>
>     Thats just too randomly close to when i last worked on this build
>     system stuff for LUCENE-1709... perhaps I made it worse instead of
>     better.
>
>     --
>     Robert Muir
>     rcmuir@gmail.com <ma...@gmail.com>
>
>





Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
I've hit it again and here's the full stacktrace (at least what's printed):

    [junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
    [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
    [junit]     at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
    [junit]     at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
    [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
    [junit]     at java.lang.Class.newInstance(Class.java:1325)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
    [junit] Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
    [junit]     at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
    [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
    [junit]     ... 9 more

The exception is thrown from NativeFSLock.release() because it fails to
delete the lock file. I think I know what the problem is, and it must be
related to the large number of JVMs created by the parallel tests:
* Suppose that JVM1 draws the number '1' for the test lock file - it thus
creates lock1.
* Now suppose that JVM2 happens to draw the same number - it thus
creates lock1 as well.
* The code of acquireTestLock in NativeFSLockFactory looks like this:
    Lock l = makeLock(randomLockName);
    try {
      l.obtain();
      l.release();
--> both will create the same test Lock file. Then l.obtain() probably
returns false for one of them, but it's not checked.
* Then in release there are a couple of things to note:
1) the method is synced on the instance, which does not affect the two JVMs.
2) suppose that both JVMs pass through the if (exists()) check. Then JVM1
releases the lock, and deletes the file.
3) Now JVM2 kicks in and calls lock.release(), which has no effect (from the
FileLock javadoc: "If this lock object is invalid then invoking this method
has no effect."). Then when it comes to path.delete(), the file isn't there,
so the method returns false and an exception is thrown ...
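
To make the unchecked obtain() point concrete, here is a minimal sketch. It
is not the NativeFSLockFactory code: SimpleLock, CountingLock and
TestLockProbe are hypothetical names I made up, and the only thing taken
from the real code is that obtain() reports success as a boolean.

```java
/** Hypothetical stand-in for a lock whose obtain() returns a boolean. */
interface SimpleLock {
    boolean obtain();
    void release();
}

/** Hypothetical fake lock that counts how often release() is called. */
final class CountingLock implements SimpleLock {
    private final boolean available;
    int releases = 0;

    CountingLock(boolean available) {
        this.available = available;
    }

    @Override
    public boolean obtain() {
        return available;
    }

    @Override
    public void release() {
        releases++;
    }
}

final class TestLockProbe {
    /** Returns true only if the lock was obtained; releases only what it holds. */
    static boolean probe(SimpleLock l) {
        if (!l.obtain()) {
            // Another process holds the lock: do not release or delete anything.
            return false;
        }
        l.release();
        return true;
    }
}
```

With a check like this, the JVM that lost the race would never reach
release(), so it would never try to delete a file it does not own.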

This situation should be extremely unlikely, yet it happens on my machine
quite frequently since the parallel tests were introduced. I'm thinking that
acquireTestLock should be less strict, but perhaps we can fix it if we
replace the line:
     if (!path.delete()) (line 310)
with this
     if (!path.delete() && path.exists())

I.e., throw the exception only if the delete fails and the lock file is
still there ...
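
A standalone sketch of that check, outside of Lucene (LockFileCleanup,
deleteLockFile and demo are hypothetical names, and IllegalStateException
stands in for LockReleaseFailedException so the example has no Lucene
dependency):

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LockFileCleanup {
    /** Fails only if the delete failed AND the file is still present. */
    static void deleteLockFile(File path) {
        // File.delete() returns false both when the delete fails and when
        // the file was already removed, e.g. by another JVM.
        if (!path.delete() && path.exists()) {
            throw new IllegalStateException("failed to delete " + path);
        }
    }

    /** Deletes the same file twice; hypothetical demo helper. */
    static boolean demo() {
        try {
            File path = File.createTempFile("lock-cleanup-sketch", ".lock");
            deleteLockFile(path); // first caller removes the file
            deleteLockFile(path); // second caller finds it gone, no exception
            return !path.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that exists() can itself race with another JVM re-creating the file, so
this narrows the window rather than closing it.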

What do you think?

Shai

On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <rc...@gmail.com> wrote:

>
>
> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <va...@osafoundation.org>wrote:
>
>>
>> I've had similar random failures on Mac OS X 10.6. They started happening
>> recently, about two weeks ago.
>>
>>
> Thats just too randomly close to when i last worked on this build system
> stuff for LUCENE-1709... perhaps I made it worse instead of better.
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <va...@osafoundation.org> wrote:

>
> I've had similar random failures on Mac OS X 10.6. They started happening
> recently, about two weeks ago.
>
>
That's just too randomly close to when I last worked on this build system
stuff for LUCENE-1709... perhaps I made it worse instead of better.

-- 
Robert Muir
rcmuir@gmail.com

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Andi Vajda <va...@osafoundation.org>.
On Tue, 27 Apr 2010, Andi Vajda wrote:

>
> On Tue, 27 Apr 2010, Shai Erera wrote:
>
>> But that's not a good explanation I think. Realtime protection is always 
>> on,
>> and I agree w/ Mark - if NativeFSLock fails because of that, we should fix
>> that b/c Lucene is run on users' machines, where AV software is running 
>> too.
>> Moreover, this happens very rarely for me ... I even ran the tests while
>> scanning the Temp folder at the same time, and it didn't happen.
>
> I've had similar random failures on Mac OS X 10.6. They started happening 
> recently, about two weeks ago.

Here is a stacktrace:

======================================================================
ERROR: testWriteLock (lia.indexing.LockTest.LockTest)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/Users/vajda/apache/pylucene/samples/LuceneInAction/lia/indexing/LockTest.py", line 45, in testWriteLock
     writer1.close()
JavaError: org.apache.lucene.store.LockReleaseFailedException: failed to delete /private/var/folders/lp/lp2+7G9YFNapwTmv3E25uE+++TI/-Tmp-/index/write.lock
     Java stacktrace:
org.apache.lucene.store.LockReleaseFailedException: failed to delete /private/var/folders/lp/lp2+7G9YFNapwTmv3E25uE+++TI/-Tmp-/index/write.lock
 	at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
 	at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1801)
 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1732)
 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1696)

Andi..
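
The trace above ends in NativeFSLock.release failing to delete write.lock. If
the cause is a transient hold on the file by another process, one possible
workaround is to retry the delete a few times with a short pause. Whether that
would help in this case is an open question. The sketch below is plain Java and
is not what Lucene's NativeFSLock does; the class LockFileCleanup, the method
deleteWithRetry, and the attempt count and pause are all hypothetical.

```java
import java.io.File;

/** Hypothetical helper, not part of Lucene. */
class LockFileCleanup {

    /**
     * Tries to delete the file up to maxAttempts times, sleeping pauseMillis
     * between attempts. Returns true once the file no longer exists.
     */
    static boolean deleteWithRetry(File file, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // delete() returns false on failure; also accept "already gone".
            if (file.delete() || !file.exists()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The file name is made up for the example.
        File lockFile = new File(System.getProperty("java.io.tmpdir"), "demo-write.lock");
        System.out.println("released: " + deleteWithRetry(lockFile, 5, 100));
    }
}
```

A bounded number of attempts keeps a genuinely stuck file from hanging the
caller; the failure is still reported, only later.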

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Andi Vajda <va...@osafoundation.org>.
On Tue, 27 Apr 2010, Shai Erera wrote:

> But that's not a good explanation I think. Realtime protection is always on,
> and I agree w/ Mark - if NativeFSLock fails because of that, we should fix
> that b/c Lucene is run on users' machines, where AV software is running too.
> Moreover, this happens very rarely for me ... I even ran the tests while
> scanning the Temp folder at the same time, and it didn't happen.

I've had similar random failures on Mac OS X 10.6. They started happening 
recently, about two weeks ago.

Andi..

> 
> I'll re-post the next time it happens for me, w/ the full stacktrace.
> 
> Shai


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
But that's not a good explanation I think. Realtime protection is always on,
and I agree w/ Mark - if NativeFSLock fails because of that, we should fix
that b/c Lucene is run on users' machines, where AV software is running too.
Moreover, this happens very rarely for me ... I even ran the tests while
scanning the Temp folder at the same time, and it didn't happen.

I'll re-post the next time it happens for me, w/ the full stacktrace.

Shai

On Tue, Apr 27, 2010 at 8:43 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  Realtime protection is mostly the problem.

RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
Realtime protection is mostly the problem.

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

 

From: Shai Erera [mailto:serera@gmail.com] 
Sent: Tuesday, April 27, 2010 7:35 PM
To: dev@lucene.apache.org
Subject: Re: LuceneJUnitResultFormatter sometimes fails to lock

 

I use Windows XP, and have no indexing service nor AV running at the same time (except its real time protection). Also, the lock file is attempted to obtain on the temp folder, which is usually excluded from many services that monitor the file system ...

I will try to reproduce it again, because I see that I've left out the nested exception from the trace above.

Shai


 


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
I use Windows XP, and have no indexing service nor AV scan running at the same
time (only the AV's real-time protection). Also, the lock file is created in
the temp folder, which is usually excluded by many services that monitor the
file system ...

I will try to reproduce it again, because I see that I've left out the
nested exception from the trace above.

Shai
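
The RuntimeException in the original trace is only the outer exception; the
nested exception is its cause. A small sketch of printing the whole chain, so
the underlying error is not lost when a trace is copied by hand. This is plain
Java, not Lucene code; the class CauseChain and the simulated exceptions and
messages are hypothetical.

```java
import java.io.IOException;

/** Hypothetical helper, not part of Lucene. */
class CauseChain {

    /** Returns one line per throwable in the chain, outermost first. */
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        // The depth cap guards against a (rare) cyclic cause chain.
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause()) {
            if (depth++ > 0) {
                sb.append("\nCaused by: ");
            }
            sb.append(cur.getClass().getName()).append(": ").append(cur.getMessage());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulated failure; the inner exception is made up for the example.
        Throwable t = new RuntimeException("Failed to acquire random test lock",
                new IOException("simulated I/O failure"));
        System.out.println(describe(t));
    }
}
```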

On Tue, Apr 27, 2010 at 7:56 PM, Shai Erera <se...@gmail.com> wrote:

> Yes it is Windows. Didn't mention it - thought the C:\ part says it all :).
>
> I wonder then why it only sometimes happens. And I've never run into
> such problems w/ NativeFSLock on Windows, only w/ the tests. But I
> agree it does deserve a closer look …
>
> Shai

Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Shai Erera <se...@gmail.com>.
Yes it is Windows. Didn't mention it - thought the C:\ part says it all :).

I wonder then why it only sometimes happens. And I've never run into
such problems w/ NativeFSLock on Windows, only w/ the tests. But I
agree it does deserve a closer look …

Shai

On Tuesday, April 27, 2010, Mark Miller <ma...@gmail.com> wrote:
> Ah - didn't look closely. This is while making the lock, not trying to acquire it for stdout locking. So that seems like a bug in our native lock impl we should try and fix.


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
Ah - didn't look closely. This is while making the lock, not trying to 
acquire it for stdout locking. So that seems like a bug in our native 
lock impl we should try and fix.

On 4/27/10 12:27 PM, Uwe Schindler wrote:
> When acquiring a test lock, it does not wait. It is simply unable to create the file there. This sometimes happens on Windows and has nothing to do with the tests; it is a problem in NativeFSLockFactory.


-- 
- Mark

http://www.lucidimagination.com



RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
When acquiring a test lock, it does not wait. It is simply unable to create the file there. This sometimes happens on Windows and has nothing to do with the tests; it is a problem in NativeFSLockFactory.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
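
A rough sketch of what a single, non-waiting lock attempt looks like in plain
java.nio, for readers who want to see where such a probe can fail. It is not
the NativeFSLockFactory code; the class LockProbe, the method canLock, and the
file name probe.lock are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of Lucene. */
class LockProbe {

    /**
     * Makes one non-blocking attempt to create and lock a probe file in dir.
     * Returns true if the lock was obtained (it is released again at once).
     */
    static boolean canLock(Path dir) {
        Path probe = dir.resolve("probe.lock"); // hypothetical file name
        boolean locked = false;
        try (FileChannel channel = FileChannel.open(probe,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null if another process holds it
            if (lock != null) {
                locked = true;
                lock.release();
            }
        } catch (IOException | OverlappingFileLockException e) {
            // Creating or locking the file failed; no wait and no retry here.
            return false;
        }
        if (locked) {
            try {
                Files.deleteIfExists(probe); // best-effort cleanup of our own file
            } catch (IOException ignored) {
                // A leftover probe file is harmless for this sketch.
            }
        }
        return locked;
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("lockable: " + canLock(tmp));
    }
}
```

Both the file creation and the tryLock call can fail, and in this sketch either
failure is reported immediately as "cannot lock".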


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Tuesday, April 27, 2010 6:20 PM
> To: dev@lucene.apache.org
> Subject: Re: LuceneJUnitResultFormatter sometimes fails to lock
> 
> We might need a higher timeout. It's about 5 seconds now. Otherwise we
> should try and isolate the problem.
> 
> - Mark


Re: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Mark Miller <ma...@gmail.com>.
We might need a higher timeout. It's about 5 seconds now. Otherwise we
should try and isolate the problem.

- Mark
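
To make the timeout idea concrete, here is a minimal sketch of polling for a
lock until a deadline passes. It is written against a generic BooleanSupplier
rather than Lucene's lock classes; LockPoller, obtainWithin, and the timeout
and poll values are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Lucene. */
class LockPoller {

    /**
     * Calls attempt until it returns true or timeoutMillis has elapsed,
     * sleeping pollMillis between calls. Returns whether an attempt succeeded.
     */
    static boolean obtainWithin(BooleanSupplier attempt, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without obtaining the lock
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated lock that only becomes available on the third attempt.
        boolean ok = obtainWithin(() -> ++calls[0] >= 3, 5_000, 10);
        System.out.println("obtained=" + ok + " after " + calls[0] + " attempts");
    }
}
```

A longer timeout only helps when the failure is a contended lock that is
eventually released; it changes nothing for a code path that makes a single
attempt and gives up.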

On 4/27/10 11:52 AM, Uwe Schindler wrote:
> Windows?
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de <http://www.thetaphi.de/>
>
> eMail: uwe@thetaphi.de
>
> *From:* Shai Erera [mailto:serera@gmail.com]
> *Sent:* Tuesday, April 27, 2010 5:50 PM
> *To:* dev@lucene.apache.org
> *Subject:* LuceneJUnitResultFormatter sometimes fails to lock
>
> Hi
>
> I ran "ant test-core" today and hit this:
>
> [junit] Exception in thread "main" java.lang.RuntimeException: Failed to
> acquire random test lock; please verify filesystem for lock directory
> 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
> [junit] at
> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
> [junit] at
> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
> [junit] at
> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>
> All the tests still pass, but Ant reports a failure in the end. Also,
> this rarely happens, but I've run into it several times already. Anyone
> got an idea?
>
> Shai
>


-- 
- Mark

http://www.lucidimagination.com



RE: LuceneJUnitResultFormatter sometimes fails to lock

Posted by Uwe Schindler <uw...@thetaphi.de>.
Windows?

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

 

From: Shai Erera [mailto:serera@gmail.com] 
Sent: Tuesday, April 27, 2010 5:50 PM
To: dev@lucene.apache.org
Subject: LuceneJUnitResultFormatter sometimes fails to lock

 

Hi

I ran "ant test-core" today and hit this:

[junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
[junit] at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
[junit] at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
[junit] at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)

All the tests still pass, but Ant reports a failure in the end. Also, this rarely happens, but I've run into it several times already. Anyone got an idea?

Shai