Posted to solr-commits@lucene.apache.org by yo...@apache.org on 2010/01/16 16:33:54 UTC

svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Author: yonik
Date: Sat Jan 16 15:33:54 2010
New Revision: 899979

URL: http://svn.apache.org/viewvc?rev=899979&view=rev
Log:
doc: note about native locks not working for multiple webapps in same JVM

Modified:
    lucene/solr/trunk/example/solr/conf/solrconfig.xml

Modified: lucene/solr/trunk/example/solr/conf/solrconfig.xml
URL: http://svn.apache.org/viewvc/lucene/solr/trunk/example/solr/conf/solrconfig.xml?rev=899979&r1=899978&r2=899979&view=diff
==============================================================================
--- lucene/solr/trunk/example/solr/conf/solrconfig.xml (original)
+++ lucene/solr/trunk/example/solr/conf/solrconfig.xml Sat Jan 16 15:33:54 2010
@@ -130,7 +130,8 @@
       single = SingleInstanceLockFactory - suggested for a read-only index
                or when there is no possibility of another process trying
                to modify the index.
-      native = NativeFSLockFactory  - uses OS native file locking
+      native = NativeFSLockFactory  - uses OS native file locking.
+               Do not use with multiple solr webapps in the same JVM.
       simple = SimpleFSLockFactory  - uses a plain file for locking
 
       (For backwards compatibility with Solr 1.2, 'simple' is the default



Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Sanne Grinovero <sa...@gmail.com>.
Thanks for the heads-up; this is good to know.
I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories
which I recently created as a guide to help in choosing between
different LockFactories.

I believe the Native LockFactory is very useful. I wouldn't consider
this a bug, nor would I discourage its use; people just need to be
informed of the behavior and know that no LockFactory impl is good for
all cases.

Adding some lines to its javadoc seems appropriate.

Regards,
Sanne

2010/1/20 Chris Hostetter <ho...@fucit.org>:
>
> : > At a minimum, shouldn't NativeFSLock.obtain() be checking for
> : > OverlappingFileLockException and treating that as a failure to acquire the
> : > lock?
>        ...
> : Perhaps - that should make it work in more cases - but in my simple
> : testing it's not 100% reliable.
>        ...
> : File locks are held on behalf of the entire Java virtual machine.
> :      * They are not suitable for controlling access to a file by multiple
> :      * threads within the same virtual machine.
>
> ...Grrr....  so where does that leave us?
>
> Yonik's added comment was that "native" isn't recommended when running
> multiple webapps in the same container.  In truth, "native" *can*
> work when running multiple webapps in the same container, just as long as
> those webapps don't reference the same data dirs.
>
> I'm worried that we should recommend people avoid native altogether
> because even if you are only running one webapp, it seems like a "reload"
> of that app could trigger some similar bad behavior.
>
> So what/how should we document all of this?
>
> -Hoss
>
>

Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Chris Hostetter <ho...@fucit.org>.
: >> So what/how should we document all of this?
	...
: > I've got more info on this.

Mark: most of what you wrote is above my head, but since you fixed a
grammar error in my updated example solrconfig.xml comment w/o making any
content changes, I'm assuming you feel what I put there is sufficient.

Most of your comments feel like they should be raised over in Lucene-Java
land, at a minimum in documentation (added to the AvailableLockFactories
page perhaps) or possibly in some code changes (should we change the
default LockFactory depending on Java version?)

I'll leave that up to you, since (as I mentioned) I didn't understand half
of it.

: > Checking for OverlappingFileLockException *should* actually work when
: > using Java 1.6. Java 1.6 started using a *system wide* thread safe check
: > for this.
: >
: > Previous to Java 1.6, checks for this *were* limited to an instance of
: > FileChannel - the FileChannel maintained its own personal lock list. So
: > you have to use the same Channel to even have any hope of seeing an
: > OverlappingFileLockException. Even then though, it's not properly thread
: > safe. They did not sync across checking if the lock exists and acquiring
: > the lock - they separately sync each action - leaving room to acquire
: > the lock twice from two different threads like I was seeing.
: >
: > Interestingly, Java 1.6 has a back compat mode you can turn on that
: > doesn't use the system wide lock list, and they have fixed this thread
: > safety issue in that impl - there is a sync across checking
: > and getting the lock so that it is properly thread safe - but not in
: > Java 1.4, 1.5.
: >
: > Looking at GCC - uh ... I don't think you want to use GCC - they don't
: > appear to use a lock list and check for this at all :)
: >
: > But the point is, this is fixable on Java 6 if we check for
: > OverlappingFileLockException - it *should* work across webapps, and it
: > is actually thread safe, unlike Java 1.4,1.5.
: >
: >   
: Another interesting fact:
: 
: On Windows, if you attempt to lock the same file with different channel
: instances pre Java 1.6 - the code will deadlock.
: 
: -- 
: - Mark
: 
: http://www.lucidimagination.com
: 
: 
: 



-Hoss


Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Mark Miller <ma...@gmail.com>.
Mark Miller wrote:
> Chris Hostetter wrote:
>   
>> : > At a minimum, shouldn't NativeFSLock.obtain() be checking for
>> : > OverlappingFileLockException and treating that as a failure to acquire the
>> : > lock?
>> 	...
>> : Perhaps - that should make it work in more cases - but in my simple
>> : testing it's not 100% reliable.
>> 	...
>> : File locks are held on behalf of the entire Java virtual machine.
>> :      * They are not suitable for controlling access to a file by multiple
>> :      * threads within the same virtual machine.
>>
>> ...Grrr....  so where does that leave us?
>>
>> Yonik's added comment was that "native" isn't recommended when running
>> multiple webapps in the same container.  In truth, "native" *can*
>> work when running multiple webapps in the same container, just as long as
>> those webapps don't reference the same data dirs.
>>
>> I'm worried that we should recommend people avoid native altogether
>> because even if you are only running one webapp, it seems like a "reload"
>> of that app could trigger some similar bad behavior.
>>
>> So what/how should we document all of this?
>>
>> -Hoss
>>
>>   
>>     
> I've got more info on this.
>
> Checking for OverlappingFileLockException *should* actually work when
> using Java 1.6. Java 1.6 started using a *system wide* thread safe check
> for this.
>
> Previous to Java 1.6, checks for this *were* limited to an instance of
> FileChannel - the FileChannel maintained its own personal lock list. So
> you have to use the same Channel to even have any hope of seeing an
> OverlappingFileLockException. Even then though, it's not properly thread
> safe. They did not sync across checking if the lock exists and acquiring
> the lock - they separately sync each action - leaving room to acquire
> the lock twice from two different threads like I was seeing.
>
> Interestingly, Java 1.6 has a back compat mode you can turn on that
> doesn't use the system wide lock list, and they have fixed this thread
> safety issue in that impl - there is a sync across checking
> and getting the lock so that it is properly thread safe - but not in
> Java 1.4, 1.5.
>
> Looking at GCC - uh ... I don't think you want to use GCC - they don't
> appear to use a lock list and check for this at all :)
>
> But the point is, this is fixable on Java 6 if we check for
> OverlappingFileLockException - it *should* work across webapps, and it
> is actually thread safe, unlike Java 1.4,1.5.
>
>   
Another interesting fact:

On Windows, if you attempt to lock the same file with different channel
instances pre Java 1.6 - the code will deadlock.

-- 
- Mark

http://www.lucidimagination.com




Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Mark Miller <ma...@gmail.com>.
Chris Hostetter wrote:
> : > At a minimum, shouldn't NativeFSLock.obtain() be checking for
> : > OverlappingFileLockException and treating that as a failure to acquire the
> : > lock?
> 	...
> : Perhaps - that should make it work in more cases - but in my simple
> : testing it's not 100% reliable.
> 	...
> : File locks are held on behalf of the entire Java virtual machine.
> :      * They are not suitable for controlling access to a file by multiple
> :      * threads within the same virtual machine.
>
> ...Grrr....  so where does that leave us?
>
> Yonik's added comment was that "native" isn't recommended when running
> multiple webapps in the same container.  In truth, "native" *can*
> work when running multiple webapps in the same container, just as long as
> those webapps don't reference the same data dirs.
>
> I'm worried that we should recommend people avoid native altogether
> because even if you are only running one webapp, it seems like a "reload"
> of that app could trigger some similar bad behavior.
>
> So what/how should we document all of this?
>
> -Hoss
>
>   
I've got more info on this.

Checking for OverlappingFileLockException *should* actually work when
using Java 1.6. Java 1.6 started using a *system wide* thread safe check
for this.

Previous to Java 1.6, checks for this *were* limited to an instance of
FileChannel - the FileChannel maintained its own personal lock list. So
you have to use the same Channel to even have any hope of seeing an
OverlappingFileLockException. Even then though, it's not properly thread
safe. They did not sync across checking if the lock exists and acquiring
the lock - they separately sync each action - leaving room to acquire
the lock twice from two different threads like I was seeing.

Interestingly, Java 1.6 has a back compat mode you can turn on that
doesn't use the system wide lock list, and they have fixed this thread
safety issue in that impl - there is a sync across checking
and getting the lock so that it is properly thread safe - but not in
Java 1.4, 1.5.

Looking at GCC - uh ... I don't think you want to use GCC - they don't
appear to use a lock list and check for this at all :)

But the point is, this is fixable on Java 6 if we check for
OverlappingFileLockException - it *should* work across webapps, and it
is actually thread safe, unlike Java 1.4,1.5.
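
A minimal sketch of the check discussed above, assuming Java 6 or later. The class name NativeLockSketch and its obtain()/release() methods are hypothetical and are not Lucene's NativeFSLock; only standard java.io and java.nio calls are used. It treats an OverlappingFileLockException from FileChannel.tryLock() as a failed acquisition instead of letting it propagate:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch for illustration; this is not Lucene's NativeFSLock.
final class NativeLockSketch {
    private final File lockFile;
    private RandomAccessFile raf; // kept open while the lock is held
    private FileLock lock;

    public NativeLockSketch(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns true only if this call acquired the lock. */
    public synchronized boolean obtain() {
        if (lock != null) {
            return false; // this instance already holds it
        }
        try {
            RandomAccessFile f = new RandomAccessFile(lockFile, "rw");
            FileLock l = null;
            try {
                l = f.getChannel().tryLock(); // null: another process holds it
            } catch (OverlappingFileLockException e) {
                // Held elsewhere in this JVM: report failure instead of throwing.
            } finally {
                if (l == null) {
                    f.close();
                }
            }
            if (l == null) {
                return false;
            }
            raf = f;
            lock = l;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock and closes the file if this instance holds it. */
    public synchronized void release() {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
            raf.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock = null;
            raf = null;
        }
    }
}
```

One caveat from the FileChannel javadoc: on some systems, closing a channel releases all locks the JVM holds on that file, so a failed attempt that closes its own channel may still disturb the OS-level lock held elsewhere in the same JVM. The sketch does not try to solve that.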

-- 
- Mark

http://www.lucidimagination.com




Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Chris Hostetter <ho...@fucit.org>.
: > At a minimum, shouldn't NativeFSLock.obtain() be checking for
: > OverlappingFileLockException and treating that as a failure to acquire the
: > lock?
	...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
	...
: File locks are held on behalf of the entire Java virtual machine.
:      * They are not suitable for controlling access to a file by multiple
:      * threads within the same virtual machine.

...Grrr....  so where does that leave us?

Yonik's added comment was that "native" isn't recommended when running
multiple webapps in the same container.  In truth, "native" *can*
work when running multiple webapps in the same container, just as long as
those webapps don't reference the same data dirs.

I'm worried that we should recommend people avoid native altogether
because even if you are only running one webapp, it seems like a "reload"
of that app could trigger some similar bad behavior.

So what/how should we document all of this?

-Hoss


Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Mark Miller <ma...@gmail.com>.
Chris Hostetter wrote:
> : again. I don't think it matters if it's the same FileChannel or not - you
> : just can't use Native Locks within the same JVM, as the lock is held by
> : the JVM - they are per process - so Lucene does its own little static
> : map stuff to lock within JVM (simple in memory lock tracking) and uses
> : the actual Native Lock for multiple JVMs (which is all it's good for -
> : process granularity). But obviously, the in memory locking doesn't work
> : across webapps.
>
> Assuming I'm understanding all of this correctly, that implies a bug in 
> Lucene's NativeFSLockFactory when used in a multiple classloader type 
> situation -- including any app running in a servlet container.
>
> At a minimum, shouldn't NativeFSLock.obtain() be checking for
> OverlappingFileLockException and treating that as a failure to acquire the 
> lock?
>
>
>
> -Hoss
>
>   
Perhaps - that should make it work in more cases - but in my simple
testing it's not 100% reliable.

If I start up two threads and try to get a lock (with the same
channel, with different channels) with first one thread and then the
other - sometimes it throws OverlappingFileLockException
... and sometimes it doesn't. From what I can tell, you certainly can't
count on it.

If you pause between attempts, it does appear to always work - so it
certainly would give us a lot of ground it would seem - but if the
attempts are back to back, both threads can still successfully get the lock.

This behavior could be OS dependent as it's using OS level locks.

FileChannel does appear to say that this should work (though it's
obviously not completely thread safe from what I can tell), but it also
says:

File locks are held on behalf of the entire Java virtual machine.
     * They are not suitable for controlling access to a file by multiple
     * threads within the same virtual machine.

Which seems to be the case.
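
For reference, the non-racy version of this experiment can be sketched as below. This is not the two-thread test described above: it takes both locks sequentially from one thread, through two different channels on the same file, and reports whether the second attempt was rejected. The class and method names (OverlapDemo, secondLockRejected) are made up for illustration. On Java 6 and later the second tryLock() is expected to throw OverlappingFileLockException; on older JVMs it may simply succeed, which is the behavior discussed in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical demo class; names are illustrative only.
final class OverlapDemo {

    /**
     * Locks a temp file through one channel, then tries again through a
     * second channel in the same JVM. Returns true if the second attempt
     * was rejected with OverlappingFileLockException.
     */
    public static boolean secondLockRejected() {
        try {
            File f = File.createTempFile("overlap-demo", ".lock");
            f.deleteOnExit();
            try (RandomAccessFile a = new RandomAccessFile(f, "rw");
                 RandomAccessFile b = new RandomAccessFile(f, "rw");
                 FileChannel first = a.getChannel();
                 FileChannel second = b.getChannel()) {
                FileLock held = first.tryLock();
                if (held == null) {
                    throw new IllegalStateException("another process holds the lock");
                }
                try {
                    FileLock again = second.tryLock();
                    if (again != null) {
                        again.release(); // granted twice within one JVM
                    }
                    return false;
                } catch (OverlappingFileLockException e) {
                    return true; // the JVM noticed the overlap
                } finally {
                    held.release();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock rejected: " + secondLockRejected());
    }
}
```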

-- 
- Mark

http://www.lucidimagination.com




Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Chris Hostetter <ho...@fucit.org>.
: again. I don't think it matters if it's the same FileChannel or not - you
: just can't use Native Locks within the same JVM, as the lock is held by
: the JVM - they are per process - so Lucene does its own little static
: map stuff to lock within JVM (simple in memory lock tracking) and uses
: the actual Native Lock for multiple JVMs (which is all it's good for -
: process granularity). But obviously, the in memory locking doesn't work
: across webapps.

Assuming I'm understanding all of this correctly, that implies a bug in 
Lucene's NativeFSLockFactory when used in a multiple classloader type 
situation -- including any app running in a servlet container.

At a minimum, shouldn't NativeFSLock.obtain() be checking for
OverlappingFileLockException and treating that as a failure to acquire the 
lock?



-Hoss


Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Ah thanks - I was going by that comment :-)

On Mon, Jan 18, 2010 at 12:07 PM, Mark Miller <ma...@gmail.com> wrote:
> Mark Miller wrote:
>> Yonik Seeley wrote:
>>
>>> On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter
>>> <ho...@fucit.org> wrote:
>>>
>>>
>>>> : Right... for stock Solr usage (i.e. as long as they don't try to lock
>>>> : the same thing.)
>>>> : It is funny that native locks always work across different processes,
>>>> : but not always in the same JVM though.
>>>>
>>>> Actually, the more I think about this the less I understand it ... why
>>>> don't native locks "work" within the same VM? ... and by "work" I mean why
>>>> didn't he just get a lock timeout error?
>>>>
>>>>
>>> Within the same VM, you need the same FileChannel for some reason.
>>> Lucene uses a static hashmap so that multiple NativeFSLockFactory
>>> instances will end up using the same FileChannel for locking.  But
>>> multiple webapps obviously breaks that.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>> Native Locks are obtained at the JVM level - so if you try and lock the
>> same Channel twice, since the same JVM already has the lock, it's granted
>> again. I don't think it matters if it's the same FileChannel or not - you
>> just can't use Native Locks within the same JVM, as the lock is held by
>> the JVM - they are per process - so Lucene does its own little static
>> map stuff to lock within JVM (simple in memory lock tracking) and uses
>> the actual Native Lock for multiple JVMs (which is all it's good for -
>> process granularity). But obviously, the in memory locking doesn't work
>> across webapps.
>>
>>
> Also, the javadocs in Lucene are wrong:
>
>  /*
>   * The javadocs for FileChannel state that you should have
>   * a single instance of a FileChannel (per JVM) for all
>   * locking against a given file.  To ensure this, we have
>   * a single (static) HashSet that contains the file paths
>   * of all currently locked locks.  This protects against
>   * possible cases where different Directory instances in
>   * one JVM (each with their own NativeFSLockFactory
>   * instance) have set the same lock dir and lock prefix.
>   */
>
> The javadocs for FileChannel don't say this at all - and this implies
> that Lucene is doing something that it is not. The javadocs say don't
> expect native locks to work for locking within a JVM, because it
> doesn't. And Lucene doesn't try and use the same FileChannel per JVM (it
> wouldn't help anyway) - Lucene simply attempts to track per JVM locks in
> a static map (which doesn't work per JVM when you are dealing with
> different classloaders).
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>

Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Mark Miller <ma...@gmail.com>.
Mark Miller wrote:
> Yonik Seeley wrote:
>   
>> On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter
>> <ho...@fucit.org> wrote:
>>   
>>     
>>> : Right... for stock Solr usage (i.e. as long as they don't try to lock
>>> : the same thing.)
>>> : It is funny that native locks always work across different processes,
>>> : but not always in the same JVM though.
>>>
>>> Actually, the more I think about this the less I understand it ... why
>>> don't native locks "work" within the same VM? ... and by "work" I mean why
>>> didn't he just get a lock timeout error?
>>>     
>>>       
>> Within the same VM, you need the same FileChannel for some reason.
>> Lucene uses a static hashmap so that multiple NativeFSLockFactory
>> instances will end up using the same FileChannel for locking.  But
>> multiple webapps obviously breaks that.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>   
>>     
> Native Locks are obtained at the JVM level - so if you try and lock the
> same Channel twice, since the same JVM already has the lock, it's granted
> again. I don't think it matters if it's the same FileChannel or not - you
> just can't use Native Locks within the same JVM, as the lock is held by
> the JVM - they are per process - so Lucene does its own little static
> map stuff to lock within JVM (simple in memory lock tracking) and uses
> the actual Native Lock for multiple JVMs (which is all it's good for -
> process granularity). But obviously, the in memory locking doesn't work
> across webapps.
>
>   
Also, the javadocs in Lucene are wrong:

  /*
   * The javadocs for FileChannel state that you should have
   * a single instance of a FileChannel (per JVM) for all
   * locking against a given file.  To ensure this, we have
   * a single (static) HashSet that contains the file paths
   * of all currently locked locks.  This protects against
   * possible cases where different Directory instances in
   * one JVM (each with their own NativeFSLockFactory
   * instance) have set the same lock dir and lock prefix.
   */

The javadocs for FileChannel don't say this at all - and this implies
that Lucene is doing something that it is not. The javadocs say don't
expect native locks to work for locking within a JVM, because it
doesn't. And Lucene doesn't try and use the same FileChannel per JVM (it
wouldn't help anyway) - Lucene simply attempts to track per JVM locks in
a static map (which doesn't work per JVM when you are dealing with
different classloaders).

-- 
- Mark

http://www.lucidimagination.com




Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Mark Miller <ma...@gmail.com>.
Yonik Seeley wrote:
> On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter
> <ho...@fucit.org> wrote:
>   
>> : Right... for stock Solr usage (i.e. as long as they don't try to lock
>> : the same thing.)
>> : It is funny that native locks always work across different processes,
>> : but not always in the same JVM though.
>>
>> Actually, the more I think about this the less I understand it ... why
>> don't native locks "work" within the same VM? ... and by "work" I mean why
>> didn't he just get a lock timeout error?
>>     
>
> Within the same VM, you need the same FileChannel for some reason.
> Lucene uses a static hashmap so that multiple NativeFSLockFactory
> instances will end up using the same FileChannel for locking.  But
> multiple webapps obviously breaks that.
>
> -Yonik
> http://www.lucidimagination.com
>   
Native Locks are obtained at the JVM level - so if you try and lock the
same Channel twice, since the same JVM already has the lock, it's granted
again. I don't think it matters if it's the same FileChannel or not - you
just can't use Native Locks within the same JVM, as the lock is held by
the JVM - they are per process - so Lucene does its own little static
map stuff to lock within JVM (simple in memory lock tracking) and uses
the actual Native Lock for multiple JVMs (which is all it's good for -
process granularity). But obviously, the in memory locking doesn't work
across webapps.
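
To make the "static map stuff" concrete, here is a rough sketch of in-memory lock tracking keyed by file path. The class InJvmLockRegistry and its methods are hypothetical and are not taken from Lucene's source; the point is only that the set lives in a static field, so there is one copy per loaded class rather than one per JVM.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-class in-memory lock tracking; not Lucene code.
final class InJvmLockRegistry {

    // One set per loaded copy of this class. Two webapps that each load
    // their own copy through separate classloaders get two independent sets.
    private static final Set<String> LOCKED = new HashSet<>();

    private InJvmLockRegistry() {
    }

    /** Returns true if the path was not already marked as locked. */
    public static boolean tryAcquire(String canonicalPath) {
        synchronized (LOCKED) {
            return LOCKED.add(canonicalPath);
        }
    }

    /** Marks the path as no longer locked. */
    public static void release(String canonicalPath) {
        synchronized (LOCKED) {
            LOCKED.remove(canonicalPath);
        }
    }
}
```

Within one classloader, a second tryAcquire() on the same path returns false until release() is called. Across webapps, each webapp's copy of the class has its own empty set, so both calls return true and only the native lock is left to arbitrate.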

-- 
- Mark

http://www.lucidimagination.com




Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Jan 18, 2010 at 1:17 AM, Chris Hostetter
<ho...@fucit.org> wrote:
> : Right... for stock Solr usage (i.e. as long as they don't try to lock
> : the same thing.)
> : It is funny that native locks always work across different processes,
> : but not always in the same JVM though.
>
> Actually, the more I think about this the less I understand it ... why
> don't native locks "work" within the same VM? ... and by "work" I mean why
> didn't he just get a lock timeout error?

Within the same VM, you need the same FileChannel for some reason.
Lucene uses a static hashmap so that multiple NativeFSLockFactory
instances will end up using the same FileChannel for locking.  But
multiple webapps obviously breaks that.

-Yonik
http://www.lucidimagination.com

Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Chris Hostetter <ho...@fucit.org>.
: Right... for stock Solr usage (i.e. as long as they don't try to lock
: the same thing.)
: It is funny that native locks always work across different processes,
: but not always in the same JVM though.

Actually, the more I think about this the less I understand it ... why
don't native locks "work" within the same VM? ... and by "work" I mean why
didn't he just get a lock timeout error?

If the behavior of Native Locks is really that you don't get the same
behavior if both clients are in the same JVM, then shouldn't Lucene's
native lock factory be doing something like wrapping a
SingleInstanceLockFactory around the NativeFSLockFactory?

: #2) native lock factory fails if it's two different Solr webapps in
: the same JVM trying to lock the same thing.
	...
: Should we clarify "Do not use with multiple solr webapps in the same
: JVM" or just remove it?

I'm starting to think we should remove support for native locks altogether --
if it can fail in the situation of multiple wars in the same JVM trying to
use the same solr home, that implies that it can also fail if something
goes wrong during a "hot deploy" of the solr.war ... if the shutdown of
the older instance of solr.war fails for some reason, then there could be a
stale lock, created in the same JVM, left over when the newer instance is
brought online.

correct?


-Hoss


Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Jan 16, 2010 at 3:40 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : doc: note about native locks not working for multiple webapps in same JVM
>
> Is this in response to the OverlappingFileLockException thread started by
> Joe Kessel? ...
>
> : +      native = NativeFSLockFactory  - uses OS native file locking.
> : +               Do not use with multiple solr webapps in the same JVM.
>
> I think there's a misunderstanding about the root cause of the problem.
> There shouldn't be any inherent problem with using Native locks
> and multiple webapps

Right... for stock Solr usage (i.e. as long as they don't try to lock
the same thing.)
It is funny that native locks always work across different processes,
but not always in the same JVM though.

> -- I believe the underlying source of the exception
> was that he was using multiple webapps w/o realizing it -- so presumably
> both webapps were trying to use the same solr home dir.

Right... it's really two issues:
#1) two separate solr instances trying to use the same solr index
#2) native lock factory fails if it's two different Solr webapps in
the same JVM trying to lock the same thing.

I do recall expert level stuff like people having multiple solr
instances pointing to the same data directory in the past though, but
not sure if it was from the same JVM or not.

Should we clarify "Do not use with multiple solr webapps in the same
JVM" or just remove it?

-Yonik
http://www.lucidimagination.com

Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

Posted by Chris Hostetter <ho...@fucit.org>.
: doc: note about native locks not working for multiple webapps in same JVM

Is this in response to the OverlappingFileLockException thread started by
Joe Kessel? ...

: +      native = NativeFSLockFactory  - uses OS native file locking.
: +               Do not use with multiple solr webapps in the same JVM.

I think there's a misunderstanding about the root cause of the problem.
There shouldn't be any inherent problem with using Native locks
and multiple webapps -- I believe the underlying source of the exception
was that he was using multiple webapps w/o realizing it -- so presumably
both webapps were trying to use the same solr home dir.


-Hoss