You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Roman Puchkovskiy <ro...@blandware.com> on 2008/07/06 22:33:04 UTC

ThreadLocal in SegmentReader

Hi.

There's a ThreadLocal field in SegmentReader (it's called termVectorsLocal).
Some value is put to it, but it's never cleared.
Is it ok? It looks like sometimes this behavior may lead to leaks.

This is the same in lucene-2.2.0 and lucene-2.3.2.

Thanks in advance.
-- 
View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Yes, calling set(null) does not seem a good fix. As for setting a reference
to termVectorsLocal to null, not sure could this help or not, as this
ThreadLocal will still be referenced by the thread (or threads).
Anyway, I will try to test this approach and post the results here.


Michael McCandless-2 wrote:
> 
> 
> Hmmm, I see... you're right.  ThreadLocal is dangerous.
> 
> So how would you recommend fixing it?
> 
> One thing we can do, in SegmentReader.close, is to call  
> termVectorsLocal.set(null).  We do this eg in FieldsReader.close,  
> which uses a ThreadLocal to hold thread-private clones of the  
> fieldsStream
> 
> However, that only works if the very same thread that had opened the  
> reader also calls close... which likely often is the case but in  
> general is not guaranteed and we should not expect/require.
> 
> How about if we set termVectorsLocal itself to null?  Will GC then "do  
> the right thing", ie, recognize (eventually) that this ThreadLocal  
> instance is no longer referenced, and clear all Objects for all  
> threads that were held in it?
> 
> Mike
> 
> Roman Puchkovskiy wrote:
> 
>>
>> Unfortunately, it's not ok sometimes. For instance, when Lucene is  
>> loaded by
>> a web-application from its WEB-INF/lib and SegmentReader is  
>> initialized
>> during the application start-up (i.e. it's initialized in the thread  
>> which
>> will never die), this causes problems with unloading of a  
>> classloader of the
>> web-application. When web-application is undeployed, there's still a
>> ThreadLocal in a thread which is external to webapp classloader, and  
>> this
>> ThreadLocal contains an object which references its class, and this  
>> class
>> was loaded through web-app classloader and hence references it... so  
>> we have
>> a chain of hard links from an alive thread to our classloader. In the
>> result, the classloader cannot be unloaded together will all its  
>> classes, so
>> memory waste is considerable.
>>
>> I've found a way to work this around by creating a new thread during  
>> webapp
>> start-up and executing code which eventually initializes Lucene  
>> indices from
>> this thread, so all spawned ThreadLocals go to this short-lived  
>> thread and
>> get garbage-collected shortly after the webapp start-up is finished.  
>> But
>> this does not seem to be a pretty solution. Besides, one has to  
>> guess that
>> ThreadLocals are the problem to invent such a work-around :)
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>>
>>> Well ... if the thread dies, the value in its ThreadLocal should be
>>> GC'd.
>>>
>>> If the thread does not die (eg thread pool in an app server) then the
>>> ThreadLocal value remains, but that value is a shallow clone of the
>>> original TermVectorsReader and should not be consuming that much RAM
>>> per thread.
>>>
>>> So I think it's OK?
>>>
>>> Mike
>>>
>>> Roman Puchkovskiy wrote:
>>>
>>>>
>>>> Hi.
>>>>
>>>> There's a ThreadLocal field in SegmentReader (it's called
>>>> termVectorsLocal).
>>>> Some value is put to it, but it's never cleared.
>>>> Is it ok? It looks like sometimes this behavior may lead to leaks.
>>>>
>>>> This is the same in lucene-2.2.0 and lucene-2.3.2.
>>>>
>>>> Thanks in advance.
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
>>>> Sent from the Lucene - Java Users mailing list archive at  
>>>> Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314310.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314959.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
That is correct.

ANY library that uses thread locals is going to have the same issue,  
and in many cases the containers themselves use static thread locals.

This is just another reason why (I think) a better solution when  
using Lucene as a service is to run it in its own process, and  
communicate with the process from the web container. You have better  
memory/request control - all of the recent memory caching  
enhancements are worthless when running Lucene along side other apps  
in the same web container - future JSRs might make this feasible, but  
not yet. It allows for better process monitoring, etc. So many  
advantages - very few downsides



On Jul 14, 2008, at 8:38 AM, Roman Puchkovskiy wrote:

>
> 1. It is not obligatory for servlet container to deploy  
> applications from the
> same thread. It easily can happen, that first instance of webapp is  
> loaded
> by the main thread at container start-up, then repedloy happens  
> because a
> hot deployer discovers that WAR has changed (and this hot deployer  
> easily
> may work in another thread, not in the main one), and after that  
> the webapp
> is redeployed in response to HTTP request (Tomcat manager is  
> example, which
> is webapp itself and may redeploy other applications). Here a bunch of
> threads deploy webapp, and they are all different.
> 2. What if a user just wants to undeploy the webapp, without the  
> redeploy?
> He expects the memory to be released, but it will not be.
>
>
> Robert Engels wrote:
>>
>> If you attempting to restart the web app context, then when restarted
>> the Thread will still be the same and as it runs, new ThreadLocals
>> will be created which will cause the old to be removed, eventually
>> allowing the class loader to be unloaded.
>>
>> On Jul 14, 2008, at 12:31 AM, Roman Puchkovskiy wrote:
>>
>>>
>>> Oops, sent too early, excuse me :) One more to say: yes, Yonik is
>>> correct.
>>> Values remain strong-referenced until the stale values are removed
>>> from the
>>> table. Just a quote from the ThreadLocals' javadoc:
>>> "However, since reference queues are not used, stale entries are
>>> guaranteed
>>> to be removed only when the table starts running out of space."
>>> How do you suggest to force the removal of stale entries to the
>>> Lucene user?
>>>
>>>
>>> Robert Engels wrote:
>>>>
>>>> You are mistaken - Yonik's comment in that thread is correct
>>>> (although it is not just at table resize - any time a  
>>>> ThreadLocal is
>>>> added, and any time the ThreadLocal is not found in its direct hash
>>>> it might clear entries).
>>>>
>>>> The ThreadLocals map only has a WeakReference to the  
>>>> ThreadLocal, so
>>>> if that is the only reference, it will be GC'd - eventually, and  
>>>> then
>>>> it will be cleared as new ThreadLocals are created.
>>>>
>>>> With a static reference, the thread can reference the  
>>>> ThreadLocal at
>>>> any time, and thus the WeakReference will not be cleared.
>>>>
>>>> If the object is VERY large, and new ThreadLocals are not  
>>>> created it
>>>> could cause a problem, but I don't think that is the case with
>>>> Lucene, as  the objects stored in ThreadLocals are designed to live
>>>> for the life of the SegmentReader/IndexReader and thread.
>>>>
>>>>
>>>> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
>>>>
>>>>>
>>>>> Well, possibly I'm mistaken, but it seems that this affects non-
>>>>> static fields
>>>>> too. Please see
>>>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html
>>>>> where the
>>>>> use case is described in the details.
>>>>> In short: it seems that the scope of ThreadLocals does not matter.
>>>>> What
>>>>> really matters is that they are referenced by ThreadLocals map in
>>>>> the thread
>>>>> which is still alive.
>>>>>
>>>>>
>>>>> Robert Engels wrote:
>>>>>>
>>>>>> This is only an issue for static ThreadLocals ...
>>>>>>
>>>>>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>>>>>
>>>>>>>
>>>>>>> The problem here is not because ThreadLocal instances are not  
>>>>>>> GC'd
>>>>>>> (they are
>>>>>>> GC'd, and your test shows this clearly).
>>>>>>> But even one instance which is not removed from its Thread is
>>>>>>> enough to
>>>>>>> prevent the classloader from being unloaded, and that's the
>>>>>>> problem.
>>>>>>>
>>>>>>>
>>>>>>> Michael McCandless-2 wrote:
>>>>>>>>
>>>>>>>> OK, I created a simple test to test this (attached).  The test
>>>>>>>> just
>>>>>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>>>>>> stored
>>>>>>>> into a ThreadLocal, and then periodically the ThreadLocal is
>>>>>>>> replaced
>>>>>>>> with a new one.  This is to test whether GC of a ThreadLocal,
>>>>>>>> even
>>>>>>>> though the thread is still alive, in fact leads to GC of the
>>>>>>>> objects
>>>>>>>> held in the ThreadLocal.
>>>>>>>>
>>>>>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects
>>>>>>>> are in
>>>>>>>> fact properly collected.
>>>>>>>>
>>>>>>>> So this is not a leak but rather a "delayed collection" issue.
>>>>>>>> Java's
>>>>>>>> GC is never guaranteed to be immediate, and apparently when  
>>>>>>>> using
>>>>>>>> ThreadLocals it's even less immediate than "normal".  In the
>>>>>>>> original
>>>>>>>> issue, if other things create ThreadLocals, then eventually
>>>>>>>> Lucene's
>>>>>>>> unreferenced ThreadLocals would be properly collected.
>>>>>>>>
>>>>>>>> So I think we continue to use non-static ThreadLocals in
>>>>>>>> Lucene...
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> robert engels wrote:
>>>>>>>>
>>>>>>>>> Once again, these are "static" thread locals. A completely
>>>>>>>>> different
>>>>>>>>> issue. Since the object is available statically, the weak
>>>>>>>>> reference
>>>>>>>>> cannot be cleared so stale entries will never be cleared as
>>>>>>>>> long as
>>>>>>>>> the thread is alive.
>>>>>>>>>
>>>>>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>>>>>
>>>>>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>>>>>
>>>>>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/
>>>>>>>>>> HHH-2481
>>>>>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>>>>>
>>>>>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a  
>>>>>>>>>> "bad"
>>>>>>>>>> implementation, and maybe the current "problems" of
>>>>>>>>>> ThreadLocals
>>>>>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>>>>>> ThreadLocals to pass context information which is cleared  
>>>>>>>>>> when
>>>>>>>>>> the
>>>>>>>>>> call exits instead of storing long-lived objects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> robert engels wrote:
>>>>>>>>>>>
>>>>>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there
>>>>>>>>>>> are no
>>>>>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>>>>>
>>>>>>>>>>> There is no need for try/finally blocks, unless you MUST
>>>>>>>>>>> release
>>>>>>>>>>> resources immediately, usually this is not the case,  
>>>>>>>>>>> which is
>>>>>>>>>>> why
>>>>>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>>>>>
>>>>>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>>>>>
>>>>>>>>>>>  /**
>>>>>>>>>>>      * ThreadLocalMap is a customized hash map suitable
>>>>>>>>>>> only for
>>>>>>>>>>>      * maintaining thread local values. No operations are
>>>>>>>>>>> exported
>>>>>>>>>>>      * outside of the ThreadLocal class. The class is  
>>>>>>>>>>> package
>>>>>>>>>>> private to
>>>>>>>>>>>      * allow declaration of fields in class Thread.  To help
>>>>>>>>>>> deal
>>>>>>>>>>> with
>>>>>>>>>>>      * very large and long-lived usages, the hash table
>>>>>>>>>>> entries
>>>>>>>>>>> use
>>>>>>>>>>>      * WeakReferences for keys. However, since reference
>>>>>>>>>>> queues
>>>>>>>>>>> are not
>>>>>>>>>>>      * used, stale entries are guaranteed to be removed only
>>>>>>>>>>> when
>>>>>>>>>>>      * the table starts running out of space.
>>>>>>>>>>>      */
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>>>>>> entries.
>>>>>>>>>>>          * This is invoked when either a new element is
>>>>>>>>>>> added, or
>>>>>>>>>>>          * another stale one has been expunged. It  
>>>>>>>>>>> performs a
>>>>>>>>>>>          * logarithmic number of scans, as a balance
>>>>>>>>>>> between no
>>>>>>>>>>>          * scanning (fast but retains garbage) and a  
>>>>>>>>>>> number of
>>>>>>>>>>> scans
>>>>>>>>>>>          * proportional to number of elements, that would
>>>>>>>>>>> find all
>>>>>>>>>>>          * garbage but would cause some insertions to take O
>>>>>>>>>>> (n)
>>>>>>>>>>> time.
>>>>>>>>>>>          *
>>>>>>>>>>>          * @param i a position known NOT to hold a stale
>>>>>>>>>>> entry.
>>>>>>>>>>> The
>>>>>>>>>>>          * scan starts at the element after i.
>>>>>>>>>>>          *
>>>>>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>>>>>> scanned,
>>>>>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells  
>>>>>>>>>>> are
>>>>>>>>>>> scanned.
>>>>>>>>>>>          * When called from insertions, this parameter is  
>>>>>>>>>>> the
>>>>>>>>>>> number
>>>>>>>>>>>          * of elements, but when from replaceStaleEntry, it
>>>>>>>>>>> is the
>>>>>>>>>>>          * table length. (Note: all this could be changed
>>>>>>>>>>> to be
>>>>>>>>>>> either
>>>>>>>>>>>          * more or less aggressive by weighting n instead of
>>>>>>>>>>> just
>>>>>>>>>>>          * using straight log n. But this version is simple,
>>>>>>>>>>> fast,
>>>>>>>>>>> and
>>>>>>>>>>>          * seems to work well.)
>>>>>>>>>>>          *
>>>>>>>>>>>          * @return true if any stale entries have been
>>>>>>>>>>> removed.
>>>>>>>>>>>          */
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The instance ThreadLocals (and what the refer to) will be  
>>>>>>>>>>> GC'd
>>>>>>>>>>> when the containing Object is GC'd.
>>>>>>>>>>>
>>>>>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal
>>>>>>>>>>> refers
>>>>>>>>>>> to an object that has native resources (e.g. file handles),
>>>>>>>>>>> it may
>>>>>>>>>>> not be released until other thread locals are created by the
>>>>>>>>>>> thread (or the thread terminates).
>>>>>>>>>>>
>>>>>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>>>>>> applications it should never be necessary - unless a very
>>>>>>>>>>> strange
>>>>>>>>>>> usage...
>>>>>>>>>>>
>>>>>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From what I know, storing objects in ThreadLocal is safe as
>>>>>>>>>>>> long
>>>>>>>>>>>> as you release the object within a try {} finall {}  
>>>>>>>>>>>> block or
>>>>>>>>>>>> store objects which are independent of the rest of the code
>>>>>>>>>>>> (no
>>>>>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory
>>>>>>>>>>>> leaks,
>>>>>>>>>>>> classloader problems) after awhile.
>>>>>>>>>>>>
>>>>>>>>>>>> It is pretty convenient to pass HTTP request information
>>>>>>>>>>>> with a
>>>>>>>>>>>> ThreadLocal in a servlet(but you should cleanup the  
>>>>>>>>>>>> variable
>>>>>>>>>>>> before leaving the servlet) but I'm not sure how safe it
>>>>>>>>>>>> is in
>>>>>>>>>>>> this case.
>>>>>>>>>>>>
>>>>>>>>>>>> robert engels wrote:
>>>>>>>>>>>>> Using synchronization is a poor/invalid substitute for
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> locals in many cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The point of the thread local in these referenced cases
>>>>>>>>>>>>> is too
>>>>>>>>>>>>> allow streaming reads on a file descriptor. if you use a
>>>>>>>>>>>>> shared
>>>>>>>>>>>>> file descriptor/buffer you are going to continually
>>>>>>>>>>>>> invalidate
>>>>>>>>>>>>> the buffer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>>>>>> private instance of TermVectorsReader, to avoid
>>>>>>>>>>>>>> synchronizing
>>>>>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Clearing this ThreadLocal value per call to  
>>>>>>>>>>>>>> SegmentReader's
>>>>>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Though, of course, we then synchronize on the underlying
>>>>>>>>>>>>>> file
>>>>>>>>>>>>>> (when using FSDirectory), so perhaps we are really not
>>>>>>>>>>>>>> saving
>>>>>>>>>>>>>> much by using ThreadLocal here.  But we are looking to
>>>>>>>>>>>>>> relax
>>>>>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>>>>>> HashMap, which we'd have to synchronize on when  
>>>>>>>>>>>>>> getting the
>>>>>>>>>>>>>> per-
>>>>>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>>>>>> IndexReader for an object that can return term vectors /
>>>>>>>>>>>>>> stored
>>>>>>>>>>>>>> fields / etc, and then you interact with that many  
>>>>>>>>>>>>>> times to
>>>>>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>>>>>> retrieving that object, and provide a thread-private
>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems like we should move away from using  
>>>>>>>>>>>>>> ThreadLocal in
>>>>>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end
>>>>>>>>>>>>>>> (in a
>>>>>>>>>>>>>>> finally block), before the current call leaves your  
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every  
>>>>>>>>>>>>>>> search
>>>>>>>>>>>>>>> (..)
>>>>>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>>>>>> deeper in the implementation. When the call leaves  
>>>>>>>>>>>>>>> Lucene
>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,
>>>>>>>>>>>>>>>> causes
>>>>>>>>>>>>>>>> a leak in app servers because the classloader never  
>>>>>>>>>>>>>>>> fully
>>>>>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o
>>>>>>>>>>>>>>>> falling
>>>>>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself  
>>>>>>>>>>>>>>>>>> should no
>>>>>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Which means the ThreadLocal instance should no
>>>>>>>>>>>>>>>>>> longer be
>>>>>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,
>>>>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>>>>> values
>>>>>>>>>>>>>>>>> (now stale) are strongly referenced and won't be
>>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>>>>> until the table is resized (under the Java6 impl at
>>>>>>>>>>>>>>>>> least).
>>>>>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------------------------------ 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------------------------------- 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------------------------------------------------------- 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> -------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --------------------------------------------------------- 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> ------
>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -----
>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> ----
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> --
>>>>>>>> --
>>>>>>>> --
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>>>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>>>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> --
>>>>>>> --
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> --
>>>>>> --
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> -- 
>>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>>> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
>>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> --
>>>>> -
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>> ThreadLocal-in-SegmentReader-tp18326720p18437716.html
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Fwd%3A- 
> ThreadLocal-in-SegmentReader-tp18326720p18444095.html
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
1. It is not obligatory for servlet container to deploy applications from the
same thread. It easily can happen, that first instance of webapp is loaded
by the main thread at container start-up, then repedloy happens because a
hot deployer discovers that WAR has changed (and this hot deployer easily
may work in another thread, not in the main one), and after that the webapp
is redeployed in response to HTTP request (Tomcat manager is example, which
is webapp itself and may redeploy other applications). Here a bunch of
threads deploy webapp, and they are all different.
2. What if a user just wants to undeploy the webapp, without the redeploy?
He expects the memory to be released, but it will not be.


Robert Engels wrote:
> 
> If you attempting to restart the web app context, then when restarted  
> the Thread will still be the same and as it runs, new ThreadLocals  
> will be created which will cause the old to be removed, eventually  
> allowing the class loader to be unloaded.
> 
> On Jul 14, 2008, at 12:31 AM, Roman Puchkovskiy wrote:
> 
>>
>> Oops, sent too early, excuse me :) One more to say: yes, Yonik is  
>> correct.
>> Values remain strong-referenced until the stale values are removed  
>> from the
>> table. Just a quote from the ThreadLocals' javadoc:
>> "However, since reference queues are not used, stale entries are  
>> guaranteed
>> to be removed only when the table starts running out of space."
>> How do you suggest to force the removal of stale entries to the  
>> Lucene user?
>>
>>
>> Robert Engels wrote:
>>>
>>> You are mistaken - Yonik's comment in that thread is correct
>>> (although it is not just at table resize - any time a ThreadLocal is
>>> added, and any time the ThreadLocal is not found in its direct hash
>>> it might clear entries).
>>>
>>> The ThreadLocals map only has a WeakReference to the ThreadLocal, so
>>> if that is the only reference, it will be GC'd - eventually, and then
>>> it will be cleared as new ThreadLocals are created.
>>>
>>> With a static reference, the thread can reference the ThreadLocal at
>>> any time, and thus the WeakReference will not be cleared.
>>>
>>> If the object is VERY large, and new ThreadLocals are not created it
>>> could cause a problem, but I don't think that is the case with
>>> Lucene, as  the objects stored in ThreadLocals are designed to live
>>> for the life of the SegmentReader/IndexReader and thread.
>>>
>>>
>>> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
>>>
>>>>
>>>> Well, possibly I'm mistaken, but it seems that this affects non-
>>>> static fields
>>>> too. Please see
>>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html
>>>> where the
>>>> use case is described in the details.
>>>> In short: it seems that the scope of ThreadLocals does not matter.
>>>> What
>>>> really matters is that they are referenced by ThreadLocals map in
>>>> the thread
>>>> which is still alive.
>>>>
>>>>
>>>> Robert Engels wrote:
>>>>>
>>>>> This is only an issue for static ThreadLocals ...
>>>>>
>>>>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>>>>
>>>>>>
>>>>>> The problem here is not because ThreadLocal instances are not GC'd
>>>>>> (they are
>>>>>> GC'd, and your test shows this clearly).
>>>>>> But even one instance which is not removed from its Thread is
>>>>>> enough to
>>>>>> prevent the classloader from being unloaded, and that's the  
>>>>>> problem.
>>>>>>
>>>>>>
>>>>>> Michael McCandless-2 wrote:
>>>>>>>
>>>>>>> OK, I created a simple test to test this (attached).  The test  
>>>>>>> just
>>>>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>>>>> stored
>>>>>>> into a ThreadLocal, and then periodically the ThreadLocal is
>>>>>>> replaced
>>>>>>> with a new one.  This is to test whether GC of a ThreadLocal,  
>>>>>>> even
>>>>>>> though the thread is still alive, in fact leads to GC of the
>>>>>>> objects
>>>>>>> held in the ThreadLocal.
>>>>>>>
>>>>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects
>>>>>>> are in
>>>>>>> fact properly collected.
>>>>>>>
>>>>>>> So this is not a leak but rather a "delayed collection" issue.
>>>>>>> Java's
>>>>>>> GC is never guaranteed to be immediate, and apparently when using
>>>>>>> ThreadLocals it's even less immediate than "normal".  In the
>>>>>>> original
>>>>>>> issue, if other things create ThreadLocals, then eventually
>>>>>>> Lucene's
>>>>>>> unreferenced ThreadLocals would be properly collected.
>>>>>>>
>>>>>>> So I think we continue to use non-static ThreadLocals in  
>>>>>>> Lucene...
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> robert engels wrote:
>>>>>>>
>>>>>>>> Once again, these are "static" thread locals. A completely
>>>>>>>> different
>>>>>>>> issue. Since the object is available statically, the weak
>>>>>>>> reference
>>>>>>>> cannot be cleared so stale entries will never be cleared as
>>>>>>>> long as
>>>>>>>> the thread is alive.
>>>>>>>>
>>>>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>>>>
>>>>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>>>>
>>>>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/
>>>>>>>>> HHH-2481
>>>>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>>>>
>>>>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>>>>> implementation, and maybe the current "problems" of  
>>>>>>>>> ThreadLocals
>>>>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>>>>> ThreadLocals to pass context information which is cleared when
>>>>>>>>> the
>>>>>>>>> call exits instead of storing long-lived objects.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> robert engels wrote:
>>>>>>>>>>
>>>>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there
>>>>>>>>>> are no
>>>>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>>>>
>>>>>>>>>> There is no need for try/finally blocks, unless you MUST  
>>>>>>>>>> release
>>>>>>>>>> resources immediately, usually this is not the case, which is
>>>>>>>>>> why
>>>>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>>>>
>>>>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>>>>
>>>>>>>>>>  /**
>>>>>>>>>>      * ThreadLocalMap is a customized hash map suitable  
>>>>>>>>>> only for
>>>>>>>>>>      * maintaining thread local values. No operations are
>>>>>>>>>> exported
>>>>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>>>>> private to
>>>>>>>>>>      * allow declaration of fields in class Thread.  To help
>>>>>>>>>> deal
>>>>>>>>>> with
>>>>>>>>>>      * very large and long-lived usages, the hash table  
>>>>>>>>>> entries
>>>>>>>>>> use
>>>>>>>>>>      * WeakReferences for keys. However, since reference  
>>>>>>>>>> queues
>>>>>>>>>> are not
>>>>>>>>>>      * used, stale entries are guaranteed to be removed only
>>>>>>>>>> when
>>>>>>>>>>      * the table starts running out of space.
>>>>>>>>>>      */
>>>>>>>>>>
>>>>>>>>>> /**
>>>>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>>>>> entries.
>>>>>>>>>>          * This is invoked when either a new element is
>>>>>>>>>> added, or
>>>>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>>>>          * logarithmic number of scans, as a balance  
>>>>>>>>>> between no
>>>>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>>>>> scans
>>>>>>>>>>          * proportional to number of elements, that would
>>>>>>>>>> find all
>>>>>>>>>>          * garbage but would cause some insertions to take O 
>>>>>>>>>> (n)
>>>>>>>>>> time.
>>>>>>>>>>          *
>>>>>>>>>>          * @param i a position known NOT to hold a stale  
>>>>>>>>>> entry.
>>>>>>>>>> The
>>>>>>>>>>          * scan starts at the element after i.
>>>>>>>>>>          *
>>>>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>>>>> scanned,
>>>>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>>>>>> scanned.
>>>>>>>>>>          * When called from insertions, this parameter is the
>>>>>>>>>> number
>>>>>>>>>>          * of elements, but when from replaceStaleEntry, it
>>>>>>>>>> is the
>>>>>>>>>>          * table length. (Note: all this could be changed  
>>>>>>>>>> to be
>>>>>>>>>> either
>>>>>>>>>>          * more or less aggressive by weighting n instead of
>>>>>>>>>> just
>>>>>>>>>>          * using straight log n. But this version is simple,
>>>>>>>>>> fast,
>>>>>>>>>> and
>>>>>>>>>>          * seems to work well.)
>>>>>>>>>>          *
>>>>>>>>>>          * @return true if any stale entries have been  
>>>>>>>>>> removed.
>>>>>>>>>>          */
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>>>>> when the containing Object is GC'd.
>>>>>>>>>>
>>>>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal
>>>>>>>>>> refers
>>>>>>>>>> to an object that has native resources (e.g. file handles),
>>>>>>>>>> it may
>>>>>>>>>> not be released until other thread locals are created by the
>>>>>>>>>> thread (or the thread terminates).
>>>>>>>>>>
>>>>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>>>>> applications it should never be necessary - unless a very
>>>>>>>>>> strange
>>>>>>>>>> usage...
>>>>>>>>>>
>>>>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>>>>
>>>>>>>>>>> From what I know, storing objects in ThreadLocal is safe as
>>>>>>>>>>> long
>>>>>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>>>>>> store objects which are independent of the rest of the code 
>>>>>>>>>>> (no
>>>>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory  
>>>>>>>>>>> leaks,
>>>>>>>>>>> classloader problems) after awhile.
>>>>>>>>>>>
>>>>>>>>>>> It is pretty convenient to pass HTTP request information  
>>>>>>>>>>> with a
>>>>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>>>>> before leaving the servlet) but I'm not sure how safe it  
>>>>>>>>>>> is in
>>>>>>>>>>> this case.
>>>>>>>>>>>
>>>>>>>>>>> robert engels wrote:
>>>>>>>>>>>> Using synchronization is a poor/invalid substitute for  
>>>>>>>>>>>> thread
>>>>>>>>>>>> locals in many cases.
>>>>>>>>>>>>
>>>>>>>>>>>> The point of the thread local in these referenced cases  
>>>>>>>>>>>> is too
>>>>>>>>>>>> allow streaming reads on a file descriptor. if you use a
>>>>>>>>>>>> shared
>>>>>>>>>>>> file descriptor/buffer you are going to continually  
>>>>>>>>>>>> invalidate
>>>>>>>>>>>> the buffer.
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>>>>> private instance of TermVectorsReader, to avoid  
>>>>>>>>>>>>> synchronizing
>>>>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Though, of course, we then synchronize on the underlying  
>>>>>>>>>>>>> file
>>>>>>>>>>>>> (when using FSDirectory), so perhaps we are really not  
>>>>>>>>>>>>> saving
>>>>>>>>>>>>> much by using ThreadLocal here.  But we are looking to  
>>>>>>>>>>>>> relax
>>>>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>>>>> HashMap, which we'd have to synchronize on when getting the
>>>>>>>>>>>>> per-
>>>>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>>>>> IndexReader for an object that can return term vectors /
>>>>>>>>>>>>> stored
>>>>>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>>>>> retrieving that object, and provide a thread-private
>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end
>>>>>>>>>>>>>> (in a
>>>>>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search
>>>>>>>>>>>>>> (..)
>>>>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>>>>> deeper in the implementation. When the call leaves Lucene
>>>>>>>>>>>>>> any
>>>>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,
>>>>>>>>>>>>>>> causes
>>>>>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o  
>>>>>>>>>>>>>>> falling
>>>>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Which means the ThreadLocal instance should no  
>>>>>>>>>>>>>>>>> longer be
>>>>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,
>>>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>>>> values
>>>>>>>>>>>>>>>> (now stale) are strongly referenced and won't be  
>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>>>> until the table is resized (under the Java6 impl at
>>>>>>>>>>>>>>>> least).
>>>>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -------------------------------------------------------- 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------------------- 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -------
>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> ------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> -----
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> ----
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------- 
>>>>>>> --
>>>>>>> --
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------ 
>>>>>> --
>>>>>> -
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------- 
>>>>> --
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>> Nabble.com.
>>>>
>>>>
>>>> -------------------------------------------------------------------- 
>>>> -
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Fwd%3A- 
>> ThreadLocal-in-SegmentReader-tp18326720p18437716.html
>> Sent from the Lucene - Java Developer mailing list archive at  
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18444095.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
If you attempting to restart the web app context, then when restarted  
the Thread will still be the same and as it runs, new ThreadLocals  
will be created which will cause the old to be removed, eventually  
allowing the class loader to be unloaded.

On Jul 14, 2008, at 12:31 AM, Roman Puchkovskiy wrote:

>
> Oops, sent too early, excuse me :) One more to say: yes, Yonik is  
> correct.
> Values remain strong-referenced until the stale values are removed  
> from the
> table. Just a quote from the ThreadLocals' javadoc:
> "However, since reference queues are not used, stale entries are  
> guaranteed
> to be removed only when the table starts running out of space."
> How do you suggest to force the removal of stale entries to the  
> Lucene user?
>
>
> Robert Engels wrote:
>>
>> You are mistaken - Yonik's comment in that thread is correct
>> (although it is not just at table resize - any time a ThreadLocal is
>> added, and any time the ThreadLocal is not found in its direct hash
>> it might clear entries).
>>
>> The ThreadLocals map only has a WeakReference to the ThreadLocal, so
>> if that is the only reference, it will be GC'd - eventually, and then
>> it will be cleared as new ThreadLocals are created.
>>
>> With a static reference, the thread can reference the ThreadLocal at
>> any time, and thus the WeakReference will not be cleared.
>>
>> If the object is VERY large, and new ThreadLocals are not created it
>> could cause a problem, but I don't think that is the case with
>> Lucene, as  the objects stored in ThreadLocals are designed to live
>> for the life of the SegmentReader/IndexReader and thread.
>>
>>
>> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
>>
>>>
>>> Well, possibly I'm mistaken, but it seems that this affects non-
>>> static fields
>>> too. Please see
>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html
>>> where the
>>> use case is described in the details.
>>> In short: it seems that the scope of ThreadLocals does not matter.
>>> What
>>> really matters is that they are referenced by ThreadLocals map in
>>> the thread
>>> which is still alive.
>>>
>>>
>>> Robert Engels wrote:
>>>>
>>>> This is only an issue for static ThreadLocals ...
>>>>
>>>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>>>
>>>>>
>>>>> The problem here is not because ThreadLocal instances are not GC'd
>>>>> (they are
>>>>> GC'd, and your test shows this clearly).
>>>>> But even one instance which is not removed from its Thread is
>>>>> enough to
>>>>> prevent the classloader from being unloaded, and that's the  
>>>>> problem.
>>>>>
>>>>>
>>>>> Michael McCandless-2 wrote:
>>>>>>
>>>>>> OK, I created a simple test to test this (attached).  The test  
>>>>>> just
>>>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>>>> stored
>>>>>> into a ThreadLocal, and then periodically the ThreadLocal is
>>>>>> replaced
>>>>>> with a new one.  This is to test whether GC of a ThreadLocal,  
>>>>>> even
>>>>>> though the thread is still alive, in fact leads to GC of the
>>>>>> objects
>>>>>> held in the ThreadLocal.
>>>>>>
>>>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects
>>>>>> are in
>>>>>> fact properly collected.
>>>>>>
>>>>>> So this is not a leak but rather a "delayed collection" issue.
>>>>>> Java's
>>>>>> GC is never guaranteed to be immediate, and apparently when using
>>>>>> ThreadLocals it's even less immediate than "normal".  In the
>>>>>> original
>>>>>> issue, if other things create ThreadLocals, then eventually
>>>>>> Lucene's
>>>>>> unreferenced ThreadLocals would be properly collected.
>>>>>>
>>>>>> So I think we continue to use non-static ThreadLocals in  
>>>>>> Lucene...
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> robert engels wrote:
>>>>>>
>>>>>>> Once again, these are "static" thread locals. A completely
>>>>>>> different
>>>>>>> issue. Since the object is available statically, the weak
>>>>>>> reference
>>>>>>> cannot be cleared so stale entries will never be cleared as
>>>>>>> long as
>>>>>>> the thread is alive.
>>>>>>>
>>>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>>>
>>>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>>>
>>>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/
>>>>>>>> HHH-2481
>>>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>>>
>>>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>>>> implementation, and maybe the current "problems" of  
>>>>>>>> ThreadLocals
>>>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>>>> ThreadLocals to pass context information which is cleared when
>>>>>>>> the
>>>>>>>> call exits instead of storing long-lived objects.
>>>>>>>>
>>>>>>>>
>>>>>>>> robert engels wrote:
>>>>>>>>>
>>>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there
>>>>>>>>> are no
>>>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>>>
>>>>>>>>> There is no need for try/finally blocks, unless you MUST  
>>>>>>>>> release
>>>>>>>>> resources immediately, usually this is not the case, which is
>>>>>>>>> why
>>>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>>>
>>>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>>>
>>>>>>>>>  /**
>>>>>>>>>      * ThreadLocalMap is a customized hash map suitable  
>>>>>>>>> only for
>>>>>>>>>      * maintaining thread local values. No operations are
>>>>>>>>> exported
>>>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>>>> private to
>>>>>>>>>      * allow declaration of fields in class Thread.  To help
>>>>>>>>> deal
>>>>>>>>> with
>>>>>>>>>      * very large and long-lived usages, the hash table  
>>>>>>>>> entries
>>>>>>>>> use
>>>>>>>>>      * WeakReferences for keys. However, since reference  
>>>>>>>>> queues
>>>>>>>>> are not
>>>>>>>>>      * used, stale entries are guaranteed to be removed only
>>>>>>>>> when
>>>>>>>>>      * the table starts running out of space.
>>>>>>>>>      */
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>>>> entries.
>>>>>>>>>          * This is invoked when either a new element is
>>>>>>>>> added, or
>>>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>>>          * logarithmic number of scans, as a balance  
>>>>>>>>> between no
>>>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>>>> scans
>>>>>>>>>          * proportional to number of elements, that would
>>>>>>>>> find all
>>>>>>>>>          * garbage but would cause some insertions to take O 
>>>>>>>>> (n)
>>>>>>>>> time.
>>>>>>>>>          *
>>>>>>>>>          * @param i a position known NOT to hold a stale  
>>>>>>>>> entry.
>>>>>>>>> The
>>>>>>>>>          * scan starts at the element after i.
>>>>>>>>>          *
>>>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>>>> scanned,
>>>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>>>>> scanned.
>>>>>>>>>          * When called from insertions, this parameter is the
>>>>>>>>> number
>>>>>>>>>          * of elements, but when from replaceStaleEntry, it
>>>>>>>>> is the
>>>>>>>>>          * table length. (Note: all this could be changed  
>>>>>>>>> to be
>>>>>>>>> either
>>>>>>>>>          * more or less aggressive by weighting n instead of
>>>>>>>>> just
>>>>>>>>>          * using straight log n. But this version is simple,
>>>>>>>>> fast,
>>>>>>>>> and
>>>>>>>>>          * seems to work well.)
>>>>>>>>>          *
>>>>>>>>>          * @return true if any stale entries have been  
>>>>>>>>> removed.
>>>>>>>>>          */
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>>>> when the containing Object is GC'd.
>>>>>>>>>
>>>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal
>>>>>>>>> refers
>>>>>>>>> to an object that has native resources (e.g. file handles),
>>>>>>>>> it may
>>>>>>>>> not be released until other thread locals are created by the
>>>>>>>>> thread (or the thread terminates).
>>>>>>>>>
>>>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>>>> applications it should never be necessary - unless a very
>>>>>>>>> strange
>>>>>>>>> usage...
>>>>>>>>>
>>>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>>>
>>>>>>>>>> From what I know, storing objects in ThreadLocal is safe as
>>>>>>>>>> long
>>>>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>>>>> store objects which are independent of the rest of the code 
>>>>>>>>>> (no
>>>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory  
>>>>>>>>>> leaks,
>>>>>>>>>> classloader problems) after awhile.
>>>>>>>>>>
>>>>>>>>>> It is pretty convenient to pass HTTP request information  
>>>>>>>>>> with a
>>>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>>>> before leaving the servlet) but I'm not sure how safe it  
>>>>>>>>>> is in
>>>>>>>>>> this case.
>>>>>>>>>>
>>>>>>>>>> robert engels wrote:
>>>>>>>>>>> Using synchronization is a poor/invalid substitute for  
>>>>>>>>>>> thread
>>>>>>>>>>> locals in many cases.
>>>>>>>>>>>
>>>>>>>>>>> The point of the thread local in these referenced cases  
>>>>>>>>>>> is too
>>>>>>>>>>> allow streaming reads on a file descriptor. if you use a
>>>>>>>>>>> shared
>>>>>>>>>>> file descriptor/buffer you are going to continually  
>>>>>>>>>>> invalidate
>>>>>>>>>>> the buffer.
>>>>>>>>>>>
>>>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>>>> private instance of TermVectorsReader, to avoid  
>>>>>>>>>>>> synchronizing
>>>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>>>
>>>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>>>
>>>>>>>>>>>> Though, of course, we then synchronize on the underlying  
>>>>>>>>>>>> file
>>>>>>>>>>>> (when using FSDirectory), so perhaps we are really not  
>>>>>>>>>>>> saving
>>>>>>>>>>>> much by using ThreadLocal here.  But we are looking to  
>>>>>>>>>>>> relax
>>>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>>>> HashMap, which we'd have to synchronize on when getting the
>>>>>>>>>>>> per-
>>>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>>>
>>>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>>>> IndexReader for an object that can return term vectors /
>>>>>>>>>>>> stored
>>>>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>>>> retrieving that object, and provide a thread-private
>>>>>>>>>>>> instance.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end
>>>>>>>>>>>>> (in a
>>>>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search
>>>>>>>>>>>>> (..)
>>>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>>>> deeper in the implementation. When the call leaves Lucene
>>>>>>>>>>>>> any
>>>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,
>>>>>>>>>>>>>> causes
>>>>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o  
>>>>>>>>>>>>>> falling
>>>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Which means the ThreadLocal instance should no  
>>>>>>>>>>>>>>>> longer be
>>>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,
>>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>>> values
>>>>>>>>>>>>>>> (now stale) are strongly referenced and won't be  
>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>>> until the table is resized (under the Java6 impl at
>>>>>>>>>>>>>>> least).
>>>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------------------------------------------------------- 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --------------------------------------------------------- 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> ------
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> -----
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> ----
>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> --
>>>>>> --
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>> -- 
>>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> --
>>>>> -
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Fwd%3A- 
> ThreadLocal-in-SegmentReader-tp18326720p18437716.html
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Oops, sent too early, excuse me :) One more to say: yes, Yonik is correct.
Values remain strong-referenced until the stale values are removed from the
table. Just a quote from the ThreadLocals' javadoc:
"However, since reference queues are not used, stale entries are guaranteed
to be removed only when the table starts running out of space."
How do you suggest to force the removal of stale entries to the Lucene user?


Robert Engels wrote:
> 
> You are mistaken - Yonik's comment in that thread is correct  
> (although it is not just at table resize - any time a ThreadLocal is  
> added, and any time the ThreadLocal is not found in its direct hash  
> it might clear entries).
> 
> The ThreadLocals map only has a WeakReference to the ThreadLocal, so  
> if that is the only reference, it will be GC'd - eventually, and then  
> it will be cleared as new ThreadLocals are created.
> 
> With a static reference, the thread can reference the ThreadLocal at  
> any time, and thus the WeakReference will not be cleared.
> 
> If the object is VERY large, and new ThreadLocals are not created it  
> could cause a problem, but I don't think that is the case with  
> Lucene, as  the objects stored in ThreadLocals are designed to live  
> for the life of the SegmentReader/IndexReader and thread.
> 
> 
> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
> 
>>
>> Well, possibly I'm mistaken, but it seems that this affects non- 
>> static fields
>> too. Please see
>> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html  
>> where the
>> use case is described in the details.
>> In short: it seems that the scope of ThreadLocals does not matter.  
>> What
>> really matters is that they are referenced by ThreadLocals map in  
>> the thread
>> which is still alive.
>>
>>
>> Robert Engels wrote:
>>>
>>> This is only an issue for static ThreadLocals ...
>>>
>>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>>
>>>>
>>>> The problem here is not because ThreadLocal instances are not GC'd
>>>> (they are
>>>> GC'd, and your test shows this clearly).
>>>> But even one instance which is not removed from its Thread is
>>>> enough to
>>>> prevent the classloader from being unloaded, and that's the problem.
>>>>
>>>>
>>>> Michael McCandless-2 wrote:
>>>>>
>>>>> OK, I created a simple test to test this (attached).  The test just
>>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>>> stored
>>>>> into a ThreadLocal, and then periodically the ThreadLocal is  
>>>>> replaced
>>>>> with a new one.  This is to test whether GC of a ThreadLocal, even
>>>>> though the thread is still alive, in fact leads to GC of the  
>>>>> objects
>>>>> held in the ThreadLocal.
>>>>>
>>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects  
>>>>> are in
>>>>> fact properly collected.
>>>>>
>>>>> So this is not a leak but rather a "delayed collection" issue.
>>>>> Java's
>>>>> GC is never guaranteed to be immediate, and apparently when using
>>>>> ThreadLocals it's even less immediate than "normal".  In the  
>>>>> original
>>>>> issue, if other things create ThreadLocals, then eventually  
>>>>> Lucene's
>>>>> unreferenced ThreadLocals would be properly collected.
>>>>>
>>>>> So I think we continue to use non-static ThreadLocals in Lucene...
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> Once again, these are "static" thread locals. A completely  
>>>>>> different
>>>>>> issue. Since the object is available statically, the weak  
>>>>>> reference
>>>>>> cannot be cleared so stale entries will never be cleared as  
>>>>>> long as
>>>>>> the thread is alive.
>>>>>>
>>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>>
>>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>>
>>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/ 
>>>>>>> HHH-2481
>>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>>
>>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>>> ThreadLocals to pass context information which is cleared when  
>>>>>>> the
>>>>>>> call exits instead of storing long-lived objects.
>>>>>>>
>>>>>>>
>>>>>>> robert engels wrote:
>>>>>>>>
>>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there  
>>>>>>>> are no
>>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>>
>>>>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>>>>> resources immediately, usually this is not the case, which is  
>>>>>>>> why
>>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>>
>>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>>
>>>>>>>>  /**
>>>>>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>>>>>      * maintaining thread local values. No operations are  
>>>>>>>> exported
>>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>>> private to
>>>>>>>>      * allow declaration of fields in class Thread.  To help  
>>>>>>>> deal
>>>>>>>> with
>>>>>>>>      * very large and long-lived usages, the hash table entries
>>>>>>>> use
>>>>>>>>      * WeakReferences for keys. However, since reference queues
>>>>>>>> are not
>>>>>>>>      * used, stale entries are guaranteed to be removed only  
>>>>>>>> when
>>>>>>>>      * the table starts running out of space.
>>>>>>>>      */
>>>>>>>>
>>>>>>>> /**
>>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>>> entries.
>>>>>>>>          * This is invoked when either a new element is  
>>>>>>>> added, or
>>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>>          * logarithmic number of scans, as a balance between no
>>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>>> scans
>>>>>>>>          * proportional to number of elements, that would  
>>>>>>>> find all
>>>>>>>>          * garbage but would cause some insertions to take O(n)
>>>>>>>> time.
>>>>>>>>          *
>>>>>>>>          * @param i a position known NOT to hold a stale entry.
>>>>>>>> The
>>>>>>>>          * scan starts at the element after i.
>>>>>>>>          *
>>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>>> scanned,
>>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>>>> scanned.
>>>>>>>>          * When called from insertions, this parameter is the
>>>>>>>> number
>>>>>>>>          * of elements, but when from replaceStaleEntry, it  
>>>>>>>> is the
>>>>>>>>          * table length. (Note: all this could be changed to be
>>>>>>>> either
>>>>>>>>          * more or less aggressive by weighting n instead of  
>>>>>>>> just
>>>>>>>>          * using straight log n. But this version is simple,  
>>>>>>>> fast,
>>>>>>>> and
>>>>>>>>          * seems to work well.)
>>>>>>>>          *
>>>>>>>>          * @return true if any stale entries have been removed.
>>>>>>>>          */
>>>>>>>>
>>>>>>>>
>>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>>> when the containing Object is GC'd.
>>>>>>>>
>>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal  
>>>>>>>> refers
>>>>>>>> to an object that has native resources (e.g. file handles),  
>>>>>>>> it may
>>>>>>>> not be released until other thread locals are created by the
>>>>>>>> thread (or the thread terminates).
>>>>>>>>
>>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>>> applications it should never be necessary - unless a very  
>>>>>>>> strange
>>>>>>>> usage...
>>>>>>>>
>>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>>
>>>>>>>>> From what I know, storing objects in ThreadLocal is safe as  
>>>>>>>>> long
>>>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>>>> store objects which are independent of the rest of the code(no
>>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>>>>> classloader problems) after awhile.
>>>>>>>>>
>>>>>>>>> It is pretty convenient to pass HTTP request information with a
>>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>>> before leaving the servlet) but I'm not sure how safe it is in
>>>>>>>>> this case.
>>>>>>>>>
>>>>>>>>> robert engels wrote:
>>>>>>>>>> Using synchronization is a poor/invalid substitute for thread
>>>>>>>>>> locals in many cases.
>>>>>>>>>>
>>>>>>>>>> The point of the thread local in these referenced cases is too
>>>>>>>>>> allow streaming reads on a file descriptor. if you use a  
>>>>>>>>>> shared
>>>>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>>>>> the buffer.
>>>>>>>>>>
>>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>>
>>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>>
>>>>>>>>>>> Though, of course, we then synchronize on the underlying file
>>>>>>>>>>> (when using FSDirectory), so perhaps we are really not saving
>>>>>>>>>>> much by using ThreadLocal here.  But we are looking to relax
>>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>>> HashMap, which we'd have to synchronize on when getting the
>>>>>>>>>>> per-
>>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>>
>>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>>> IndexReader for an object that can return term vectors /  
>>>>>>>>>>> stored
>>>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>>> retrieving that object, and provide a thread-private  
>>>>>>>>>>> instance.
>>>>>>>>>>>
>>>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end 
>>>>>>>>>>>> (in a
>>>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>>>
>>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search 
>>>>>>>>>>>> (..)
>>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>>> deeper in the implementation. When the call leaves Lucene  
>>>>>>>>>>>> any
>>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,  
>>>>>>>>>>>>> causes
>>>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling
>>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which means the ThreadLocal instance should no longer be
>>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,  
>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>> values
>>>>>>>>>>>>>> (now stale) are strongly referenced and won't be actually
>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>> until the table is resized (under the Java6 impl at  
>>>>>>>>>>>>>> least).
>>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>>> --
>>>>>>>>>>>> -------
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>>> --
>>>>>>>>>>> ------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>>> --
>>>>>>>>>> -----
>>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------- 
>>>>>>>>> --
>>>>>>>>> ----
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------- 
>>>>> --
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>> Nabble.com.
>>>>
>>>>
>>>> -------------------------------------------------------------------- 
>>>> -
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Fwd%3A- 
>> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
>> Sent from the Lucene - Java Developer mailing list archive at  
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18437716.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Looks fine, and there's the only problem: all this does not work. Possibly,
this is JVM issue, possibly, there's another reason, but when I reproduce
the described situation with the web-app, its classloader cannot be
unloaded, and the profiler shows that the cause is that ThreadLocal. When I
apply the workaround (run code which uses Lucene in a Thread which dies
shortly after executing the code), the problem disappears.

I'm not holding references to this ThreadLocal anywhere in my code, it's
package-local field in package-local class of Lucene.


Robert Engels wrote:
> 
> You are mistaken - Yonik's comment in that thread is correct  
> (although it is not just at table resize - any time a ThreadLocal is  
> added, and any time the ThreadLocal is not found in its direct hash  
> it might clear entries).
> 
> The ThreadLocals map only has a WeakReference to the ThreadLocal, so  
> if that is the only reference, it will be GC'd - eventually, and then  
> it will be cleared as new ThreadLocals are created.
> 
> With a static reference, the thread can reference the ThreadLocal at  
> any time, and thus the WeakReference will not be cleared.
> 
> If the object is VERY large, and new ThreadLocals are not created it  
> could cause a problem, but I don't think that is the case with  
> Lucene, as  the objects stored in ThreadLocals are designed to live  
> for the life of the SegmentReader/IndexReader and thread.
> 
> 
> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
> 
>>
>> Well, possibly I'm mistaken, but it seems that this affects non- 
>> static fields
>> too. Please see
>> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html  
>> where the
>> use case is described in the details.
>> In short: it seems that the scope of ThreadLocals does not matter.  
>> What
>> really matters is that they are referenced by ThreadLocals map in  
>> the thread
>> which is still alive.
>>
>>
>> Robert Engels wrote:
>>>
>>> This is only an issue for static ThreadLocals ...
>>>
>>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>>
>>>>
>>>> The problem here is not because ThreadLocal instances are not GC'd
>>>> (they are
>>>> GC'd, and your test shows this clearly).
>>>> But even one instance which is not removed from its Thread is
>>>> enough to
>>>> prevent the classloader from being unloaded, and that's the problem.
>>>>
>>>>
>>>> Michael McCandless-2 wrote:
>>>>>
>>>>> OK, I created a simple test to test this (attached).  The test just
>>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>>> stored
>>>>> into a ThreadLocal, and then periodically the ThreadLocal is  
>>>>> replaced
>>>>> with a new one.  This is to test whether GC of a ThreadLocal, even
>>>>> though the thread is still alive, in fact leads to GC of the  
>>>>> objects
>>>>> held in the ThreadLocal.
>>>>>
>>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects  
>>>>> are in
>>>>> fact properly collected.
>>>>>
>>>>> So this is not a leak but rather a "delayed collection" issue.
>>>>> Java's
>>>>> GC is never guaranteed to be immediate, and apparently when using
>>>>> ThreadLocals it's even less immediate than "normal".  In the  
>>>>> original
>>>>> issue, if other things create ThreadLocals, then eventually  
>>>>> Lucene's
>>>>> unreferenced ThreadLocals would be properly collected.
>>>>>
>>>>> So I think we continue to use non-static ThreadLocals in Lucene...
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> Once again, these are "static" thread locals. A completely  
>>>>>> different
>>>>>> issue. Since the object is available statically, the weak  
>>>>>> reference
>>>>>> cannot be cleared so stale entries will never be cleared as  
>>>>>> long as
>>>>>> the thread is alive.
>>>>>>
>>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>>
>>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>>
>>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/ 
>>>>>>> HHH-2481
>>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>>
>>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>>> ThreadLocals to pass context information which is cleared when  
>>>>>>> the
>>>>>>> call exits instead of storing long-lived objects.
>>>>>>>
>>>>>>>
>>>>>>> robert engels wrote:
>>>>>>>>
>>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there  
>>>>>>>> are no
>>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>>
>>>>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>>>>> resources immediately, usually this is not the case, which is  
>>>>>>>> why
>>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>>
>>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>>
>>>>>>>>  /**
>>>>>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>>>>>      * maintaining thread local values. No operations are  
>>>>>>>> exported
>>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>>> private to
>>>>>>>>      * allow declaration of fields in class Thread.  To help  
>>>>>>>> deal
>>>>>>>> with
>>>>>>>>      * very large and long-lived usages, the hash table entries
>>>>>>>> use
>>>>>>>>      * WeakReferences for keys. However, since reference queues
>>>>>>>> are not
>>>>>>>>      * used, stale entries are guaranteed to be removed only  
>>>>>>>> when
>>>>>>>>      * the table starts running out of space.
>>>>>>>>      */
>>>>>>>>
>>>>>>>> /**
>>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>>> entries.
>>>>>>>>          * This is invoked when either a new element is  
>>>>>>>> added, or
>>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>>          * logarithmic number of scans, as a balance between no
>>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>>> scans
>>>>>>>>          * proportional to number of elements, that would  
>>>>>>>> find all
>>>>>>>>          * garbage but would cause some insertions to take O(n)
>>>>>>>> time.
>>>>>>>>          *
>>>>>>>>          * @param i a position known NOT to hold a stale entry.
>>>>>>>> The
>>>>>>>>          * scan starts at the element after i.
>>>>>>>>          *
>>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>>> scanned,
>>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>>>> scanned.
>>>>>>>>          * When called from insertions, this parameter is the
>>>>>>>> number
>>>>>>>>          * of elements, but when from replaceStaleEntry, it  
>>>>>>>> is the
>>>>>>>>          * table length. (Note: all this could be changed to be
>>>>>>>> either
>>>>>>>>          * more or less aggressive by weighting n instead of  
>>>>>>>> just
>>>>>>>>          * using straight log n. But this version is simple,  
>>>>>>>> fast,
>>>>>>>> and
>>>>>>>>          * seems to work well.)
>>>>>>>>          *
>>>>>>>>          * @return true if any stale entries have been removed.
>>>>>>>>          */
>>>>>>>>
>>>>>>>>
>>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>>> when the containing Object is GC'd.
>>>>>>>>
>>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal  
>>>>>>>> refers
>>>>>>>> to an object that has native resources (e.g. file handles),  
>>>>>>>> it may
>>>>>>>> not be released until other thread locals are created by the
>>>>>>>> thread (or the thread terminates).
>>>>>>>>
>>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>>> applications it should never be necessary - unless a very  
>>>>>>>> strange
>>>>>>>> usage...
>>>>>>>>
>>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>>
>>>>>>>>> From what I know, storing objects in ThreadLocal is safe as  
>>>>>>>>> long
>>>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>>>> store objects which are independent of the rest of the code(no
>>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>>>>> classloader problems) after awhile.
>>>>>>>>>
>>>>>>>>> It is pretty convenient to pass HTTP request information with a
>>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>>> before leaving the servlet) but I'm not sure how safe it is in
>>>>>>>>> this case.
>>>>>>>>>
>>>>>>>>> robert engels wrote:
>>>>>>>>>> Using synchronization is a poor/invalid substitute for thread
>>>>>>>>>> locals in many cases.
>>>>>>>>>>
>>>>>>>>>> The point of the thread local in these referenced cases is too
>>>>>>>>>> allow streaming reads on a file descriptor. if you use a  
>>>>>>>>>> shared
>>>>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>>>>> the buffer.
>>>>>>>>>>
>>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>>
>>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>>
>>>>>>>>>>> Though, of course, we then synchronize on the underlying file
>>>>>>>>>>> (when using FSDirectory), so perhaps we are really not saving
>>>>>>>>>>> much by using ThreadLocal here.  But we are looking to relax
>>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>>> HashMap, which we'd have to synchronize on when getting the
>>>>>>>>>>> per-
>>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>>
>>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>>> IndexReader for an object that can return term vectors /  
>>>>>>>>>>> stored
>>>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>>> retrieving that object, and provide a thread-private  
>>>>>>>>>>> instance.
>>>>>>>>>>>
>>>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end 
>>>>>>>>>>>> (in a
>>>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>>>
>>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search 
>>>>>>>>>>>> (..)
>>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>>> deeper in the implementation. When the call leaves Lucene  
>>>>>>>>>>>> any
>>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,  
>>>>>>>>>>>>> causes
>>>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling
>>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which means the ThreadLocal instance should no longer be
>>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,  
>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>> values
>>>>>>>>>>>>>> (now stale) are strongly referenced and won't be actually
>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>> until the table is resized (under the Java6 impl at  
>>>>>>>>>>>>>> least).
>>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>>> --
>>>>>>>>>>>> -------
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>>> --
>>>>>>>>>>> ------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>>> --
>>>>>>>>>> -----
>>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------- 
>>>>>>>>> --
>>>>>>>>> ----
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------- 
>>>>> --
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>>> Sent from the Lucene - Java Developer mailing list archive at
>>>> Nabble.com.
>>>>
>>>>
>>>> -------------------------------------------------------------------- 
>>>> -
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Fwd%3A- 
>> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
>> Sent from the Lucene - Java Developer mailing list archive at  
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18437679.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
You are mistaken - Yonik's comment in that thread is correct  
(although it is not just at table resize - any time a ThreadLocal is  
added, and any time the ThreadLocal is not found in its direct hash  
it might clear entries).

The ThreadLocals map only has a WeakReference to the ThreadLocal, so  
if that is the only reference, it will be GC'd - eventually, and then  
it will be cleared as new ThreadLocals are created.

With a static reference, the thread can reference the ThreadLocal at  
any time, and thus the WeakReference will not be cleared.

If the object is VERY large, and new ThreadLocals are not created it  
could cause a problem, but I don't think that is the case with  
Lucene, as  the objects stored in ThreadLocals are designed to live  
for the life of the SegmentReader/IndexReader and thread.


On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:

>
> Well, possibly I'm mistaken, but it seems that this affects non- 
> static fields
> too. Please see
> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html  
> where the
> use case is described in the details.
> In short: it seems that the scope of ThreadLocals does not matter.  
> What
> really matters is that they are referenced by ThreadLocals map in  
> the thread
> which is still alive.
>
>
> Robert Engels wrote:
>>
>> This is only an issue for static ThreadLocals ...
>>
>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>
>>>
>>> The problem here is not because ThreadLocal instances are not GC'd
>>> (they are
>>> GC'd, and your test shows this clearly).
>>> But even one instance which is not removed from its Thread is
>>> enough to
>>> prevent the classloader from being unloaded, and that's the problem.
>>>
>>>
>>> Michael McCandless-2 wrote:
>>>>
>>>> OK, I created a simple test to test this (attached).  The test just
>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>> stored
>>>> into a ThreadLocal, and then periodically the ThreadLocal is  
>>>> replaced
>>>> with a new one.  This is to test whether GC of a ThreadLocal, even
>>>> though the thread is still alive, in fact leads to GC of the  
>>>> objects
>>>> held in the ThreadLocal.
>>>>
>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects  
>>>> are in
>>>> fact properly collected.
>>>>
>>>> So this is not a leak but rather a "delayed collection" issue.
>>>> Java's
>>>> GC is never guaranteed to be immediate, and apparently when using
>>>> ThreadLocals it's even less immediate than "normal".  In the  
>>>> original
>>>> issue, if other things create ThreadLocals, then eventually  
>>>> Lucene's
>>>> unreferenced ThreadLocals would be properly collected.
>>>>
>>>> So I think we continue to use non-static ThreadLocals in Lucene...
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> robert engels wrote:
>>>>
>>>>> Once again, these are "static" thread locals. A completely  
>>>>> different
>>>>> issue. Since the object is available statically, the weak  
>>>>> reference
>>>>> cannot be cleared so stale entries will never be cleared as  
>>>>> long as
>>>>> the thread is alive.
>>>>>
>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>
>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>
>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/ 
>>>>>> HHH-2481
>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>
>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>> ThreadLocals to pass context information which is cleared when  
>>>>>> the
>>>>>> call exits instead of storing long-lived objects.
>>>>>>
>>>>>>
>>>>>> robert engels wrote:
>>>>>>>
>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there  
>>>>>>> are no
>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>
>>>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>>>> resources immediately, usually this is not the case, which is  
>>>>>>> why
>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>
>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>
>>>>>>>  /**
>>>>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>>>>      * maintaining thread local values. No operations are  
>>>>>>> exported
>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>> private to
>>>>>>>      * allow declaration of fields in class Thread.  To help  
>>>>>>> deal
>>>>>>> with
>>>>>>>      * very large and long-lived usages, the hash table entries
>>>>>>> use
>>>>>>>      * WeakReferences for keys. However, since reference queues
>>>>>>> are not
>>>>>>>      * used, stale entries are guaranteed to be removed only  
>>>>>>> when
>>>>>>>      * the table starts running out of space.
>>>>>>>      */
>>>>>>>
>>>>>>> /**
>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>> entries.
>>>>>>>          * This is invoked when either a new element is  
>>>>>>> added, or
>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>          * logarithmic number of scans, as a balance between no
>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>> scans
>>>>>>>          * proportional to number of elements, that would  
>>>>>>> find all
>>>>>>>          * garbage but would cause some insertions to take O(n)
>>>>>>> time.
>>>>>>>          *
>>>>>>>          * @param i a position known NOT to hold a stale entry.
>>>>>>> The
>>>>>>>          * scan starts at the element after i.
>>>>>>>          *
>>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>>> scanned,
>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>>> scanned.
>>>>>>>          * When called from insertions, this parameter is the
>>>>>>> number
>>>>>>>          * of elements, but when from replaceStaleEntry, it  
>>>>>>> is the
>>>>>>>          * table length. (Note: all this could be changed to be
>>>>>>> either
>>>>>>>          * more or less aggressive by weighting n instead of  
>>>>>>> just
>>>>>>>          * using straight log n. But this version is simple,  
>>>>>>> fast,
>>>>>>> and
>>>>>>>          * seems to work well.)
>>>>>>>          *
>>>>>>>          * @return true if any stale entries have been removed.
>>>>>>>          */
>>>>>>>
>>>>>>>
>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>> when the containing Object is GC'd.
>>>>>>>
>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal  
>>>>>>> refers
>>>>>>> to an object that has native resources (e.g. file handles),  
>>>>>>> it may
>>>>>>> not be released until other thread locals are created by the
>>>>>>> thread (or the thread terminates).
>>>>>>>
>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>> applications it should never be necessary - unless a very  
>>>>>>> strange
>>>>>>> usage...
>>>>>>>
>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>
>>>>>>>> From what I know, storing objects in ThreadLocal is safe as  
>>>>>>>> long
>>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>>> store objects which are independent of the rest of the code(no
>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>>>> classloader problems) after awhile.
>>>>>>>>
>>>>>>>> It is pretty convenient to pass HTTP request information with a
>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>> before leaving the servlet) but I'm not sure how safe it is in
>>>>>>>> this case.
>>>>>>>>
>>>>>>>> robert engels wrote:
>>>>>>>>> Using synchronization is a poor/invalid substitute for thread
>>>>>>>>> locals in many cases.
>>>>>>>>>
>>>>>>>>> The point of the thread local in these referenced cases is too
>>>>>>>>> allow streaming reads on a file descriptor. if you use a  
>>>>>>>>> shared
>>>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>>>> the buffer.
>>>>>>>>>
>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>
>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>
>>>>>>>>>> Though, of course, we then synchronize on the underlying file
>>>>>>>>>> (when using FSDirectory), so perhaps we are really not saving
>>>>>>>>>> much by using ThreadLocal here.  But we are looking to relax
>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>
>>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>>> HashMap, which we'd have to synchronize on when getting the
>>>>>>>>>> per-
>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>
>>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>>> IndexReader for an object that can return term vectors /  
>>>>>>>>>> stored
>>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>>> retrieving that object, and provide a thread-private  
>>>>>>>>>> instance.
>>>>>>>>>>
>>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>
>>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end 
>>>>>>>>>>> (in a
>>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>>
>>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search 
>>>>>>>>>>> (..)
>>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>>> deeper in the implementation. When the call leaves Lucene  
>>>>>>>>>>> any
>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>
>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene,  
>>>>>>>>>>>> causes
>>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>
>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling
>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which means the ThreadLocal instance should no longer be
>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will still be referenced from the Thread(s)
>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced,  
>>>>>>>>>>>>> but the
>>>>>>>>>>>>> values
>>>>>>>>>>>>> (now stale) are strongly referenced and won't be actually
>>>>>>>>>>>>> removed
>>>>>>>>>>>>> until the table is resized (under the Java6 impl at  
>>>>>>>>>>>>> least).
>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------- 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> ---------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----------------------------------------------------------- 
>>>>>>>>>>>> --
>>>>>>>>>>>> --------
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>> --
>>>>>>>>>>> -------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>> --
>>>>>>>>>> ------
>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>> --
>>>>>>>>> -----
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> --
>>>>>>>> ----
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Fwd%3A- 
> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Well, possibly I'm mistaken, but it seems that this affects non-static fields
too. Please see
http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html where the
use case is described in the details.
In short: it seems that the scope of ThreadLocals does not matter. What
really matters is that they are referenced by ThreadLocals map in the thread
which is still alive.


Robert Engels wrote:
> 
> This is only an issue for static ThreadLocals ...
> 
> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
> 
>>
>> The problem here is not because ThreadLocal instances are not GC'd  
>> (they are
>> GC'd, and your test shows this clearly).
>> But even one instance which is not removed from its Thread is  
>> enough to
>> prevent the classloader from being unloaded, and that's the problem.
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> OK, I created a simple test to test this (attached).  The test just
>>> runs 10 threads, each one creating a 100 KB byte array which is  
>>> stored
>>> into a ThreadLocal, and then periodically the ThreadLocal is replaced
>>> with a new one.  This is to test whether GC of a ThreadLocal, even
>>> though the thread is still alive, in fact leads to GC of the objects
>>> held in the ThreadLocal.
>>>
>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are in
>>> fact properly collected.
>>>
>>> So this is not a leak but rather a "delayed collection" issue.   
>>> Java's
>>> GC is never guaranteed to be immediate, and apparently when using
>>> ThreadLocals it's even less immediate than "normal".  In the original
>>> issue, if other things create ThreadLocals, then eventually Lucene's
>>> unreferenced ThreadLocals would be properly collected.
>>>
>>> So I think we continue to use non-static ThreadLocals in Lucene...
>>>
>>> Mike
>>>
>>>
>>>
>>>
>>>
>>>
>>> robert engels wrote:
>>>
>>>> Once again, these are "static" thread locals. A completely different
>>>> issue. Since the object is available statically, the weak reference
>>>> cannot be cleared so stale entries will never be cleared as long as
>>>> the thread is alive.
>>>>
>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>
>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>
>>>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>
>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>> ThreadLocals to pass context information which is cleared when the
>>>>> call exits instead of storing long-lived objects.
>>>>>
>>>>>
>>>>> robert engels wrote:
>>>>>>
>>>>>> Aside from the pre-1.5 thread local "perceived leak", there are no
>>>>>> issues with ThreadLocals if used properly.
>>>>>>
>>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>>> resources immediately, usually this is not the case, which is why
>>>>>> a ThreadLocal is used in the first place.
>>>>>>
>>>>>> From the ThreadLocalMap javadoc...
>>>>>>
>>>>>>  /**
>>>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>>>      * maintaining thread local values. No operations are exported
>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>> private to
>>>>>>      * allow declaration of fields in class Thread.  To help deal
>>>>>> with
>>>>>>      * very large and long-lived usages, the hash table entries  
>>>>>> use
>>>>>>      * WeakReferences for keys. However, since reference queues
>>>>>> are not
>>>>>>      * used, stale entries are guaranteed to be removed only when
>>>>>>      * the table starts running out of space.
>>>>>>      */
>>>>>>
>>>>>> /**
>>>>>>          * Heuristically scan some cells looking for stale  
>>>>>> entries.
>>>>>>          * This is invoked when either a new element is added, or
>>>>>>          * another stale one has been expunged. It performs a
>>>>>>          * logarithmic number of scans, as a balance between no
>>>>>>          * scanning (fast but retains garbage) and a number of  
>>>>>> scans
>>>>>>          * proportional to number of elements, that would find all
>>>>>>          * garbage but would cause some insertions to take O(n)
>>>>>> time.
>>>>>>          *
>>>>>>          * @param i a position known NOT to hold a stale entry.  
>>>>>> The
>>>>>>          * scan starts at the element after i.
>>>>>>          *
>>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>>> scanned,
>>>>>>          * unless a stale entry one is found, in which case
>>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>>> scanned.
>>>>>>          * When called from insertions, this parameter is the  
>>>>>> number
>>>>>>          * of elements, but when from replaceStaleEntry, it is the
>>>>>>          * table length. (Note: all this could be changed to be
>>>>>> either
>>>>>>          * more or less aggressive by weighting n instead of just
>>>>>>          * using straight log n. But this version is simple, fast,
>>>>>> and
>>>>>>          * seems to work well.)
>>>>>>          *
>>>>>>          * @return true if any stale entries have been removed.
>>>>>>          */
>>>>>>
>>>>>>
>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>> when the containing Object is GC'd.
>>>>>>
>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers
>>>>>> to an object that has native resources (e.g. file handles), it may
>>>>>> not be released until other thread locals are created by the
>>>>>> thread (or the thread terminates).
>>>>>>
>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>> applications it should never be necessary - unless a very strange
>>>>>> usage...
>>>>>>
>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>
>>>>>>> From what I know, storing objects in ThreadLocal is safe as long
>>>>>>> as you release the object within a try {} finall {} block or
>>>>>>> store objects which are independent of the rest of the code(no
>>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>>> classloader problems) after awhile.
>>>>>>>
>>>>>>> It is pretty convenient to pass HTTP request information with a
>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>> before leaving the servlet) but I'm not sure how safe it is in
>>>>>>> this case.
>>>>>>>
>>>>>>> robert engels wrote:
>>>>>>>> Using synchronization is a poor/invalid substitute for thread
>>>>>>>> locals in many cases.
>>>>>>>>
>>>>>>>> The point of the thread local in these referenced cases is too
>>>>>>>> allow streaming reads on a file descriptor. if you use a shared
>>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>>> the buffer.
>>>>>>>>
>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>
>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>
>>>>>>>>> Though, of course, we then synchronize on the underlying file
>>>>>>>>> (when using FSDirectory), so perhaps we are really not saving
>>>>>>>>> much by using ThreadLocal here.  But we are looking to relax
>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>
>>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>>> HashMap, which we'd have to synchronize on when getting the  
>>>>>>>>> per-
>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>
>>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>>> IndexReader for an object that can return term vectors / stored
>>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>>> retrieving that object, and provide a thread-private instance.
>>>>>>>>>
>>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>
>>>>>>>>>> Usually ThreadLocal.remove() should be called at the end(in a
>>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>>
>>>>>>>>>> Ex : if during searching ThreadLocal is used, every search(..)
>>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>>> deeper in the implementation. When the call leaves Lucene any
>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>
>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>
>>>>>>>>>>> ThreadLocal, which we use in several places in Lucene, causes
>>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>>> holding strong references.
>>>>>>>>>>>
>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>> synchronization.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling
>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>
>>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which means the ThreadLocal instance should no longer be
>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>
>>>>>>>>>>>> It will still be referenced from the Thread(s)  
>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the
>>>>>>>>>>>> values
>>>>>>>>>>>> (now stale) are strongly referenced and won't be actually
>>>>>>>>>>>> removed
>>>>>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>
>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>>> ---------
>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>>> --------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>>> -------
>>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------- 
>>>>>>>>> ------
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------- 
>>>>>>>> -----
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------- 
>>>>>>> ----
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Fwd%3A- 
>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>> Sent from the Lucene - Java Developer mailing list archive at  
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18416841.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
This is only an issue for static ThreadLocals ...

On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:

>
> The problem here is not because ThreadLocal instances are not GC'd  
> (they are
> GC'd, and your test shows this clearly).
> But even one instance which is not removed from its Thread is  
> enough to
> prevent the classloader from being unloaded, and that's the problem.
>
>
> Michael McCandless-2 wrote:
>>
>> OK, I created a simple test to test this (attached).  The test just
>> runs 10 threads, each one creating a 100 KB byte array which is  
>> stored
>> into a ThreadLocal, and then periodically the ThreadLocal is replaced
>> with a new one.  This is to test whether GC of a ThreadLocal, even
>> though the thread is still alive, in fact leads to GC of the objects
>> held in the ThreadLocal.
>>
>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are in
>> fact properly collected.
>>
>> So this is not a leak but rather a "delayed collection" issue.   
>> Java's
>> GC is never guaranteed to be immediate, and apparently when using
>> ThreadLocals it's even less immediate than "normal".  In the original
>> issue, if other things create ThreadLocals, then eventually Lucene's
>> unreferenced ThreadLocals would be properly collected.
>>
>> So I think we continue to use non-static ThreadLocals in Lucene...
>>
>> Mike
>>
>>
>>
>>
>>
>>
>> robert engels wrote:
>>
>>> Once again, these are "static" thread locals. A completely different
>>> issue. Since the object is available statically, the weak reference
>>> cannot be cleared so stale entries will never be cleared as long as
>>> the thread is alive.
>>>
>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>
>>>> Just a few examples of "problems" using ThreadLocals.
>>>>
>>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>
>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>> are not a problem for SegmentReader but it seems safer to use
>>>> ThreadLocals to pass context information which is cleared when the
>>>> call exits instead of storing long-lived objects.
>>>>
>>>>
>>>> robert engels wrote:
>>>>>
>>>>> Aside from the pre-1.5 thread local "perceived leak", there are no
>>>>> issues with ThreadLocals if used properly.
>>>>>
>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>> resources immediately, usually this is not the case, which is why
>>>>> a ThreadLocal is used in the first place.
>>>>>
>>>>> From the ThreadLocalMap javadoc...
>>>>>
>>>>>  /**
>>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>>      * maintaining thread local values. No operations are exported
>>>>>      * outside of the ThreadLocal class. The class is package
>>>>> private to
>>>>>      * allow declaration of fields in class Thread.  To help deal
>>>>> with
>>>>>      * very large and long-lived usages, the hash table entries  
>>>>> use
>>>>>      * WeakReferences for keys. However, since reference queues
>>>>> are not
>>>>>      * used, stale entries are guaranteed to be removed only when
>>>>>      * the table starts running out of space.
>>>>>      */
>>>>>
>>>>> /**
>>>>>          * Heuristically scan some cells looking for stale  
>>>>> entries.
>>>>>          * This is invoked when either a new element is added, or
>>>>>          * another stale one has been expunged. It performs a
>>>>>          * logarithmic number of scans, as a balance between no
>>>>>          * scanning (fast but retains garbage) and a number of  
>>>>> scans
>>>>>          * proportional to number of elements, that would find all
>>>>>          * garbage but would cause some insertions to take O(n)
>>>>> time.
>>>>>          *
>>>>>          * @param i a position known NOT to hold a stale entry.  
>>>>> The
>>>>>          * scan starts at the element after i.
>>>>>          *
>>>>>          * @param n scan control: <tt>log2(n)</tt> cells are
>>>>> scanned,
>>>>>          * unless a stale entry one is found, in which case
>>>>>          * <tt>log2(table.length)-1</tt> additional cells are
>>>>> scanned.
>>>>>          * When called from insertions, this parameter is the  
>>>>> number
>>>>>          * of elements, but when from replaceStaleEntry, it is the
>>>>>          * table length. (Note: all this could be changed to be
>>>>> either
>>>>>          * more or less aggressive by weighting n instead of just
>>>>>          * using straight log n. But this version is simple, fast,
>>>>> and
>>>>>          * seems to work well.)
>>>>>          *
>>>>>          * @return true if any stale entries have been removed.
>>>>>          */
>>>>>
>>>>>
>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>> when the containing Object is GC'd.
>>>>>
>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers
>>>>> to an object that has native resources (e.g. file handles), it may
>>>>> not be released until other thread locals are created by the
>>>>> thread (or the thread terminates).
>>>>>
>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>> applications it should never be necessary - unless a very strange
>>>>> usage...
>>>>>
>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>
>>>>>> From what I know, storing objects in ThreadLocal is safe as long
>>>>>> as you release the object within a try {} finall {} block or
>>>>>> store objects which are independent of the rest of the code(no
>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>> classloader problems) after awhile.
>>>>>>
>>>>>> It is pretty convenient to pass HTTP request information with a
>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>> before leaving the servlet) but I'm not sure how safe it is in
>>>>>> this case.
>>>>>>
>>>>>> robert engels wrote:
>>>>>>> Using synchronization is a poor/invalid substitute for thread
>>>>>>> locals in many cases.
>>>>>>>
>>>>>>> The point of the thread local in these referenced cases is too
>>>>>>> allow streaming reads on a file descriptor. if you use a shared
>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>> the buffer.
>>>>>>>
>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-
>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>> per-document when loading term vectors.
>>>>>>>>
>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>
>>>>>>>> Though, of course, we then synchronize on the underlying file
>>>>>>>> (when using FSDirectory), so perhaps we are really not saving
>>>>>>>> much by using ThreadLocal here.  But we are looking to relax
>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>
>>>>>>>> Maybe we could make our own ThreadLocal that just uses a
>>>>>>>> HashMap, which we'd have to synchronize on when getting the  
>>>>>>>> per-
>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>
>>>>>>>> Jason has suggested moving to a model where you ask the
>>>>>>>> IndexReader for an object that can return term vectors / stored
>>>>>>>> fields / etc, and then you interact with that many times to
>>>>>>>> retrieve each doc.  We could then synchronize only on
>>>>>>>> retrieving that object, and provide a thread-private instance.
>>>>>>>>
>>>>>>>> It seems like we should move away from using ThreadLocal in
>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> Adrian Tarau wrote:
>>>>>>>>
>>>>>>>>> Usually ThreadLocal.remove() should be called at the end(in a
>>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>>
>>>>>>>>> Ex : if during searching ThreadLocal is used, every search(..)
>>>>>>>>> method should cleanup any ThreadLocal variables, or even
>>>>>>>>> deeper in the implementation. When the call leaves Lucene any
>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>
>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>
>>>>>>>>>> ThreadLocal, which we use in several places in Lucene, causes
>>>>>>>>>> a leak in app servers because the classloader never fully
>>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is
>>>>>>>>>> holding strong references.
>>>>>>>>>>
>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>> synchronization.
>>>>>>>>>>
>>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling
>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>
>>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no
>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>> assuming you are not holding any references to your
>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>
>>>>>>>>>>>> Which means the ThreadLocal instance should no longer be
>>>>>>>>>>>> reachable.
>>>>>>>>>>>
>>>>>>>>>>> It will still be referenced from the Thread(s)  
>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the
>>>>>>>>>>> values
>>>>>>>>>>> (now stale) are strongly referenced and won't be actually
>>>>>>>>>>> removed
>>>>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>>>>> Nice huh?
>>>>>>>>>>>
>>>>>>>>>>> -Yonik
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------ 
>>>>>>>>>>> ---------
>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>> --------
>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>> -------
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> ------
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> -----
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> ----
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
> -- 
> View this message in context: http://www.nabble.com/Fwd%3A- 
> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
The problem here is not because ThreadLocal instances are not GC'd (they are
GC'd, and your test shows this clearly).
But even one instance which is not removed from its Thread is enough to
prevent the classloader from being unloaded, and that's the problem.


Michael McCandless-2 wrote:
> 
> OK, I created a simple test to test this (attached).  The test just  
> runs 10 threads, each one creating a 100 KB byte array which is stored  
> into a ThreadLocal, and then periodically the ThreadLocal is replaced  
> with a new one.  This is to test whether GC of a ThreadLocal, even  
> though the thread is still alive, in fact leads to GC of the objects  
> held in the ThreadLocal.
> 
> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are in  
> fact properly collected.
> 
> So this is not a leak but rather a "delayed collection" issue.  Java's  
> GC is never guaranteed to be immediate, and apparently when using  
> ThreadLocals it's even less immediate than "normal".  In the original  
> issue, if other things create ThreadLocals, then eventually Lucene's  
> unreferenced ThreadLocals would be properly collected.
> 
> So I think we continue to use non-static ThreadLocals in Lucene...
> 
> Mike
> 
> 
>  
> 
> 
> 
> robert engels wrote:
> 
>> Once again, these are "static" thread locals. A completely different  
>> issue. Since the object is available statically, the weak reference  
>> cannot be cleared so stale entries will never be cleared as long as  
>> the thread is alive.
>>
>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>
>>> Just a few examples of "problems" using ThreadLocals.
>>>
>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>
>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"  
>>> implementation, and maybe the current "problems" of ThreadLocals  
>>> are not a problem for SegmentReader but it seems safer to use  
>>> ThreadLocals to pass context information which is cleared when the  
>>> call exits instead of storing long-lived objects.
>>>
>>>
>>> robert engels wrote:
>>>>
>>>> Aside from the pre-1.5 thread local "perceived leak", there are no  
>>>> issues with ThreadLocals if used properly.
>>>>
>>>> There is no need for try/finally blocks, unless you MUST release  
>>>> resources immediately, usually this is not the case, which is why  
>>>> a ThreadLocal is used in the first place.
>>>>
>>>> From the ThreadLocalMap javadoc...
>>>>
>>>>  /**
>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>      * maintaining thread local values. No operations are exported
>>>>      * outside of the ThreadLocal class. The class is package  
>>>> private to
>>>>      * allow declaration of fields in class Thread.  To help deal  
>>>> with
>>>>      * very large and long-lived usages, the hash table entries use
>>>>      * WeakReferences for keys. However, since reference queues  
>>>> are not
>>>>      * used, stale entries are guaranteed to be removed only when
>>>>      * the table starts running out of space.
>>>>      */
>>>>
>>>> /**
>>>>          * Heuristically scan some cells looking for stale entries.
>>>>          * This is invoked when either a new element is added, or
>>>>          * another stale one has been expunged. It performs a
>>>>          * logarithmic number of scans, as a balance between no
>>>>          * scanning (fast but retains garbage) and a number of scans
>>>>          * proportional to number of elements, that would find all
>>>>          * garbage but would cause some insertions to take O(n)  
>>>> time.
>>>>          *
>>>>          * @param i a position known NOT to hold a stale entry. The
>>>>          * scan starts at the element after i.
>>>>          *
>>>>          * @param n scan control: <tt>log2(n)</tt> cells are  
>>>> scanned,
>>>>          * unless a stale entry one is found, in which case
>>>>          * <tt>log2(table.length)-1</tt> additional cells are  
>>>> scanned.
>>>>          * When called from insertions, this parameter is the number
>>>>          * of elements, but when from replaceStaleEntry, it is the
>>>>          * table length. (Note: all this could be changed to be  
>>>> either
>>>>          * more or less aggressive by weighting n instead of just
>>>>          * using straight log n. But this version is simple, fast,  
>>>> and
>>>>          * seems to work well.)
>>>>          *
>>>>          * @return true if any stale entries have been removed.
>>>>          */
>>>>
>>>>
>>>> The instance ThreadLocals (and what the refer to) will be GC'd  
>>>> when the containing Object is GC'd.
>>>>
>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers  
>>>> to an object that has native resources (e.g. file handles), it may  
>>>> not be released until other thread locals are created by the  
>>>> thread (or the thread terminates).
>>>>
>>>> You can avoid this "delay" by calling remove(), but in most  
>>>> applications it should never be necessary - unless a very strange  
>>>> usage...
>>>>
>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>
>>>>> From what I know, storing objects in ThreadLocal is safe as long  
>>>>> as you release the object within a try {} finall {} block or  
>>>>> store objects which are independent of the rest of the code(no  
>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,  
>>>>> classloader problems) after awhile.
>>>>>
>>>>> It is pretty convenient to pass HTTP request information with a  
>>>>> ThreadLocal in a servlet(but you should cleanup the variable  
>>>>> before leaving the servlet) but I'm not sure how safe it is in  
>>>>> this case.
>>>>>
>>>>> robert engels wrote:
>>>>>> Using synchronization is a poor/invalid substitute for thread  
>>>>>> locals in many cases.
>>>>>>
>>>>>> The point of the thread local in these referenced cases is too  
>>>>>> allow streaming reads on a file descriptor. if you use a shared  
>>>>>> file descriptor/buffer you are going to continually invalidate  
>>>>>> the buffer.
>>>>>>
>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>
>>>>>>>
>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread- 
>>>>>>> private instance of TermVectorsReader, to avoid synchronizing  
>>>>>>> per-document when loading term vectors.
>>>>>>>
>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's  
>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>
>>>>>>> Though, of course, we then synchronize on the underlying file  
>>>>>>> (when using FSDirectory), so perhaps we are really not saving  
>>>>>>> much by using ThreadLocal here.  But we are looking to relax  
>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>
>>>>>>> Maybe we could make our own ThreadLocal that just uses a  
>>>>>>> HashMap, which we'd have to synchronize on when getting the per- 
>>>>>>> thread instances.  Or, go back to sharing a single  
>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>
>>>>>>> Jason has suggested moving to a model where you ask the  
>>>>>>> IndexReader for an object that can return term vectors / stored  
>>>>>>> fields / etc, and then you interact with that many times to  
>>>>>>> retrieve each doc.  We could then synchronize only on  
>>>>>>> retrieving that object, and provide a thread-private instance.
>>>>>>>
>>>>>>> It seems like we should move away from using ThreadLocal in  
>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Adrian Tarau wrote:
>>>>>>>
>>>>>>>> Usually ThreadLocal.remove() should be called at the end(in a  
>>>>>>>> finally block), before the current call leaves your code.
>>>>>>>>
>>>>>>>> Ex : if during searching ThreadLocal is used, every search(..)  
>>>>>>>> method should cleanup any ThreadLocal variables, or even  
>>>>>>>> deeper in the implementation. When the call leaves Lucene any  
>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>
>>>>>>>> Michael McCandless wrote:
>>>>>>>>>
>>>>>>>>> ThreadLocal, which we use in several places in Lucene, causes  
>>>>>>>>> a leak in app servers because the classloader never fully  
>>>>>>>>> deallocates Lucene's classes because the ThreadLocal is  
>>>>>>>>> holding strong references.
>>>>>>>>>
>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding  
>>>>>>>>> synchronization.
>>>>>>>>>
>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling  
>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no  
>>>>>>>>>>> longer be reachable,
>>>>>>>>>>> assuming you are not holding any references to your  
>>>>>>>>>>> IndexReader.
>>>>>>>>>>>
>>>>>>>>>>> Which means the ThreadLocal instance should no longer be  
>>>>>>>>>>> reachable.
>>>>>>>>>>
>>>>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the  
>>>>>>>>>> values
>>>>>>>>>> (now stale) are strongly referenced and won't be actually  
>>>>>>>>>> removed
>>>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>>>> Nice huh?
>>>>>>>>>>
>>>>>>>>>> -Yonik
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 

-- 
View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18416022.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
As always, you still have the issue that if the object in the  
ThreadLocal has a reference to a native resource (e.g. file handle),  
you might run out of file handles before any OOM which triggers the  
GC (to close the file handle if relying on finalization).

On Jul 11, 2008, at 4:54 AM, Michael McCandless wrote:

> OK, I created a simple test to test this (attached).  The test just  
> runs 10 threads, each one creating a 100 KB byte array which is  
> stored into a ThreadLocal, and then periodically the ThreadLocal is  
> replaced with a new one.  This is to test whether GC of a  
> ThreadLocal, even though the thread is still alive, in fact leads  
> to GC of the objects held in the ThreadLocal.
>
> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are  
> in fact properly collected.
>
> So this is not a leak but rather a "delayed collection" issue.   
> Java's GC is never guaranteed to be immediate, and apparently when  
> using ThreadLocals it's even less immediate than "normal".  In the  
> original issue, if other things create ThreadLocals, then  
> eventually Lucene's unreferenced ThreadLocals would be properly  
> collected.
>
> So I think we continue to use non-static ThreadLocals in Lucene...
>
> Mike
>
> <ThreadTest.java>
>
>
> robert engels wrote:
>
>> Once again, these are "static" thread locals. A completely  
>> different issue. Since the object is available statically, the  
>> weak reference cannot be cleared so stale entries will never be  
>> cleared as long as the thread is alive.
>>
>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>
>>> Just a few examples of "problems" using ThreadLocals.
>>>
>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>
>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"  
>>> implementation, and maybe the current "problems" of ThreadLocals  
>>> are not a problem for SegmentReader but it seems safer to use  
>>> ThreadLocals to pass context information which is cleared when  
>>> the call exits instead of storing long-lived objects.
>>>
>>>
>>> robert engels wrote:
>>>>
>>>> Aside from the pre-1.5 thread local "perceived leak", there are  
>>>> no issues with ThreadLocals if used properly.
>>>>
>>>> There is no need for try/finally blocks, unless you MUST release  
>>>> resources immediately, usually this is not the case, which is  
>>>> why a ThreadLocal is used in the first place.
>>>>
>>>> From the ThreadLocalMap javadoc...
>>>>
>>>>  /**
>>>>      * ThreadLocalMap is a customized hash map suitable only for
>>>>      * maintaining thread local values. No operations are exported
>>>>      * outside of the ThreadLocal class. The class is package  
>>>> private to
>>>>      * allow declaration of fields in class Thread.  To help  
>>>> deal with
>>>>      * very large and long-lived usages, the hash table entries use
>>>>      * WeakReferences for keys. However, since reference queues  
>>>> are not
>>>>      * used, stale entries are guaranteed to be removed only when
>>>>      * the table starts running out of space.
>>>>      */
>>>>
>>>> /**
>>>>          * Heuristically scan some cells looking for stale entries.
>>>>          * This is invoked when either a new element is added, or
>>>>          * another stale one has been expunged. It performs a
>>>>          * logarithmic number of scans, as a balance between no
>>>>          * scanning (fast but retains garbage) and a number of  
>>>> scans
>>>>          * proportional to number of elements, that would find all
>>>>          * garbage but would cause some insertions to take O(n)  
>>>> time.
>>>>          *
>>>>          * @param i a position known NOT to hold a stale entry. The
>>>>          * scan starts at the element after i.
>>>>          *
>>>>          * @param n scan control: <tt>log2(n)</tt> cells are  
>>>> scanned,
>>>>          * unless a stale entry one is found, in which case
>>>>          * <tt>log2(table.length)-1</tt> additional cells are  
>>>> scanned.
>>>>          * When called from insertions, this parameter is the  
>>>> number
>>>>          * of elements, but when from replaceStaleEntry, it is the
>>>>          * table length. (Note: all this could be changed to be  
>>>> either
>>>>          * more or less aggressive by weighting n instead of just
>>>>          * using straight log n. But this version is simple,  
>>>> fast, and
>>>>          * seems to work well.)
>>>>          *
>>>>          * @return true if any stale entries have been removed.
>>>>          */
>>>>
>>>>
>>>> The instance ThreadLocals (and what the refer to) will be GC'd  
>>>> when the containing Object is GC'd.
>>>>
>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal  
>>>> refers to an object that has native resources (e.g. file  
>>>> handles), it may not be released until other thread locals are  
>>>> created by the thread (or the thread terminates).
>>>>
>>>> You can avoid this "delay" by calling remove(), but in most  
>>>> applications it should never be necessary - unless a very  
>>>> strange usage...
>>>>
>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>
>>>>> From what I know, storing objects in ThreadLocal is safe as  
>>>>> long as you release the object within a try {} finall {} block  
>>>>> or store objects which are independent of the rest of the code 
>>>>> (no dependencies).Otherwise it can get pretty tricky(memory  
>>>>> leaks, classloader problems) after awhile.
>>>>>
>>>>> It is pretty convenient to pass HTTP request information with a  
>>>>> ThreadLocal in a servlet(but you should cleanup the variable  
>>>>> before leaving the servlet) but I'm not sure how safe it is in  
>>>>> this case.
>>>>>
>>>>> robert engels wrote:
>>>>>> Using synchronization is a poor/invalid substitute for thread  
>>>>>> locals in many cases.
>>>>>>
>>>>>> The point of the thread local in these referenced cases is too  
>>>>>> allow streaming reads on a file descriptor. if you use a  
>>>>>> shared file descriptor/buffer you are going to continually  
>>>>>> invalidate the buffer.
>>>>>>
>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>
>>>>>>>
>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread- 
>>>>>>> private instance of TermVectorsReader, to avoid synchronizing  
>>>>>>> per-document when loading term vectors.
>>>>>>>
>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's  
>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>
>>>>>>> Though, of course, we then synchronize on the underlying file  
>>>>>>> (when using FSDirectory), so perhaps we are really not saving  
>>>>>>> much by using ThreadLocal here.  But we are looking to relax  
>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>
>>>>>>> Maybe we could make our own ThreadLocal that just uses a  
>>>>>>> HashMap, which we'd have to synchronize on when getting the  
>>>>>>> per-thread instances.  Or, go back to sharing a single  
>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>
>>>>>>> Jason has suggested moving to a model where you ask the  
>>>>>>> IndexReader for an object that can return term vectors /  
>>>>>>> stored fields / etc, and then you interact with that many  
>>>>>>> times to retrieve each doc.  We could then synchronize only  
>>>>>>> on retrieving that object, and provide a thread-private  
>>>>>>> instance.
>>>>>>>
>>>>>>> It seems like we should move away from using ThreadLocal in  
>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Adrian Tarau wrote:
>>>>>>>
>>>>>>>> Usually ThreadLocal.remove() should be called at the end(in  
>>>>>>>> a finally block), before the current call leaves your code.
>>>>>>>>
>>>>>>>> Ex : if during searching ThreadLocal is used, every search 
>>>>>>>> (..) method should cleanup any ThreadLocal variables, or  
>>>>>>>> even deeper in the implementation. When the call leaves  
>>>>>>>> Lucene any used ThreadLocal should be cleaned up.
>>>>>>>>
>>>>>>>> Michael McCandless wrote:
>>>>>>>>>
>>>>>>>>> ThreadLocal, which we use in several places in Lucene,  
>>>>>>>>> causes a leak in app servers because the classloader never  
>>>>>>>>> fully deallocates Lucene's classes because the ThreadLocal  
>>>>>>>>> is holding strong references.
>>>>>>>>>
>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding  
>>>>>>>>> synchronization.
>>>>>>>>>
>>>>>>>>> Does anyone have any ideas on how to solve this w/o falling  
>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>>>> So now I'm confused: the SegmentReader itself should no  
>>>>>>>>>>> longer be reachable,
>>>>>>>>>>> assuming you are not holding any references to your  
>>>>>>>>>>> IndexReader.
>>>>>>>>>>>
>>>>>>>>>>> Which means the ThreadLocal instance should no longer be  
>>>>>>>>>>> reachable.
>>>>>>>>>>
>>>>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but  
>>>>>>>>>> the values
>>>>>>>>>> (now stale) are strongly referenced and won't be actually  
>>>>>>>>>> removed
>>>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>>>> Nice huh?
>>>>>>>>>>
>>>>>>>>>> -Yonik
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------- 
>>>>>>>>>> --------
>>>>>>>>>> To unsubscribe, e-mail: java-user- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------------- 
>>>>>>>>> -------
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> ------
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> -----
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> ----
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, I created a simple test to test this (attached).  The test just  
runs 10 threads, each one creating a 100 KB byte array which is stored  
into a ThreadLocal, and then periodically the ThreadLocal is replaced  
with a new one.  This is to test whether GC of a ThreadLocal, even  
though the thread is still alive, in fact leads to GC of the objects  
held in the ThreadLocal.

Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are in  
fact properly collected.

So this is not a leak but rather a "delayed collection" issue.  Java's  
GC is never guaranteed to be immediate, and apparently when using  
ThreadLocals it's even less immediate than "normal".  In the original  
issue, if other things create ThreadLocals, then eventually Lucene's  
unreferenced ThreadLocals would be properly collected.

So I think we continue to use non-static ThreadLocals in Lucene...

Mike


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
Once again, these are "static" thread locals. A completely different  
issue. Since the object is available statically, the weak reference  
cannot be cleared so stale entries will never be cleared as long as  
the thread is alive.

On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:

> Just a few examples of "problems" using ThreadLocals.
>
> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
> http://www.theserverside.com/news/thread.tss?thread_id=41473
>
> Once again, I'm not pointing to Lucene SegmentReader as a "bad"  
> implementation, and maybe the current "problems" of ThreadLocals  
> are not a problem for SegmentReader but it seems safer to use  
> ThreadLocals to pass context information which is cleared when the  
> call exits instead of storing long-lived objects.
>
>
> robert engels wrote:
>>
>> Aside from the pre-1.5 thread local "perceived leak", there are no  
>> issues with ThreadLocals if used properly.
>>
>> There is no need for try/finally blocks, unless you MUST release  
>> resources immediately, usually this is not the case, which is why  
>> a ThreadLocal is used in the first place.
>>
>> From the ThreadLocalMap javadoc...
>>
>>  /**
>>      * ThreadLocalMap is a customized hash map suitable only for
>>      * maintaining thread local values. No operations are exported
>>      * outside of the ThreadLocal class. The class is package  
>> private to
>>      * allow declaration of fields in class Thread.  To help deal  
>> with
>>      * very large and long-lived usages, the hash table entries use
>>      * WeakReferences for keys. However, since reference queues  
>> are not
>>      * used, stale entries are guaranteed to be removed only when
>>      * the table starts running out of space.
>>      */
>>
>> /**
>>          * Heuristically scan some cells looking for stale entries.
>>          * This is invoked when either a new element is added, or
>>          * another stale one has been expunged. It performs a
>>          * logarithmic number of scans, as a balance between no
>>          * scanning (fast but retains garbage) and a number of scans
>>          * proportional to number of elements, that would find all
>>          * garbage but would cause some insertions to take O(n) time.
>>          *
>>          * @param i a position known NOT to hold a stale entry. The
>>          * scan starts at the element after i.
>>          *
>>          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
>>          * unless a stale entry one is found, in which case
>>          * <tt>log2(table.length)-1</tt> additional cells are  
>> scanned.
>>          * When called from insertions, this parameter is the number
>>          * of elements, but when from replaceStaleEntry, it is the
>>          * table length. (Note: all this could be changed to be  
>> either
>>          * more or less aggressive by weighting n instead of just
>>          * using straight log n. But this version is simple, fast,  
>> and
>>          * seems to work well.)
>>          *
>>          * @return true if any stale entries have been removed.
>>          */
>>
>>
>> The instance ThreadLocals (and what the refer to) will be GC'd  
>> when the containing Object is GC'd.
>>
>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers  
>> to an object that has native resources (e.g. file handles), it may  
>> not be released until other thread locals are created by the  
>> thread (or the thread terminates).
>>
>> You can avoid this "delay" by calling remove(), but in most  
>> applications it should never be necessary - unless a very strange  
>> usage...
>>
>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>
>>> From what I know, storing objects in ThreadLocal is safe as long  
>>> as you release the object within a try {} finall {} block or  
>>> store objects which are independent of the rest of the code(no  
>>> dependencies).Otherwise it can get pretty tricky(memory leaks,  
>>> classloader problems) after awhile.
>>>
>>> It is pretty convenient to pass HTTP request information with a  
>>> ThreadLocal in a servlet(but you should cleanup the variable  
>>> before leaving the servlet) but I'm not sure how safe it is in  
>>> this case.
>>>
>>> robert engels wrote:
>>>> Using synchronization is a poor/invalid substitute for thread  
>>>> locals in many cases.
>>>>
>>>> The point of the thread local in these referenced cases is too  
>>>> allow streaming reads on a file descriptor. if you use a shared  
>>>> file descriptor/buffer you are going to continually invalidate  
>>>> the buffer.
>>>>
>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread- 
>>>>> private instance of TermVectorsReader, to avoid synchronizing  
>>>>> per-document when loading term vectors.
>>>>>
>>>>> Clearing this ThreadLocal value per call to SegmentReader's  
>>>>> methods that load term vectors would defeat its purpose.
>>>>>
>>>>> Though, of course, we then synchronize on the underlying file  
>>>>> (when using FSDirectory), so perhaps we are really not saving  
>>>>> much by using ThreadLocal here.  But we are looking to relax  
>>>>> that low level synchronization with LUCENE-753.
>>>>>
>>>>> Maybe we could make our own ThreadLocal that just uses a  
>>>>> HashMap, which we'd have to synchronize on when getting the per- 
>>>>> thread instances.  Or, go back to sharing a single  
>>>>> TermVectorsReader and synchronize per-document.
>>>>>
>>>>> Jason has suggested moving to a model where you ask the  
>>>>> IndexReader for an object that can return term vectors / stored  
>>>>> fields / etc, and then you interact with that many times to  
>>>>> retrieve each doc.  We could then synchronize only on  
>>>>> retrieving that object, and provide a thread-private instance.
>>>>>
>>>>> It seems like we should move away from using ThreadLocal in  
>>>>> Lucene and do "normal" synchronization instead.
>>>>>
>>>>> Mike
>>>>>
>>>>> Adrian Tarau wrote:
>>>>>
>>>>>> Usually ThreadLocal.remove() should be called at the end(in a  
>>>>>> finally block), before the current call leaves your code.
>>>>>>
>>>>>> Ex : if during searching ThreadLocal is used, every search(..)  
>>>>>> method should cleanup any ThreadLocal variables, or even  
>>>>>> deeper in the implementation. When the call leaves Lucene any  
>>>>>> used ThreadLocal should be cleaned up.
>>>>>>
>>>>>> Michael McCandless wrote:
>>>>>>>
>>>>>>> ThreadLocal, which we use in several places in Lucene, causes  
>>>>>>> a leak in app servers because the classloader never fully  
>>>>>>> deallocates Lucene's classes because the ThreadLocal is  
>>>>>>> holding strong references.
>>>>>>>
>>>>>>> Yet, ThreadLocal is very convenient for avoiding  
>>>>>>> synchronization.
>>>>>>>
>>>>>>> Does anyone have any ideas on how to solve this w/o falling  
>>>>>>> back to "normal" synchronization?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Begin forwarded message:
>>>>>>>
>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>> So now I'm confused: the SegmentReader itself should no  
>>>>>>>>> longer be reachable,
>>>>>>>>> assuming you are not holding any references to your  
>>>>>>>>> IndexReader.
>>>>>>>>>
>>>>>>>>> Which means the ThreadLocal instance should no longer be  
>>>>>>>>> reachable.
>>>>>>>>
>>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the  
>>>>>>>> values
>>>>>>>> (now stale) are strongly referenced and won't be actually  
>>>>>>>> removed
>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>> Nice huh?
>>>>>>>>
>>>>>>>> -Yonik
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> ------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> -----
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> ----
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>


Re: ThreadLocal in SegmentReader

Posted by Adrian Tarau <ad...@gmail.com>.
Just a few examples of "problems" using ThreadLocals.

http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
http://www.theserverside.com/news/thread.tss?thread_id=41473

Once again, I'm not pointing to Lucene SegmentReader as a "bad" 
implementation, and maybe the current "problems" of ThreadLocals are not 
a problem for SegmentReader but it seems safer to use ThreadLocals to 
pass context information which is cleared when the call exits instead of 
storing long-lived objects.


robert engels wrote:
> Aside from the pre-1.5 thread local "perceived leak", there are no 
> issues with ThreadLocals if used properly.
>
> There is no need for try/finally blocks, unless you MUST release 
> resources immediately, usually this is not the case, which is why a 
> ThreadLocal is used in the first place.
>
> From the ThreadLocalMap javadoc...
>
>  /**
>      * ThreadLocalMap is a customized hash map suitable only for
>      * maintaining thread local values. No operations are exported
>      * outside of the ThreadLocal class. The class is package private to
>      * allow declaration of fields in class Thread.  To help deal with
>      * very large and long-lived usages, the hash table entries use
>      * WeakReferences for keys. However, since reference queues are not
>      * used, stale entries are guaranteed to be removed only when
>      * the table starts running out of space.
>      */
>
> /**
>          * Heuristically scan some cells looking for stale entries.
>          * This is invoked when either a new element is added, or
>          * another stale one has been expunged. It performs a
>          * logarithmic number of scans, as a balance between no
>          * scanning (fast but retains garbage) and a number of scans
>          * proportional to number of elements, that would find all
>          * garbage but would cause some insertions to take O(n) time.
>          *
>          * @param i a position known NOT to hold a stale entry. The
>          * scan starts at the element after i.
>          *
>          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
>          * unless a stale entry one is found, in which case
>          * <tt>log2(table.length)-1</tt> additional cells are scanned.
>          * When called from insertions, this parameter is the number
>          * of elements, but when from replaceStaleEntry, it is the
>          * table length. (Note: all this could be changed to be either
>          * more or less aggressive by weighting n instead of just
>          * using straight log n. But this version is simple, fast, and
>          * seems to work well.)
>          *
>          * @return true if any stale entries have been removed.
>          */
>
>
> The instance ThreadLocals (and what the refer to) will be GC'd when 
> the containing Object is GC'd.
>
> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to 
> an object that has native resources (e.g. file handles), it may not be 
> released until other thread locals are created by the thread (or the 
> thread terminates).
>
> You can avoid this "delay" by calling remove(), but in most 
> applications it should never be necessary - unless a very strange usage...
>
> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>
>> From what I know, storing objects in ThreadLocal is safe as long as 
>> you release the object within a try {} finall {} block or store 
>> objects which are independent of the rest of the code(no 
>> dependencies).Otherwise it can get pretty tricky(memory leaks, 
>> classloader problems) after awhile.
>>
>> It is pretty convenient to pass HTTP request information with a 
>> ThreadLocal in a servlet(but you should cleanup the variable before 
>> leaving the servlet) but I'm not sure how safe it is in this case.
>>
>> robert engels wrote:
>>> Using synchronization is a poor/invalid substitute for thread locals 
>>> in many cases.
>>>
>>> The point of the thread local in these referenced cases is too allow 
>>> streaming reads on a file descriptor. if you use a shared file 
>>> descriptor/buffer you are going to continually invalidate the buffer.
>>>
>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>
>>>>
>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-private 
>>>> instance of TermVectorsReader, to avoid synchronizing per-document 
>>>> when loading term vectors.
>>>>
>>>> Clearing this ThreadLocal value per call to SegmentReader's methods 
>>>> that load term vectors would defeat its purpose.
>>>>
>>>> Though, of course, we then synchronize on the underlying file (when 
>>>> using FSDirectory), so perhaps we are really not saving much by 
>>>> using ThreadLocal here.  But we are looking to relax that low level 
>>>> synchronization with LUCENE-753.
>>>>
>>>> Maybe we could make our own ThreadLocal that just uses a HashMap, 
>>>> which we'd have to synchronize on when getting the per-thread 
>>>> instances.  Or, go back to sharing a single TermVectorsReader and 
>>>> synchronize per-document.
>>>>
>>>> Jason has suggested moving to a model where you ask the IndexReader 
>>>> for an object that can return term vectors / stored fields / etc, 
>>>> and then you interact with that many times to retrieve each doc.  
>>>> We could then synchronize only on retrieving that object, and 
>>>> provide a thread-private instance.
>>>>
>>>> It seems like we should move away from using ThreadLocal in Lucene 
>>>> and do "normal" synchronization instead.
>>>>
>>>> Mike
>>>>
>>>> Adrian Tarau wrote:
>>>>
>>>>> Usually ThreadLocal.remove() should be called at the end(in a 
>>>>> finally block), before the current call leaves your code.
>>>>>
>>>>> Ex : if during searching ThreadLocal is used, every search(..) 
>>>>> method should cleanup any ThreadLocal variables, or even deeper in 
>>>>> the implementation. When the call leaves Lucene any used 
>>>>> ThreadLocal should be cleaned up.
>>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> ThreadLocal, which we use in several places in Lucene, causes a 
>>>>>> leak in app servers because the classloader never fully 
>>>>>> deallocates Lucene's classes because the ThreadLocal is holding 
>>>>>> strong references.
>>>>>>
>>>>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>>>>
>>>>>> Does anyone have any ideas on how to solve this w/o falling back 
>>>>>> to "normal" synchronization?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>>> From: "Yonik Seeley" <yonik@apache.org <ma...@apache.org>>
>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>> To: java-user@lucene.apache.org <ma...@lucene.apache.org>
>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>> Reply-To: java-user@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>>
>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>> <lucene@mikemccandless.com <ma...@mikemccandless.com>> 
>>>>>>> wrote:
>>>>>>>> So now I'm confused: the SegmentReader itself should no longer 
>>>>>>>> be reachable,
>>>>>>>> assuming you are not holding any references to your IndexReader.
>>>>>>>>
>>>>>>>> Which means the ThreadLocal instance should no longer be reachable.
>>>>>>>
>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>>>>>> (now stale) are strongly referenced and won't be actually removed
>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>> Nice huh?
>>>>>>>
>>>>>>> -Yonik
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>> For additional commands, e-mail: 
>>>>>>> java-user-help@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>>> <ma...@lucene.apache.org>
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>>>> <ma...@lucene.apache.org>
>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>> <ma...@lucene.apache.org>
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>>> <ma...@lucene.apache.org>
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>> <ma...@lucene.apache.org>
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>> <ma...@lucene.apache.org>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>> <ma...@lucene.apache.org>
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>> <ma...@lucene.apache.org>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>> <ma...@lucene.apache.org>
>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>> <ma...@lucene.apache.org>
>>
>


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
As stated in the cited email, they use a static ThreadLocal - this is  
a different issue (and I don't think Lucene uses any of these).

It is still not a bug in this case, but just the same "delayed"  
release behavior.  The problem is that the some class loaders  
probably are never released as the Threads are never terminated, yet  
they do not create any more ThreadLocals (thus causing some stale  
slots to never be cleared).

I am not sure that Lucene has many long-lived objects... maybe  
SegmentReaders if the index rarely changes/updated.

We have done extensive memory profiling of Lucene - there are no  
inherent leaks (as of the 1.9 code set). I don't think anything has  
changed since then related to this (expect using some  
FixedThreadLocal to improve the situation in some cases).

If you search the archives for FixedThreadLocal you can find a simple  
implementation that solves the perceived problem.


On Jul 9, 2008, at 3:59 PM, Adrian Tarau wrote:

> As soon as the object can be collected...which doesn't happen in  
> case of  (some) thread pools.
>
> Here is an example : http://www.opensubscriber.com/message/ojb- 
> user@db.apache.org/1500135.html (it's pretty old but you can find  
> newer examples googling around). There are some replies with  
> workarounds.
>
> I'm not saying there is something wrong using ThreadLocal in  
> Lucene, but it could cause problems when classloaders have to be  
> released, as Michael noticed.
>
> As Jason suggested, a global manager for ThreadLocals could be the  
> solution.
> I prefer to use ThreadLocal to pass "context" objects but I didn't  
> use them to store long-lived objects. I'm always releasing the  
> object after it was used. I'm not sure if is too preventive, but I  
> had some similar problems with web applications(redeploy),  
> ThreadLocals  and long-lived objects.
>
> robert engels wrote:
>>
>> Aside from the pre-1.5 thread local "perceived leak", there are no  
>> issues with ThreadLocals if used properly.
>>
>> There is no need for try/finally blocks, unless you MUST release  
>> resources immediately, usually this is not the case, which is why  
>> a ThreadLocal is used in the first place.
>>
>> From the ThreadLocalMap javadoc...
>>
>>  /**
>>      * ThreadLocalMap is a customized hash map suitable only for
>>      * maintaining thread local values. No operations are exported
>>      * outside of the ThreadLocal class. The class is package  
>> private to
>>      * allow declaration of fields in class Thread.  To help deal  
>> with
>>      * very large and long-lived usages, the hash table entries use
>>      * WeakReferences for keys. However, since reference queues  
>> are not
>>      * used, stale entries are guaranteed to be removed only when
>>      * the table starts running out of space.
>>      */
>>
>> /**
>>          * Heuristically scan some cells looking for stale entries.
>>          * This is invoked when either a new element is added, or
>>          * another stale one has been expunged. It performs a
>>          * logarithmic number of scans, as a balance between no
>>          * scanning (fast but retains garbage) and a number of scans
>>          * proportional to number of elements, that would find all
>>          * garbage but would cause some insertions to take O(n) time.
>>          *
>>          * @param i a position known NOT to hold a stale entry. The
>>          * scan starts at the element after i.
>>          *
>>          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
>>          * unless a stale entry one is found, in which case
>>          * <tt>log2(table.length)-1</tt> additional cells are  
>> scanned.
>>          * When called from insertions, this parameter is the number
>>          * of elements, but when from replaceStaleEntry, it is the
>>          * table length. (Note: all this could be changed to be  
>> either
>>          * more or less aggressive by weighting n instead of just
>>          * using straight log n. But this version is simple, fast,  
>> and
>>          * seems to work well.)
>>          *
>>          * @return true if any stale entries have been removed.
>>          */
>>
>>
>> The instance ThreadLocals (and what the refer to) will be GC'd  
>> when the containing Object is GC'd.
>>
>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers  
>> to an object that has native resources (e.g. file handles), it may  
>> not be released until other thread locals are created by the  
>> thread (or the thread terminates).
>>
>> You can avoid this "delay" by calling remove(), but in most  
>> applications it should never be necessary - unless a very strange  
>> usage...
>>
>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>
>>> From what I know, storing objects in ThreadLocal is safe as long  
>>> as you release the object within a try {} finall {} block or  
>>> store objects which are independent of the rest of the code(no  
>>> dependencies).Otherwise it can get pretty tricky(memory leaks,  
>>> classloader problems) after awhile.
>>>
>>> It is pretty convenient to pass HTTP request information with a  
>>> ThreadLocal in a servlet(but you should cleanup the variable  
>>> before leaving the servlet) but I'm not sure how safe it is in  
>>> this case.
>>>
>>> robert engels wrote:
>>>> Using synchronization is a poor/invalid substitute for thread  
>>>> locals in many cases.
>>>>
>>>> The point of the thread local in these referenced cases is too  
>>>> allow streaming reads on a file descriptor. if you use a shared  
>>>> file descriptor/buffer you are going to continually invalidate  
>>>> the buffer.
>>>>
>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> Well ... SegmentReader uses ThreadLocal to hold a thread- 
>>>>> private instance of TermVectorsReader, to avoid synchronizing  
>>>>> per-document when loading term vectors.
>>>>>
>>>>> Clearing this ThreadLocal value per call to SegmentReader's  
>>>>> methods that load term vectors would defeat its purpose.
>>>>>
>>>>> Though, of course, we then synchronize on the underlying file  
>>>>> (when using FSDirectory), so perhaps we are really not saving  
>>>>> much by using ThreadLocal here.  But we are looking to relax  
>>>>> that low level synchronization with LUCENE-753.
>>>>>
>>>>> Maybe we could make our own ThreadLocal that just uses a  
>>>>> HashMap, which we'd have to synchronize on when getting the per- 
>>>>> thread instances.  Or, go back to sharing a single  
>>>>> TermVectorsReader and synchronize per-document.
>>>>>
>>>>> Jason has suggested moving to a model where you ask the  
>>>>> IndexReader for an object that can return term vectors / stored  
>>>>> fields / etc, and then you interact with that many times to  
>>>>> retrieve each doc.  We could then synchronize only on  
>>>>> retrieving that object, and provide a thread-private instance.
>>>>>
>>>>> It seems like we should move away from using ThreadLocal in  
>>>>> Lucene and do "normal" synchronization instead.
>>>>>
>>>>> Mike
>>>>>
>>>>> Adrian Tarau wrote:
>>>>>
>>>>>> Usually ThreadLocal.remove() should be called at the end(in a  
>>>>>> finally block), before the current call leaves your code.
>>>>>>
>>>>>> Ex : if during searching ThreadLocal is used, every search(..)  
>>>>>> method should cleanup any ThreadLocal variables, or even  
>>>>>> deeper in the implementation. When the call leaves Lucene any  
>>>>>> used ThreadLocal should be cleaned up.
>>>>>>
>>>>>> Michael McCandless wrote:
>>>>>>>
>>>>>>> ThreadLocal, which we use in several places in Lucene, causes  
>>>>>>> a leak in app servers because the classloader never fully  
>>>>>>> deallocates Lucene's classes because the ThreadLocal is  
>>>>>>> holding strong references.
>>>>>>>
>>>>>>> Yet, ThreadLocal is very convenient for avoiding  
>>>>>>> synchronization.
>>>>>>>
>>>>>>> Does anyone have any ideas on how to solve this w/o falling  
>>>>>>> back to "normal" synchronization?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Begin forwarded message:
>>>>>>>
>>>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>>>> So now I'm confused: the SegmentReader itself should no  
>>>>>>>>> longer be reachable,
>>>>>>>>> assuming you are not holding any references to your  
>>>>>>>>> IndexReader.
>>>>>>>>>
>>>>>>>>> Which means the ThreadLocal instance should no longer be  
>>>>>>>>> reachable.
>>>>>>>>
>>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the  
>>>>>>>> values
>>>>>>>> (now stale) are strongly referenced and won't be actually  
>>>>>>>> removed
>>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>>> Nice huh?
>>>>>>>>
>>>>>>>> -Yonik
>>>>>>>>
>>>>>>>> --------------------------------------------------------------- 
>>>>>>>> ------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------- 
>>>>>>> -----
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> ----
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>


Re: ThreadLocal in SegmentReader

Posted by Adrian Tarau <ad...@gmail.com>.
As soon as the object can be collected...which doesn't happen in case 
of  (some) thread pools.

Here is an example : 
http://www.opensubscriber.com/message/ojb-user@db.apache.org/1500135.html 
(it's pretty old but you can find newer examples googling around). There 
are some replies with workarounds.

I'm not saying there is something wrong using ThreadLocal in Lucene, but 
it could cause problems when classloaders have to be released, as 
Michael noticed.

As Jason suggested, a global manager for ThreadLocals could be the solution.
I prefer to use ThreadLocal to pass "context" objects but I didn't use 
them to store long-lived objects. I'm always releasing the object after 
it was used. I'm not sure if is too preventive, but I had some similar 
problems with web applications(redeploy), ThreadLocals  and long-lived 
objects.

robert engels wrote:
> Aside from the pre-1.5 thread local "perceived leak", there are no 
> issues with ThreadLocals if used properly.
>
> There is no need for try/finally blocks, unless you MUST release 
> resources immediately, usually this is not the case, which is why a 
> ThreadLocal is used in the first place.
>
> From the ThreadLocalMap javadoc...
>
>  /**
>      * ThreadLocalMap is a customized hash map suitable only for
>      * maintaining thread local values. No operations are exported
>      * outside of the ThreadLocal class. The class is package private to
>      * allow declaration of fields in class Thread.  To help deal with
>      * very large and long-lived usages, the hash table entries use
>      * WeakReferences for keys. However, since reference queues are not
>      * used, stale entries are guaranteed to be removed only when
>      * the table starts running out of space.
>      */
>
> /**
>          * Heuristically scan some cells looking for stale entries.
>          * This is invoked when either a new element is added, or
>          * another stale one has been expunged. It performs a
>          * logarithmic number of scans, as a balance between no
>          * scanning (fast but retains garbage) and a number of scans
>          * proportional to number of elements, that would find all
>          * garbage but would cause some insertions to take O(n) time.
>          *
>          * @param i a position known NOT to hold a stale entry. The
>          * scan starts at the element after i.
>          *
>          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
>          * unless a stale entry one is found, in which case
>          * <tt>log2(table.length)-1</tt> additional cells are scanned.
>          * When called from insertions, this parameter is the number
>          * of elements, but when from replaceStaleEntry, it is the
>          * table length. (Note: all this could be changed to be either
>          * more or less aggressive by weighting n instead of just
>          * using straight log n. But this version is simple, fast, and
>          * seems to work well.)
>          *
>          * @return true if any stale entries have been removed.
>          */
>
>
> The instance ThreadLocals (and what the refer to) will be GC'd when 
> the containing Object is GC'd.
>
> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to 
> an object that has native resources (e.g. file handles), it may not be 
> released until other thread locals are created by the thread (or the 
> thread terminates).
>
> You can avoid this "delay" by calling remove(), but in most 
> applications it should never be necessary - unless a very strange usage...
>
> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>
>> From what I know, storing objects in ThreadLocal is safe as long as 
>> you release the object within a try {} finall {} block or store 
>> objects which are independent of the rest of the code(no 
>> dependencies).Otherwise it can get pretty tricky(memory leaks, 
>> classloader problems) after awhile.
>>
>> It is pretty convenient to pass HTTP request information with a 
>> ThreadLocal in a servlet(but you should cleanup the variable before 
>> leaving the servlet) but I'm not sure how safe it is in this case.
>>
>> robert engels wrote:
>>> Using synchronization is a poor/invalid substitute for thread locals 
>>> in many cases.
>>>
>>> The point of the thread local in these referenced cases is too allow 
>>> streaming reads on a file descriptor. if you use a shared file 
>>> descriptor/buffer you are going to continually invalidate the buffer.
>>>
>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>
>>>>
>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-private 
>>>> instance of TermVectorsReader, to avoid synchronizing per-document 
>>>> when loading term vectors.
>>>>
>>>> Clearing this ThreadLocal value per call to SegmentReader's methods 
>>>> that load term vectors would defeat its purpose.
>>>>
>>>> Though, of course, we then synchronize on the underlying file (when 
>>>> using FSDirectory), so perhaps we are really not saving much by 
>>>> using ThreadLocal here.  But we are looking to relax that low level 
>>>> synchronization with LUCENE-753.
>>>>
>>>> Maybe we could make our own ThreadLocal that just uses a HashMap, 
>>>> which we'd have to synchronize on when getting the per-thread 
>>>> instances.  Or, go back to sharing a single TermVectorsReader and 
>>>> synchronize per-document.
>>>>
>>>> Jason has suggested moving to a model where you ask the IndexReader 
>>>> for an object that can return term vectors / stored fields / etc, 
>>>> and then you interact with that many times to retrieve each doc.  
>>>> We could then synchronize only on retrieving that object, and 
>>>> provide a thread-private instance.
>>>>
>>>> It seems like we should move away from using ThreadLocal in Lucene 
>>>> and do "normal" synchronization instead.
>>>>
>>>> Mike
>>>>
>>>> Adrian Tarau wrote:
>>>>
>>>>> Usually ThreadLocal.remove() should be called at the end(in a 
>>>>> finally block), before the current call leaves your code.
>>>>>
>>>>> Ex : if during searching ThreadLocal is used, every search(..) 
>>>>> method should cleanup any ThreadLocal variables, or even deeper in 
>>>>> the implementation. When the call leaves Lucene any used 
>>>>> ThreadLocal should be cleaned up.
>>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> ThreadLocal, which we use in several places in Lucene, causes a 
>>>>>> leak in app servers because the classloader never fully 
>>>>>> deallocates Lucene's classes because the ThreadLocal is holding 
>>>>>> strong references.
>>>>>>
>>>>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>>>>
>>>>>> Does anyone have any ideas on how to solve this w/o falling back 
>>>>>> to "normal" synchronization?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>>> From: "Yonik Seeley" <yonik@apache.org <ma...@apache.org>>
>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>> To: java-user@lucene.apache.org <ma...@lucene.apache.org>
>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>> Reply-To: java-user@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>>
>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>> <lucene@mikemccandless.com <ma...@mikemccandless.com>> 
>>>>>>> wrote:
>>>>>>>> So now I'm confused: the SegmentReader itself should no longer 
>>>>>>>> be reachable,
>>>>>>>> assuming you are not holding any references to your IndexReader.
>>>>>>>>
>>>>>>>> Which means the ThreadLocal instance should no longer be reachable.
>>>>>>>
>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>>>>>> (now stale) are strongly referenced and won't be actually removed
>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>> Nice huh?
>>>>>>>
>>>>>>> -Yonik
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>> For additional commands, e-mail: 
>>>>>>> java-user-help@lucene.apache.org 
>>>>>>> <ma...@lucene.apache.org>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>>> <ma...@lucene.apache.org>
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>>>> <ma...@lucene.apache.org>
>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>> <ma...@lucene.apache.org>
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>>> <ma...@lucene.apache.org>
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>> <ma...@lucene.apache.org>
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>> <ma...@lucene.apache.org>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>> <ma...@lucene.apache.org>
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>> <ma...@lucene.apache.org>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>> <ma...@lucene.apache.org>
>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>> <ma...@lucene.apache.org>
>>
>


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
Aside from the pre-1.5 thread local "perceived leak", there are no  
issues with ThreadLocals if used properly.

There is no need for try/finally blocks, unless you MUST release  
resources immediately, usually this is not the case, which is why a  
ThreadLocal is used in the first place.

 From the ThreadLocalMap javadoc...

  /**
      * ThreadLocalMap is a customized hash map suitable only for
      * maintaining thread local values. No operations are exported
      * outside of the ThreadLocal class. The class is package  
private to
      * allow declaration of fields in class Thread.  To help deal with
      * very large and long-lived usages, the hash table entries use
      * WeakReferences for keys. However, since reference queues are not
      * used, stale entries are guaranteed to be removed only when
      * the table starts running out of space.
      */

/**
          * Heuristically scan some cells looking for stale entries.
          * This is invoked when either a new element is added, or
          * another stale one has been expunged. It performs a
          * logarithmic number of scans, as a balance between no
          * scanning (fast but retains garbage) and a number of scans
          * proportional to number of elements, that would find all
          * garbage but would cause some insertions to take O(n) time.
          *
          * @param i a position known NOT to hold a stale entry. The
          * scan starts at the element after i.
          *
          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
          * unless a stale entry one is found, in which case
          * <tt>log2(table.length)-1</tt> additional cells are scanned.
          * When called from insertions, this parameter is the number
          * of elements, but when from replaceStaleEntry, it is the
          * table length. (Note: all this could be changed to be either
          * more or less aggressive by weighting n instead of just
          * using straight log n. But this version is simple, fast, and
          * seems to work well.)
          *
          * @return true if any stale entries have been removed.
          */


The instance ThreadLocals (and what the refer to) will be GC'd when  
the containing Object is GC'd.

There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to  
an object that has native resources (e.g. file handles), it may not  
be released until other thread locals are created by the thread (or  
the thread terminates).

You can avoid this "delay" by calling remove(), but in most  
applications it should never be necessary - unless a very strange  
usage...

On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:

> From what I know, storing objects in ThreadLocal is safe as long as  
> you release the object within a try {} finall {} block or store  
> objects which are independent of the rest of the code(no  
> dependencies).Otherwise it can get pretty tricky(memory leaks,  
> classloader problems) after awhile.
>
> It is pretty convenient to pass HTTP request information with a  
> ThreadLocal in a servlet(but you should cleanup the variable before  
> leaving the servlet) but I'm not sure how safe it is in this case.
>
> robert engels wrote:
>> Using synchronization is a poor/invalid substitute for thread  
>> locals in many cases.
>>
>> The point of the thread local in these referenced cases is too  
>> allow streaming reads on a file descriptor. if you use a shared  
>> file descriptor/buffer you are going to continually invalidate the  
>> buffer.
>>
>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>
>>>
>>> Well ... SegmentReader uses ThreadLocal to hold a thread-private  
>>> instance of TermVectorsReader, to avoid synchronizing per- 
>>> document when loading term vectors.
>>>
>>> Clearing this ThreadLocal value per call to SegmentReader's  
>>> methods that load term vectors would defeat its purpose.
>>>
>>> Though, of course, we then synchronize on the underlying file  
>>> (when using FSDirectory), so perhaps we are really not saving  
>>> much by using ThreadLocal here.  But we are looking to relax that  
>>> low level synchronization with LUCENE-753.
>>>
>>> Maybe we could make our own ThreadLocal that just uses a HashMap,  
>>> which we'd have to synchronize on when getting the per-thread  
>>> instances.  Or, go back to sharing a single TermVectorsReader and  
>>> synchronize per-document.
>>>
>>> Jason has suggested moving to a model where you ask the  
>>> IndexReader for an object that can return term vectors / stored  
>>> fields / etc, and then you interact with that many times to  
>>> retrieve each doc.  We could then synchronize only on retrieving  
>>> that object, and provide a thread-private instance.
>>>
>>> It seems like we should move away from using ThreadLocal in  
>>> Lucene and do "normal" synchronization instead.
>>>
>>> Mike
>>>
>>> Adrian Tarau wrote:
>>>
>>>> Usually ThreadLocal.remove() should be called at the end(in a  
>>>> finally block), before the current call leaves your code.
>>>>
>>>> Ex : if during searching ThreadLocal is used, every search(..)  
>>>> method should cleanup any ThreadLocal variables, or even deeper  
>>>> in the implementation. When the call leaves Lucene any used  
>>>> ThreadLocal should be cleaned up.
>>>>
>>>> Michael McCandless wrote:
>>>>>
>>>>> ThreadLocal, which we use in several places in Lucene, causes a  
>>>>> leak in app servers because the classloader never fully  
>>>>> deallocates Lucene's classes because the ThreadLocal is holding  
>>>>> strong references.
>>>>>
>>>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>>>
>>>>> Does anyone have any ideas on how to solve this w/o falling  
>>>>> back to "normal" synchronization?
>>>>>
>>>>> Mike
>>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>> To: java-user@lucene.apache.org
>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>
>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>> <lu...@mikemccandless.com> wrote:
>>>>>>> So now I'm confused: the SegmentReader itself should no  
>>>>>>> longer be reachable,
>>>>>>> assuming you are not holding any references to your IndexReader.
>>>>>>>
>>>>>>> Which means the ThreadLocal instance should no longer be  
>>>>>>> reachable.
>>>>>>
>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>> The key (the ThreadLocal) will be weakly referenced, but the  
>>>>>> values
>>>>>> (now stale) are strongly referenced and won't be actually removed
>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>> Nice huh?
>>>>>>
>>>>>> -Yonik
>>>>>>
>>>>>> ----------------------------------------------------------------- 
>>>>>> ----
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: ThreadLocal in SegmentReader

Posted by Adrian Tarau <ad...@gmail.com>.
 From what I know, storing objects in ThreadLocal is safe as long as you 
release the object within a try {} finall {} block or store objects 
which are independent of the rest of the code(no dependencies).Otherwise 
it can get pretty tricky(memory leaks, classloader problems) after awhile.

It is pretty convenient to pass HTTP request information with a 
ThreadLocal in a servlet(but you should cleanup the variable before 
leaving the servlet) but I'm not sure how safe it is in this case.

robert engels wrote:
> Using synchronization is a poor/invalid substitute for thread locals 
> in many cases.
>
> The point of the thread local in these referenced cases is too allow 
> streaming reads on a file descriptor. if you use a shared file 
> descriptor/buffer you are going to continually invalidate the buffer.
>
> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>
>>
>> Well ... SegmentReader uses ThreadLocal to hold a thread-private 
>> instance of TermVectorsReader, to avoid synchronizing per-document 
>> when loading term vectors.
>>
>> Clearing this ThreadLocal value per call to SegmentReader's methods 
>> that load term vectors would defeat its purpose.
>>
>> Though, of course, we then synchronize on the underlying file (when 
>> using FSDirectory), so perhaps we are really not saving much by using 
>> ThreadLocal here.  But we are looking to relax that low level 
>> synchronization with LUCENE-753.
>>
>> Maybe we could make our own ThreadLocal that just uses a HashMap, 
>> which we'd have to synchronize on when getting the per-thread 
>> instances.  Or, go back to sharing a single TermVectorsReader and 
>> synchronize per-document.
>>
>> Jason has suggested moving to a model where you ask the IndexReader 
>> for an object that can return term vectors / stored fields / etc, and 
>> then you interact with that many times to retrieve each doc.  We 
>> could then synchronize only on retrieving that object, and provide a 
>> thread-private instance.
>>
>> It seems like we should move away from using ThreadLocal in Lucene 
>> and do "normal" synchronization instead.
>>
>> Mike
>>
>> Adrian Tarau wrote:
>>
>>> Usually ThreadLocal.remove() should be called at the end(in a 
>>> finally block), before the current call leaves your code.
>>>
>>> Ex : if during searching ThreadLocal is used, every search(..) 
>>> method should cleanup any ThreadLocal variables, or even deeper in 
>>> the implementation. When the call leaves Lucene any used ThreadLocal 
>>> should be cleaned up.
>>>
>>> Michael McCandless wrote:
>>>>
>>>> ThreadLocal, which we use in several places in Lucene, causes a 
>>>> leak in app servers because the classloader never fully deallocates 
>>>> Lucene's classes because the ThreadLocal is holding strong references.
>>>>
>>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>>
>>>> Does anyone have any ideas on how to solve this w/o falling back to 
>>>> "normal" synchronization?
>>>>
>>>> Mike
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>> Reply-To: java-user@lucene.apache.org
>>>>>
>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>> <lu...@mikemccandless.com> wrote:
>>>>>> So now I'm confused: the SegmentReader itself should no longer be 
>>>>>> reachable,
>>>>>> assuming you are not holding any references to your IndexReader.
>>>>>>
>>>>>> Which means the ThreadLocal instance should no longer be reachable.
>>>>>
>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>>>> (now stale) are strongly referenced and won't be actually removed
>>>>> until the table is resized (under the Java6 impl at least).
>>>>> Nice huh?
>>>>>
>>>>> -Yonik
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by robert engels <re...@ix.netcom.com>.
Using synchronization is a poor/invalid substitute for thread locals  
in many cases.

The point of the thread local in these referenced cases is too allow  
streaming reads on a file descriptor. if you use a shared file  
descriptor/buffer you are going to continually invalidate the buffer.

On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:

>
> Well ... SegmentReader uses ThreadLocal to hold a thread-private  
> instance of TermVectorsReader, to avoid synchronizing per-document  
> when loading term vectors.
>
> Clearing this ThreadLocal value per call to SegmentReader's methods  
> that load term vectors would defeat its purpose.
>
> Though, of course, we then synchronize on the underlying file (when  
> using FSDirectory), so perhaps we are really not saving much by  
> using ThreadLocal here.  But we are looking to relax that low level  
> synchronization with LUCENE-753.
>
> Maybe we could make our own ThreadLocal that just uses a HashMap,  
> which we'd have to synchronize on when getting the per-thread  
> instances.  Or, go back to sharing a single TermVectorsReader and  
> synchronize per-document.
>
> Jason has suggested moving to a model where you ask the IndexReader  
> for an object that can return term vectors / stored fields / etc,  
> and then you interact with that many times to retrieve each doc.   
> We could then synchronize only on retrieving that object, and  
> provide a thread-private instance.
>
> It seems like we should move away from using ThreadLocal in Lucene  
> and do "normal" synchronization instead.
>
> Mike
>
> Adrian Tarau wrote:
>
>> Usually ThreadLocal.remove() should be called at the end(in a  
>> finally block), before the current call leaves your code.
>>
>> Ex : if during searching ThreadLocal is used, every search(..)  
>> method should cleanup any ThreadLocal variables, or even deeper in  
>> the implementation. When the call leaves Lucene any used  
>> ThreadLocal should be cleaned up.
>>
>> Michael McCandless wrote:
>>>
>>> ThreadLocal, which we use in several places in Lucene, causes a  
>>> leak in app servers because the classloader never fully  
>>> deallocates Lucene's classes because the ThreadLocal is holding  
>>> strong references.
>>>
>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>
>>> Does anyone have any ideas on how to solve this w/o falling back  
>>> to "normal" synchronization?
>>>
>>> Mike
>>>
>>> Begin forwarded message:
>>>
>>>> From: "Yonik Seeley" <yo...@apache.org>
>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: ThreadLocal in SegmentReader
>>>> Reply-To: java-user@lucene.apache.org
>>>>
>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>> <lu...@mikemccandless.com> wrote:
>>>>> So now I'm confused: the SegmentReader itself should no longer  
>>>>> be reachable,
>>>>> assuming you are not holding any references to your IndexReader.
>>>>>
>>>>> Which means the ThreadLocal instance should no longer be  
>>>>> reachable.
>>>>
>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>>> (now stale) are strongly referenced and won't be actually removed
>>>> until the table is resized (under the Java6 impl at least).
>>>> Nice huh?
>>>>
>>>> -Yonik
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Jason Rutherglen <ja...@gmail.com>.
A custom thread local (synchronized hashmap with thread as the key) that
uses a global static manager may work.  The global manager can have a close
method for explicit dereferencing.  The thread local registers itself with
the global thread local manager.  The global manager is static and tied to
the classloader which means it would be garbage collected in the case of a
reloading webapp.

On Tue, Jul 8, 2008 at 6:12 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> Well ... SegmentReader uses ThreadLocal to hold a thread-private instance
> of TermVectorsReader, to avoid synchronizing per-document when loading term
> vectors.
>
> Clearing this ThreadLocal value per call to SegmentReader's methods that
> load term vectors would defeat its purpose.
>
> Though, of course, we then synchronize on the underlying file (when using
> FSDirectory), so perhaps we are really not saving much by using ThreadLocal
> here.  But we are looking to relax that low level synchronization with
> LUCENE-753.
>
> Maybe we could make our own ThreadLocal that just uses a HashMap, which
> we'd have to synchronize on when getting the per-thread instances.  Or, go
> back to sharing a single TermVectorsReader and synchronize per-document.
>
> Jason has suggested moving to a model where you ask the IndexReader for an
> object that can return term vectors / stored fields / etc, and then you
> interact with that many times to retrieve each doc.  We could then
> synchronize only on retrieving that object, and provide a thread-private
> instance.
>
> It seems like we should move away from using ThreadLocal in Lucene and do
> "normal" synchronization instead.
>
> Mike
>
>
> Adrian Tarau wrote:
>
>  Usually ThreadLocal.remove() should be called at the end(in a finally
>> block), before the current call leaves your code.
>>
>> Ex : if during searching ThreadLocal is used, every search(..) method
>> should cleanup any ThreadLocal variables, or even deeper in the
>> implementation. When the call leaves Lucene any used ThreadLocal should be
>> cleaned up.
>>
>> Michael McCandless wrote:
>>
>>>
>>> ThreadLocal, which we use in several places in Lucene, causes a leak in
>>> app servers because the classloader never fully deallocates Lucene's classes
>>> because the ThreadLocal is holding strong references.
>>>
>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>
>>> Does anyone have any ideas on how to solve this w/o falling back to
>>> "normal" synchronization?
>>>
>>> Mike
>>>
>>> Begin forwarded message:
>>>
>>>  From: "Yonik Seeley" <yo...@apache.org>
>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: ThreadLocal in SegmentReader
>>>> Reply-To: java-user@lucene.apache.org
>>>>
>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>> <lu...@mikemccandless.com> wrote:
>>>>
>>>>> So now I'm confused: the SegmentReader itself should no longer be
>>>>> reachable,
>>>>> assuming you are not holding any references to your IndexReader.
>>>>>
>>>>> Which means the ThreadLocal instance should no longer be reachable.
>>>>>
>>>>
>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>>> (now stale) are strongly referenced and won't be actually removed
>>>> until the table is resized (under the Java6 impl at least).
>>>> Nice huh?
>>>>
>>>> -Yonik
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well ... SegmentReader uses ThreadLocal to hold a thread-private  
instance of TermVectorsReader, to avoid synchronizing per-document  
when loading term vectors.

Clearing this ThreadLocal value per call to SegmentReader's methods  
that load term vectors would defeat its purpose.

Though, of course, we then synchronize on the underlying file (when  
using FSDirectory), so perhaps we are really not saving much by using  
ThreadLocal here.  But we are looking to relax that low level  
synchronization with LUCENE-753.

Maybe we could make our own ThreadLocal that just uses a HashMap,  
which we'd have to synchronize on when getting the per-thread  
instances.  Or, go back to sharing a single TermVectorsReader and  
synchronize per-document.

Jason has suggested moving to a model where you ask the IndexReader  
for an object that can return term vectors / stored fields / etc, and  
then you interact with that many times to retrieve each doc.  We could  
then synchronize only on retrieving that object, and provide a thread- 
private instance.

It seems like we should move away from using ThreadLocal in Lucene and  
do "normal" synchronization instead.

Mike

Adrian Tarau wrote:

> Usually ThreadLocal.remove() should be called at the end(in a  
> finally block), before the current call leaves your code.
>
> Ex : if during searching ThreadLocal is used, every search(..)  
> method should cleanup any ThreadLocal variables, or even deeper in  
> the implementation. When the call leaves Lucene any used ThreadLocal  
> should be cleaned up.
>
> Michael McCandless wrote:
>>
>> ThreadLocal, which we use in several places in Lucene, causes a  
>> leak in app servers because the classloader never fully deallocates  
>> Lucene's classes because the ThreadLocal is holding strong  
>> references.
>>
>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>
>> Does anyone have any ideas on how to solve this w/o falling back to  
>> "normal" synchronization?
>>
>> Mike
>>
>> Begin forwarded message:
>>
>>> From: "Yonik Seeley" <yo...@apache.org>
>>> Date: July 7, 2008 3:30:28 PM EDT
>>> To: java-user@lucene.apache.org
>>> Subject: Re: ThreadLocal in SegmentReader
>>> Reply-To: java-user@lucene.apache.org
>>>
>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>> <lu...@mikemccandless.com> wrote:
>>>> So now I'm confused: the SegmentReader itself should no longer be  
>>>> reachable,
>>>> assuming you are not holding any references to your IndexReader.
>>>>
>>>> Which means the ThreadLocal instance should no longer be reachable.
>>>
>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>> The key (the ThreadLocal) will be weakly referenced, but the values
>>> (now stale) are strongly referenced and won't be actually removed
>>> until the table is resized (under the Java6 impl at least).
>>> Nice huh?
>>>
>>> -Yonik
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Fwd: ThreadLocal in SegmentReader

Posted by Adrian Tarau <ad...@gmail.com>.
Usually ThreadLocal.remove() should be called at the end(in a finally 
block), before the current call leaves your code.

Ex : if during searching ThreadLocal is used, every search(..) method 
should cleanup any ThreadLocal variables, or even deeper in the 
implementation. When the call leaves Lucene any used ThreadLocal should 
be cleaned up.

Michael McCandless wrote:
>
> ThreadLocal, which we use in several places in Lucene, causes a leak 
> in app servers because the classloader never fully deallocates 
> Lucene's classes because the ThreadLocal is holding strong references.
>
> Yet, ThreadLocal is very convenient for avoiding synchronization.
>
> Does anyone have any ideas on how to solve this w/o falling back to 
> "normal" synchronization?
>
> Mike
>
> Begin forwarded message:
>
>> From: "Yonik Seeley" <yo...@apache.org>
>> Date: July 7, 2008 3:30:28 PM EDT
>> To: java-user@lucene.apache.org
>> Subject: Re: ThreadLocal in SegmentReader
>> Reply-To: java-user@lucene.apache.org
>>
>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> So now I'm confused: the SegmentReader itself should no longer be 
>>> reachable,
>>> assuming you are not holding any references to your IndexReader.
>>>
>>> Which means the ThreadLocal instance should no longer be reachable.
>>
>> It will still be referenced from the Thread(s) ThreadLocalMap
>> The key (the ThreadLocal) will be weakly referenced, but the values
>> (now stale) are strongly referenced and won't be actually removed
>> until the table is resized (under the Java6 impl at least).
>> Nice huh?
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Fwd: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
ThreadLocal, which we use in several places in Lucene, causes a leak  
in app servers because the classloader never fully deallocates  
Lucene's classes because the ThreadLocal is holding strong references.

Yet, ThreadLocal is very convenient for avoiding synchronization.

Does anyone have any ideas on how to solve this w/o falling back to  
"normal" synchronization?

Mike

Begin forwarded message:

> From: "Yonik Seeley" <yo...@apache.org>
> Date: July 7, 2008 3:30:28 PM EDT
> To: java-user@lucene.apache.org
> Subject: Re: ThreadLocal in SegmentReader
> Reply-To: java-user@lucene.apache.org
>
> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> So now I'm confused: the SegmentReader itself should no longer be  
>> reachable,
>> assuming you are not holding any references to your IndexReader.
>>
>> Which means the ThreadLocal instance should no longer be reachable.
>
> It will still be referenced from the Thread(s) ThreadLocalMap
> The key (the ThreadLocal) will be weakly referenced, but the values
> (now stale) are strongly referenced and won't be actually removed
> until the table is resized (under the Java6 impl at least).
> Nice huh?
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Well, this 'replacement' of the ThreadLocal does not solve the initial
problem. As there's always at least one ThreadLocal which binds the object
loaded by the web-app to the Thread which is _not_ loaded by the web-app,
the classloader never may be unloaded.

You are right, this is not the 'leak' in the usual sense, but this is not
the 'delayed GC' problem, this is the 'never-happening-GC' problem when you
have the classloader which needs to be unloaded.


Michael McCandless-2 wrote:
> 
> 
> After discussing this on java-dev:
> 
>     
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/%3CF5FC94B2-E5C7-40C0-8B73-E12245B91CEE@mikemccandless.com%3E
> 
> it seems that this is not in fact a leak but rather a delayed GC  
> issue.  The objects are eventually freed, on Sun 1.4, 1.5 and 1.6.
> 
> When a ThreadLocal instance becomes unreferenced & GC'd, Java does not  
> immediately reclaim the now-unreferenced objects, if the thread  
> remains alive.  However, over time, as other ThreadLocals are used  
> with that thread, those stale objects do eventually get reclaimed.  So  
> my feeling for now is it's OK for Lucene to continue to use non-static  
> ThreadLocals.
> 
> Mike
> 
> Yonik Seeley wrote:
> 
>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> So now I'm confused: the SegmentReader itself should no longer be  
>>> reachable,
>>> assuming you are not holding any references to your IndexReader.
>>>
>>> Which means the ThreadLocal instance should no longer be reachable.
>>
>> It will still be referenced from the Thread(s) ThreadLocalMap
>> The key (the ThreadLocal) will be weakly referenced, but the values
>> (now stale) are strongly referenced and won't be actually removed
>> until the table is resized (under the Java6 impl at least).
>> Nice huh?
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18416048.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
After discussing this on java-dev:

     http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/%3CF5FC94B2-E5C7-40C0-8B73-E12245B91CEE@mikemccandless.com%3E

it seems that this is not in fact a leak but rather a delayed GC  
issue.  The objects are eventually freed, on Sun 1.4, 1.5 and 1.6.

When a ThreadLocal instance becomes unreferenced & GC'd, Java does not  
immediately reclaim the now-unreferenced objects, if the thread  
remains alive.  However, over time, as other ThreadLocals are used  
with that thread, those stale objects do eventually get reclaimed.  So  
my feeling for now is it's OK for Lucene to continue to use non-static  
ThreadLocals.

Mike

Yonik Seeley wrote:

> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> So now I'm confused: the SegmentReader itself should no longer be  
>> reachable,
>> assuming you are not holding any references to your IndexReader.
>>
>> Which means the ThreadLocal instance should no longer be reachable.
>
> It will still be referenced from the Thread(s) ThreadLocalMap
> The key (the ThreadLocal) will be weakly referenced, but the values
> (now stale) are strongly referenced and won't be actually removed
> until the table is resized (under the Java6 impl at least).
> Nice huh?
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
Ugh!  I'll move this to java-dev to brainstorm fixes...

Mike

Yonik Seeley wrote:

> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> So now I'm confused: the SegmentReader itself should no longer be  
>> reachable,
>> assuming you are not holding any references to your IndexReader.
>>
>> Which means the ThreadLocal instance should no longer be reachable.
>
> It will still be referenced from the Thread(s) ThreadLocalMap
> The key (the ThreadLocal) will be weakly referenced, but the values
> (now stale) are strongly referenced and won't be actually removed
> until the table is resized (under the Java6 impl at least).
> Nice huh?
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Yonik Seeley <yo...@apache.org>.
On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> So now I'm confused: the SegmentReader itself should no longer be reachable,
> assuming you are not holding any references to your IndexReader.
>
> Which means the ThreadLocal instance should no longer be reachable.

It will still be referenced from the Thread(s) ThreadLocalMap
The key (the ThreadLocal) will be weakly referenced, but the values
(now stale) are strongly referenced and won't be actually removed
until the table is resized (under the Java6 impl at least).
Nice huh?

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
So now I'm confused: the SegmentReader itself should no longer be  
reachable, assuming you are not holding any references to your  
IndexReader.

Which means the ThreadLocal instance should no longer be reachable.

Which means it should be GC'd and everything it's holding should be  
GC'd as well, even if some threads under it are still alive.

I think?

So it seems like what we currently do should be working fine, yet it's  
apparently not.  Are you certain you don't keep a reference to your  
IndexReader somewhere?  Also, are you sure that you waited long enough  
for GC to reclaim these objects?

Mike

Roman Puchkovskiy wrote:

>
> I've tested a little, and it seems that assigning a null is not  
> sufficient.
> As expected...
> I don't see other ways how to fix this, but I'm not the Lucene  
> developer :)
> Fortunately, there's the work-around with temporary thread.
>
>
> Michael McCandless-2 wrote:
>>
>>
>> Hmmm, I see... you're right.  ThreadLocal is dangerous.
>>
>> So how would you recommend fixing it?
>>
>> One thing we can do, in SegmentReader.close, is to call
>> termVectorsLocal.set(null).  We do this eg in FieldsReader.close,
>> which uses a ThreadLocal to hold thread-private clones of the
>> fieldsStream
>>
>> However, that only works if the very same thread that had opened the
>> reader also calls close... which likely often is the case but in
>> general is not guaranteed and we should not expect/require.
>>
>> How about if we set termVectorsLocal itself to null?  Will GC then  
>> "do
>> the right thing", ie, recognize (eventually) that this ThreadLocal
>> instance is no longer referenced, and clear all Objects for all
>> threads that were held in it?
>>
>> Mike
>>
>> Roman Puchkovskiy wrote:
>>
>>>
>>> Unfortunately, it's not ok sometimes. For instance, when Lucene is
>>> loaded by
>>> a web-application from its WEB-INF/lib and SegmentReader is
>>> initialized
>>> during the application start-up (i.e. it's initialized in the thread
>>> which
>>> will never die), this causes problems with unloading of a
>>> classloader of the
>>> web-application. When web-application is undeployed, there's still a
>>> ThreadLocal in a thread which is external to webapp classloader, and
>>> this
>>> ThreadLocal contains an object which references its class, and this
>>> class
>>> was loaded through web-app classloader and hence references it... so
>>> we have
>>> a chain of hard links from an alive thread to our classloader. In  
>>> the
>>> result, the classloader cannot be unloaded together will all its
>>> classes, so
>>> memory waste is considerable.
>>>
>>> I've found a way to work this around by creating a new thread during
>>> webapp
>>> start-up and executing code which eventually initializes Lucene
>>> indices from
>>> this thread, so all spawned ThreadLocals go to this short-lived
>>> thread and
>>> get garbage-collected shortly after the webapp start-up is finished.
>>> But
>>> this does not seem to be a pretty solution. Besides, one has to
>>> guess that
>>> ThreadLocals are the problem to invent such a work-around :)
>>>
>>>
>>> Michael McCandless-2 wrote:
>>>>
>>>>
>>>> Well ... if the thread dies, the value in its ThreadLocal should be
>>>> GC'd.
>>>>
>>>> If the thread does not die (eg thread pool in an app server) then  
>>>> the
>>>> ThreadLocal value remains, but that value is a shallow clone of the
>>>> original TermVectorsReader and should not be consuming that much  
>>>> RAM
>>>> per thread.
>>>>
>>>> So I think it's OK?
>>>>
>>>> Mike
>>>>
>>>> Roman Puchkovskiy wrote:
>>>>
>>>>>
>>>>> Hi.
>>>>>
>>>>> There's a ThreadLocal field in SegmentReader (it's called
>>>>> termVectorsLocal).
>>>>> Some value is put to it, but it's never cleared.
>>>>> Is it ok? It looks like sometimes this behavior may lead to leaks.
>>>>>
>>>>> This is the same in lucene-2.2.0 and lucene-2.3.2.
>>>>>
>>>>> Thanks in advance.
>>>>> -- 
>>>>> View this message in context:
>>>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
>>>>> Sent from the Lucene - Java Users mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314310.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18317640.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
I've tested a little, and it seems that assigning a null is not sufficient.
As expected...
I don't see other ways how to fix this, but I'm not the Lucene developer :)
Fortunately, there's the work-around with temporary thread.


Michael McCandless-2 wrote:
> 
> 
> Hmmm, I see... you're right.  ThreadLocal is dangerous.
> 
> So how would you recommend fixing it?
> 
> One thing we can do, in SegmentReader.close, is to call  
> termVectorsLocal.set(null).  We do this eg in FieldsReader.close,  
> which uses a ThreadLocal to hold thread-private clones of the  
> fieldsStream
> 
> However, that only works if the very same thread that had opened the  
> reader also calls close... which likely often is the case but in  
> general is not guaranteed and we should not expect/require.
> 
> How about if we set termVectorsLocal itself to null?  Will GC then "do  
> the right thing", ie, recognize (eventually) that this ThreadLocal  
> instance is no longer referenced, and clear all Objects for all  
> threads that were held in it?
> 
> Mike
> 
> Roman Puchkovskiy wrote:
> 
>>
>> Unfortunately, it's not ok sometimes. For instance, when Lucene is  
>> loaded by
>> a web-application from its WEB-INF/lib and SegmentReader is  
>> initialized
>> during the application start-up (i.e. it's initialized in the thread  
>> which
>> will never die), this causes problems with unloading of a  
>> classloader of the
>> web-application. When web-application is undeployed, there's still a
>> ThreadLocal in a thread which is external to webapp classloader, and  
>> this
>> ThreadLocal contains an object which references its class, and this  
>> class
>> was loaded through web-app classloader and hence references it... so  
>> we have
>> a chain of hard links from an alive thread to our classloader. In the
>> result, the classloader cannot be unloaded together will all its  
>> classes, so
>> memory waste is considerable.
>>
>> I've found a way to work this around by creating a new thread during  
>> webapp
>> start-up and executing code which eventually initializes Lucene  
>> indices from
>> this thread, so all spawned ThreadLocals go to this short-lived  
>> thread and
>> get garbage-collected shortly after the webapp start-up is finished.  
>> But
>> this does not seem to be a pretty solution. Besides, one has to  
>> guess that
>> ThreadLocals are the problem to invent such a work-around :)
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>>
>>> Well ... if the thread dies, the value in its ThreadLocal should be
>>> GC'd.
>>>
>>> If the thread does not die (eg thread pool in an app server) then the
>>> ThreadLocal value remains, but that value is a shallow clone of the
>>> original TermVectorsReader and should not be consuming that much RAM
>>> per thread.
>>>
>>> So I think it's OK?
>>>
>>> Mike
>>>
>>> Roman Puchkovskiy wrote:
>>>
>>>>
>>>> Hi.
>>>>
>>>> There's a ThreadLocal field in SegmentReader (it's called
>>>> termVectorsLocal).
>>>> Some value is put to it, but it's never cleared.
>>>> Is it ok? It looks like sometimes this behavior may lead to leaks.
>>>>
>>>> This is the same in lucene-2.2.0 and lucene-2.3.2.
>>>>
>>>> Thanks in advance.
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
>>>> Sent from the Lucene - Java Users mailing list archive at  
>>>> Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314310.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18317640.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmmm, I see... you're right.  ThreadLocal is dangerous.

So how would you recommend fixing it?

One thing we can do, in SegmentReader.close, is to call  
termVectorsLocal.set(null).  We do this eg in FieldsReader.close,  
which uses a ThreadLocal to hold thread-private clones of the  
fieldsStream

However, that only works if the very same thread that had opened the  
reader also calls close... which likely often is the case but in  
general is not guaranteed and we should not expect/require.

How about if we set termVectorsLocal itself to null?  Will GC then "do  
the right thing", ie, recognize (eventually) that this ThreadLocal  
instance is no longer referenced, and clear all Objects for all  
threads that were held in it?

Mike

Roman Puchkovskiy wrote:

>
> Unfortunately, it's not ok sometimes. For instance, when Lucene is  
> loaded by
> a web-application from its WEB-INF/lib and SegmentReader is  
> initialized
> during the application start-up (i.e. it's initialized in the thread  
> which
> will never die), this causes problems with unloading of a  
> classloader of the
> web-application. When web-application is undeployed, there's still a
> ThreadLocal in a thread which is external to webapp classloader, and  
> this
> ThreadLocal contains an object which references its class, and this  
> class
> was loaded through web-app classloader and hence references it... so  
> we have
> a chain of hard links from an alive thread to our classloader. In the
> result, the classloader cannot be unloaded together will all its  
> classes, so
> memory waste is considerable.
>
> I've found a way to work this around by creating a new thread during  
> webapp
> start-up and executing code which eventually initializes Lucene  
> indices from
> this thread, so all spawned ThreadLocals go to this short-lived  
> thread and
> get garbage-collected shortly after the webapp start-up is finished.  
> But
> this does not seem to be a pretty solution. Besides, one has to  
> guess that
> ThreadLocals are the problem to invent such a work-around :)
>
>
> Michael McCandless-2 wrote:
>>
>>
>> Well ... if the thread dies, the value in its ThreadLocal should be
>> GC'd.
>>
>> If the thread does not die (eg thread pool in an app server) then the
>> ThreadLocal value remains, but that value is a shallow clone of the
>> original TermVectorsReader and should not be consuming that much RAM
>> per thread.
>>
>> So I think it's OK?
>>
>> Mike
>>
>> Roman Puchkovskiy wrote:
>>
>>>
>>> Hi.
>>>
>>> There's a ThreadLocal field in SegmentReader (it's called
>>> termVectorsLocal).
>>> Some value is put to it, but it's never cleared.
>>> Is it ok? It looks like sometimes this behavior may lead to leaks.
>>>
>>> This is the same in lucene-2.2.0 and lucene-2.3.2.
>>>
>>> Thanks in advance.
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314310.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Roman Puchkovskiy <ro...@blandware.com>.
Unfortunately, it's not ok sometimes. For instance, when Lucene is loaded by
a web-application from its WEB-INF/lib and SegmentReader is initialized
during the application start-up (i.e. it's initialized in the thread which
will never die), this causes problems with unloading of a classloader of the
web-application. When web-application is undeployed, there's still a
ThreadLocal in a thread which is external to webapp classloader, and this
ThreadLocal contains an object which references its class, and this class
was loaded through web-app classloader and hence references it... so we have
a chain of hard links from an alive thread to our classloader. In the
result, the classloader cannot be unloaded together will all its classes, so
memory waste is considerable.

I've found a way to work this around by creating a new thread during webapp
start-up and executing code which eventually initializes Lucene indices from
this thread, so all spawned ThreadLocals go to this short-lived thread and
get garbage-collected shortly after the webapp start-up is finished. But
this does not seem to be a pretty solution. Besides, one has to guess that
ThreadLocals are the problem to invent such a work-around :)


Michael McCandless-2 wrote:
> 
> 
> Well ... if the thread dies, the value in its ThreadLocal should be  
> GC'd.
> 
> If the thread does not die (eg thread pool in an app server) then the  
> ThreadLocal value remains, but that value is a shallow clone of the  
> original TermVectorsReader and should not be consuming that much RAM  
> per thread.
> 
> So I think it's OK?
> 
> Mike
> 
> Roman Puchkovskiy wrote:
> 
>>
>> Hi.
>>
>> There's a ThreadLocal field in SegmentReader (it's called  
>> termVectorsLocal).
>> Some value is put to it, but it's never cleared.
>> Is it ok? It looks like sometimes this behavior may lead to leaks.
>>
>> This is the same in lucene-2.2.0 and lucene-2.3.2.
>>
>> Thanks in advance.
>> -- 
>> View this message in context:
>> http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18314310.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: ThreadLocal in SegmentReader

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well ... if the thread dies, the value in its ThreadLocal should be  
GC'd.

If the thread does not die (eg thread pool in an app server) then the  
ThreadLocal value remains, but that value is a shallow clone of the  
original TermVectorsReader and should not be consuming that much RAM  
per thread.

So I think it's OK?

Mike

Roman Puchkovskiy wrote:

>
> Hi.
>
> There's a ThreadLocal field in SegmentReader (it's called  
> termVectorsLocal).
> Some value is put to it, but it's never cleared.
> Is it ok? It looks like sometimes this behavior may lead to leaks.
>
> This is the same in lucene-2.2.0 and lucene-2.3.2.
>
> Thanks in advance.
> -- 
> View this message in context: http://www.nabble.com/ThreadLocal-in-SegmentReader-tp18306230p18306230.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org