You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucenenet.apache.org by George Aroush <ge...@aroush.net> on 2009/01/01 23:52:18 UTC

RE: Problems with patch for LUCENENET-106

Hi Luc,

I have to agree with Tim.

The .NET's implementation / port of WeakHashMap in Lucene.Net is consistent
with Java Lucene.  If there is a "very" small chance of a hash key
collusion, it is also possible in the Java version of Lucene.  Here is why.

In Java, when WeakHashMap.put(Object key, Object value) is called, this
function calls key.hashCode() to get a hash value (I had a look at Java's
source code for WeakHashMap.)  The same logic is happening in .NET's
implementation / port of WeakHashMap.  In addition, at least for now, the
"key" is always of type IndexReader.  Looking in IndexReader code, there is
no override of Java::hashCode() / C#::GetHashCode().  So, unless if Java's
hashCode() guaranties uniqueness (which it doesn't), then the possibility of
a hash value collusion also exists in the Java version of Lucene.  If this
is the case, then we have a valid defect in Lucene and it should be
addressed.

Erik, do you agree, can you comment?

Regarding #2, & #3, I agree that the .NET implementation of WeakHashMap
could have been confined to just get() / put().  Since you already have a
patch for Lucene.Net 2.1, please submit a new JIRA issue and attach your
patch to it (when you are ready.)  I look forward to your patch.

Regards, and Happy New Year everyone!

-- George


> -----Original Message-----
> From: Timothy Januario [mailto:Timothy.Januario@BDMetrics.com] 
> Sent: Wednesday, December 31, 2008 9:44 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> Luc,
> I'm not sure that I completely agree with your assessment of 
> the WeakHashMap.  
> 
> It does assume that the hash value is unique which is a 
> correct usage.  It is the responsibility of 
> object.GetHashCode() to return a unique value.  This is a 
> problem in any Hashtable implementation and although I 
> haven't seen the internal java code, I assume that it has the 
> same limitation.  The object would be the key only 
> superficially as the hashcode would be used internally as 
> well (based on the name of the object, WeakHashMap, anyway).  
> If this is incorrect, I apologize for the noise.  I did, 
> however, notice that GetHashCode() is not overridden in the 
> IndexReader and since the .NET framework does not guarantee a 
> unique value (according to the documentation for 
> oject.GetHashCode()), this could present the problem that you 
> have identified.  Is there a need to guarantee uniqueness by 
> overriding the method or does anyone know if the value is 
> always unique already?  When the Cache object was implemented 
> with Hashtable prior to this patch, the same problem would 
> have been inherent with that implementation (of course with 
> the additional bonus of the memory leak ;)) and I don't think 
> I ever saw collisions although quite honestly I was never 
> really looking for them.
> 
> Also, you said that the call to WeakHashMap.Keys could return 
> NULL values.  This may be the reason why the Java 
> implementation supports null keys.  It does seem to be a 
> possibility unless the call to Keys checks that 
> WeakReference.Key.Target is not null at the time of iteration 
> (this would eliminate this possibility since we would now 
> have a strong reference to the object ie: object key = 
> entries[i].Key.Target; if (object != null) 
> keys.Add(object);).  The same problem would exist with the 
> Values property.
> 
> Also, optimizing for few IndexReaders in the WeakHashMap is 
> an assumption that specializes the class for IndexReaders.  
> There is no guarantee that this will be the only use case in 
> the future.  The Java documentation says that the same 
> initial parameters (loadfactor and capacity) as HashMap are 
> used which suggests that it should be left in the .NET 
> implementation with the same parameters as the default 
> Hashtable which is its underlying container.
> 
> Comments?
> -tim
> 
> -----Original Message-----
> From: Vanlerberghe, Luc [mailto:Luc.Vanlerberghe@bvdep.com]
> Sent: Wednesday, December 31, 2008 4:48 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: Problems with patch for LUCENENET-106
> 
> Hi,
> 
> I reviewed the fix for LUCENENET-106 and I see a couple of issues:
> 
> 1. The implementation of WeakHashMap in SupportClass.cs has a 
> bug in that it assumes all keys will have a unique HashCode.  
> Although the chances of this occurring are *very* small, it 
> cannot be guaranteed. A collision would mean that the cached 
> value for one indexReader could be returned as cached value 
> for another indexReader (using the way this class is used in 
> Lucene as example)
> 
> 2. This class attempts to be a complete port of the 
> java.util.WeakHashMap, but:
> - A correct implementation is difficult to implement, and 
> .Net does not have an equivalent of 
> java.lang.ref.ReferenceQueue to accelerate the cleanup process.
> - Since the behaviour of the WeakHashMap is closely tied to 
> the behaviour of the garbage collector, it is very difficult 
> to test properly, if at all.
> - Lucene only uses the get(Object key) and put(K key, V value) methods
> - In Lucene, the keys are IndexReader instances.  In normal 
> use cases there will be only 1 IndexReader live, and perhaps 
> another one if 'warming up' is used before switching 
> IndexSearchers after an index update.
> - If this class is presented as a complete port of 
> java.util.WeakHashMap, users might assume this is production 
> quality code and copy/paste its code in their own projects.  
> Any further bugs found might harm the credibility of the 
> Lucene project, even if those sections are never used in 
> Lucene. (e.g.: The collection returned by Keys might contain 
> null values.  There's no guarantee the GC won't intervene 
> between the call to Cleanup and the calls to keys.Add(w.Key.Target). 
> 
> 3. Most support classes that are needed for the conversion of 
> java to .Net are in the anonymous namespace, but this one is 
> in Lucene.Net.Util.  I would propose to keep the namespaces 
> corresponding to the original java packages clean and either 
> put all support classes in the anonymous namespace or in a 
> Lucene.Net specific one (Lucene.Net.DotNetPort ?) (I see that 
> George moved it already to SupportClass, thanks George!)
> 
> I have been using the java version of Lucene in a project for 
> a long time.
> For a new project that is currently under development, we 
> will be using the .Net version.
> For now we are using the 2.1 version using a custom patch to 
> avoid the memory leak problem.
> 
> I'll post an up to date version of this patch as a 
> replacement for the current implementation of 
> Lucene.Net.Util.WeakHashMap sometime next week...
> - It doesn't try to be a complete implementation of 
> WeakHashMap: only the methods strictly necessary are 
> implemented, all other methods throw a 
> NotImplementedException. (It should probably be renamed to 
> SimplifiedWeakHashMap or something)
> - It's optimized for the 'low number of keys' case.
> - It's simple code, but I want to test it thoroughly first.  
> The original patch was put directly in the FieldCacheImpl 
> code, I want to make sure I didn't make some stupid mistake 
> while converting it.
> 
> I have an account on Jira, but I don't have rights to reopen 
> the issue...
> 
> Regards,
> 
> Luc
> 
> 
> -----Original Message-----
> From: Digy (JIRA) [mailto:jira@apache.org]
> Sent: zaterdag 27 december 2008 19:47
> To: lucene-net-dev@incubator.apache.org
> Subject: [jira] Closed: () Lucene.NET (Revision: 603121) is 
> leaking memory
> 
> 
>      [ 
> https://issues.apache.org/jira/browse/LUCENENET-106?page=com.a
tlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Digy closed LUCENENET-106.
> --------------------------
> 
>     Resolution: Fixed
>       Assignee: Digy
> 
> Patches are applied.
> 
> DIGY.
> 
> > Lucene.NET (Revision: 603121) is leaking memory
> > -----------------------------------------------
> >
> >                 Key: LUCENENET-106
> >                 URL: 
> https://issues.apache.org/jira/browse/LUCENENET-106
> >             Project: Lucene.Net
> >          Issue Type: Bug
> >         Environment: .NET 2.0
> >            Reporter: Anton K.
> >            Assignee: Digy
> >            Priority: Critical
> >         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar, 
> luceneSrc_memUsage.patch, Paches for v2.3.1.rar, 
> WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
> >
> >
> > readerCache Hashtable field (see FieldCacheImpl.cs) never 
> releases some hash items that have closed IndexReader object 
> as a key. So a lot of Term instances are never released.
> > Java version of Lucene uses WeakHashMap and therefore 
> doesn't have this problem.
> > This bug can be reproduced only when Sort functionality 
> used during search. 
> > See following link for additional information.
> > http://www.gossamer-threads.com/lists/lucene/java-user/55681
> > Steps to reproduce:
> > 1)Create index
> > 2) Modify index by IndexWiter; Close IndexWriter
> > 3) Use IndexSearcher for searching with Sort; Close InexSearcher
> > 4) Go to step 2
> > You'll get OutOfMemoryException after some time of running 
> this algorithm.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
>

RE: Problems with patch for LUCENENET-106

Posted by George Aroush <ge...@aroush.net>.

The fix should be in the implementation / port of WeakHashMap, and not in
Lucene.Net base code.

I believe Luc is working on providing an alternative patch / fix.

-- George

> -----Original Message-----
> From: Timothy Januario [mailto:Timothy.Januario@BDMetrics.com] 
> Sent: Monday, January 05, 2009 10:19 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> I stand corrected.  I do see the possibility of the collision 
> as a potentially serious bug.  The java version still uses 
> the object as the key which means that if the hashcode is 
> identical across all instances, it just becomes less 
> efficient.  In the current .NET implementation, it stays very 
> efficient, it would just only ever have a single instance in 
> the cache which is a very different and incorrect result.
> 
> -----Original Message-----
> From: Eyal Post [mailto:eyalpost@epocalipse.com]
> Sent: Saturday, January 03, 2009 1:56 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> I tend to agree with Luc. 
> Hashtables don't rely on small chance for collisions to 
> operate correctly.
> They only rely on it to operate efficiently. 
> This means you can store 2 elements with the same hashcode in 
> a hashtable and they won't override each other, but the 
> hashtable lookups will be less efficient with collisions.
> The problem with code provided in the patch is that it uses 
> the hashcode as the key in the hashtable and altough the 
> chance of collisions is small once they happen the results 
> are undefined.
> I think the best way to solve that is to use WeakEntry as the 
> key and override both GetHashCode and Equals. GetHashCode 
> should return the HashCode field and Equals should compare the Keys
> 
> 
> Eyal Post
>  
> 
> > -----Original Message-----
> > From: George Aroush [mailto:george@aroush.net]
> > Sent: Friday, January 02, 2009 0:52 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: RE: Problems with patch for LUCENENET-106
> > 
> > Hi Luc,
> > 
> > I have to agree with Tim.
> > 
> > The .NET's implementation / port of WeakHashMap in Lucene.Net is 
> > consistent with Java Lucene.  If there is a "very" small 
> chance of a 
> > hash key collusion, it is also possible in the Java version 
> of Lucene.  
> > Here is why.
> > 
> > In Java, when WeakHashMap.put(Object key, Object value) is called, 
> > this function calls key.hashCode() to get a hash value (I 
> had a look 
> > at Java's source code for WeakHashMap.) The same logic is 
> happening in 
> > .NET's implementation / port of WeakHashMap.  In addition, at least 
> > for now, the "key" is always of type IndexReader.  Looking in 
> > IndexReader code, there is no override of Java::hashCode() / 
> > C#::GetHashCode().
> >  So, unless if Java's
> > hashCode() guaranties uniqueness (which it doesn't), then the 
> > possibility of a hash value collusion also exists in the 
> Java version 
> > of Lucene.  If this is the case, then we have a valid 
> defect in Lucene 
> > and it should be addressed.
> > 
> > Erik, do you agree, can you comment?
> > 
> > Regarding #2, & #3, I agree that the .NET implementation of 
> > WeakHashMap could have been confined to just get() / put().
> > Since you already have a patch for Lucene.Net 2.1, please 
> submit a new 
> > JIRA issue and attach your patch to it (when you are 
> ready.)  I look 
> > forward to your patch.
> > 
> > Regards, and Happy New Year everyone!
> > 
> > -- George
> > 
> > 
> > > -----Original Message-----
> > > From: Timothy Januario [mailto:Timothy.Januario@BDMetrics.com]
> > > Sent: Wednesday, December 31, 2008 9:44 AM
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: RE: Problems with patch for LUCENENET-106
> > > 
> > > Luc,
> > > I'm not sure that I completely agree with your assessment of the 
> > > WeakHashMap.
> > > 
> > > It does assume that the hash value is unique which is a
> > correct usage.  
> > > It is the responsibility of
> > > object.GetHashCode() to return a unique value.  This is a
> > problem in
> > > any Hashtable implementation and although I haven't seen
> > the internal
> > > java code, I assume that it has the same limitation.  The
> > object would
> > > be the key only superficially as the hashcode would be used
> > internally
> > > as well (based on the name of the object, WeakHashMap, anyway).
> > > If this is incorrect, I apologize for the noise.  I did, however, 
> > > notice that GetHashCode() is not overridden in the 
> IndexReader and 
> > > since the .NET framework does not guarantee a unique value
> > (according
> > > to the documentation for oject.GetHashCode()), this could
> > present the
> > > problem that you have identified.  Is there a need to guarantee 
> > > uniqueness by overriding the method or does anyone know if
> > the value
> > > is always unique already?  When the Cache object was
> > implemented with
> > > Hashtable prior to this patch, the same problem would have been 
> > > inherent with that implementation (of course with the
> > additional bonus
> > > of the memory leak ;)) and I don't think I ever saw collisions 
> > > although quite honestly I was never really looking for them.
> > > 
> > > Also, you said that the call to WeakHashMap.Keys could 
> return NULL 
> > > values.  This may be the reason why the Java implementation
> > supports
> > > null keys.  It does seem to be a possibility unless the
> > call to Keys
> > > checks that WeakReference.Key.Target is not null at the time of 
> > > iteration (this would eliminate this possibility since we 
> would now 
> > > have a strong reference to the object ie: object key = 
> > > entries[i].Key.Target; if (object != null) 
> keys.Add(object);).  The 
> > > same problem would exist with the Values property.
> > > 
> > > Also, optimizing for few IndexReaders in the WeakHashMap is an 
> > > assumption that specializes the class for IndexReaders.
> > > There is no guarantee that this will be the only use case in the 
> > > future.  The Java documentation says that the same initial
> > parameters
> > > (loadfactor and capacity) as HashMap are used which
> > suggests that it
> > > should be left in the .NET implementation with the same
> > parameters as
> > > the default Hashtable which is its underlying container.
> > > 
> > > Comments?
> > > -tim
> > > 
> > > -----Original Message-----
> > > From: Vanlerberghe, Luc [mailto:Luc.Vanlerberghe@bvdep.com]
> > > Sent: Wednesday, December 31, 2008 4:48 AM
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: Problems with patch for LUCENENET-106
> > > 
> > > Hi,
> > > 
> > > I reviewed the fix for LUCENENET-106 and I see a couple of issues:
> > > 
> > > 1. The implementation of WeakHashMap in SupportClass.cs has
> > a bug in
> > > that it assumes all keys will have a unique HashCode.
> > > Although the chances of this occurring are *very* small, it
> > cannot be
> > > guaranteed. A collision would mean that the cached value for one 
> > > indexReader could be returned as cached value for another
> > indexReader
> > > (using the way this class is used in Lucene as example)
> > > 
> > > 2. This class attempts to be a complete port of the 
> > > java.util.WeakHashMap, but:
> > > - A correct implementation is difficult to implement, and 
> .Net does 
> > > not have an equivalent of java.lang.ref.ReferenceQueue to
> > accelerate
> > > the cleanup process.
> > > - Since the behaviour of the WeakHashMap is closely tied to the 
> > > behaviour of the garbage collector, it is very difficult to test 
> > > properly, if at all.
> > > - Lucene only uses the get(Object key) and put(K key, V
> > value) methods
> > > - In Lucene, the keys are IndexReader instances.  In normal
> > use cases
> > > there will be only 1 IndexReader live, and perhaps another one if 
> > > 'warming up' is used before switching IndexSearchers 
> after an index 
> > > update.
> > > - If this class is presented as a complete port of 
> > > java.util.WeakHashMap, users might assume this is
> > production quality
> > > code and copy/paste its code in their own projects.
> > > Any further bugs found might harm the credibility of the Lucene 
> > > project, even if those sections are never used in Lucene.
> > (e.g.: The
> > > collection returned by Keys might contain null values.  
> There's no 
> > > guarantee the GC won't intervene between the call to
> > Cleanup and the
> > > calls to keys.Add(w.Key.Target).
> > > 
> > > 3. Most support classes that are needed for the conversion
> > of java to
> > > .Net are in the anonymous namespace, but this one is in 
> > > Lucene.Net.Util.  I would propose to keep the namespaces
> > corresponding
> > > to the original java packages clean and either put all
> > support classes
> > > in the anonymous namespace or in a Lucene.Net specific one 
> > > (Lucene.Net.DotNetPort ?) (I see that George moved it already to 
> > > SupportClass, thanks George!)
> > > 
> > > I have been using the java version of Lucene in a project
> > for a long
> > > time.
> > > For a new project that is currently under development, we will be 
> > > using the .Net version.
> > > For now we are using the 2.1 version using a custom patch
> > to avoid the
> > > memory leak problem.
> > > 
> > > I'll post an up to date version of this patch as a
> > replacement for the
> > > current implementation of Lucene.Net.Util.WeakHashMap 
> sometime next 
> > > week...
> > > - It doesn't try to be a complete implementation of
> > > WeakHashMap: only the methods strictly necessary are
> > implemented, all
> > > other methods throw a NotImplementedException. (It should
> > probably be
> > > renamed to SimplifiedWeakHashMap or something)
> > > - It's optimized for the 'low number of keys' case.
> > > - It's simple code, but I want to test it thoroughly first.  
> > > The original patch was put directly in the FieldCacheImpl
> > code, I want
> > > to make sure I didn't make some stupid mistake while 
> converting it.
> > > 
> > > I have an account on Jira, but I don't have rights to reopen the 
> > > issue...
> > > 
> > > Regards,
> > > 
> > > Luc
> > > 
> > > 
> > > -----Original Message-----
> > > From: Digy (JIRA) [mailto:jira@apache.org]
> > > Sent: zaterdag 27 december 2008 19:47
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: [jira] Closed: () Lucene.NET (Revision: 603121) 
> is leaking 
> > > memory
> > > 
> > > 
> > >      [
> > > https://issues.apache.org/jira/browse/LUCENENET-106?page=com.a
> > tlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > > 
> > > Digy closed LUCENENET-106.
> > > --------------------------
> > > 
> > >     Resolution: Fixed
> > >       Assignee: Digy
> > > 
> > > Patches are applied.
> > > 
> > > DIGY.
> > > 
> > > > Lucene.NET (Revision: 603121) is leaking memory
> > > > -----------------------------------------------
> > > >
> > > >                 Key: LUCENENET-106
> > > >                 URL: 
> > > https://issues.apache.org/jira/browse/LUCENENET-106
> > > >             Project: Lucene.Net
> > > >          Issue Type: Bug
> > > >         Environment: .NET 2.0
> > > >            Reporter: Anton K.
> > > >            Assignee: Digy
> > > >            Priority: Critical
> > > >         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar,
> > > luceneSrc_memUsage.patch, Paches for v2.3.1.rar,
> > > WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
> > > >
> > > >
> > > > readerCache Hashtable field (see FieldCacheImpl.cs) never
> > > releases some hash items that have closed IndexReader
> > object as a key. 
> > > So a lot of Term instances are never released.
> > > > Java version of Lucene uses WeakHashMap and therefore
> > > doesn't have this problem.
> > > > This bug can be reproduced only when Sort functionality
> > > used during search. 
> > > > See following link for additional information.
> > > > http://www.gossamer-threads.com/lists/lucene/java-user/55681
> > > > Steps to reproduce:
> > > > 1)Create index
> > > > 2) Modify index by IndexWiter; Close IndexWriter
> > > > 3) Use IndexSearcher for searching with Sort; Close InexSearcher
> > > > 4) Go to step 2
> > > > You'll get OutOfMemoryException after some time of running
> > > this algorithm.
> > > 
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > > 
> > > 
> > > 
> > 
> > 
>

RE: Problems with patch for LUCENENET-106

Posted by Timothy Januario <Ti...@BDMetrics.com>.

I stand corrected.  I do see the possibility of the collision as a
potentially serious bug.  The java version still uses the object as the
key which means that if the hashcode is identical across all instances,
it just becomes less efficient.  In the current .NET implementation, it
stays very efficient, it would just only ever have a single instance in
the cache which is a very different and incorrect result.

-----Original Message-----
From: Eyal Post [mailto:eyalpost@epocalipse.com] 
Sent: Saturday, January 03, 2009 1:56 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Problems with patch for LUCENENET-106

I tend to agree with Luc. 
Hashtables don't rely on small chance for collisions to operate
correctly.
They only rely on it to operate efficiently. 
This means you can store 2 elements with the same hashcode in a
hashtable
and they won't override each other, but the hashtable lookups will be
less
efficient with collisions.
The problem with code provided in the patch is that it uses the hashcode
as
the key in the hashtable and altough the chance of collisions is small
once
they happen the results are undefined.
I think the best way to solve that is to use WeakEntry as the key and
override both GetHashCode and Equals. GetHashCode should return the
HashCode
field and Equals should compare the Keys


Eyal Post
 

> -----Original Message-----
> From: George Aroush [mailto:george@aroush.net] 
> Sent: Friday, January 02, 2009 0:52 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> Hi Luc,
> 
> I have to agree with Tim.
> 
> The .NET's implementation / port of WeakHashMap in Lucene.Net 
> is consistent with Java Lucene.  If there is a "very" small 
> chance of a hash key collusion, it is also possible in the 
> Java version of Lucene.  Here is why.
> 
> In Java, when WeakHashMap.put(Object key, Object value) is 
> called, this function calls key.hashCode() to get a hash 
> value (I had a look at Java's source code for WeakHashMap.)  
> The same logic is happening in .NET's implementation / port 
> of WeakHashMap.  In addition, at least for now, the "key" is 
> always of type IndexReader.  Looking in IndexReader code, 
> there is no override of Java::hashCode() / C#::GetHashCode(). 
>  So, unless if Java's
> hashCode() guaranties uniqueness (which it doesn't), then the 
> possibility of a hash value collusion also exists in the Java 
> version of Lucene.  If this is the case, then we have a valid 
> defect in Lucene and it should be addressed.
> 
> Erik, do you agree, can you comment?
> 
> Regarding #2, & #3, I agree that the .NET implementation of 
> WeakHashMap could have been confined to just get() / put().  
> Since you already have a patch for Lucene.Net 2.1, please 
> submit a new JIRA issue and attach your patch to it (when you 
> are ready.)  I look forward to your patch.
> 
> Regards, and Happy New Year everyone!
> 
> -- George
> 
> 
> > -----Original Message-----
> > From: Timothy Januario [mailto:Timothy.Januario@BDMetrics.com]
> > Sent: Wednesday, December 31, 2008 9:44 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: RE: Problems with patch for LUCENENET-106
> > 
> > Luc,
> > I'm not sure that I completely agree with your assessment of the 
> > WeakHashMap.
> > 
> > It does assume that the hash value is unique which is a 
> correct usage.  
> > It is the responsibility of
> > object.GetHashCode() to return a unique value.  This is a 
> problem in 
> > any Hashtable implementation and although I haven't seen 
> the internal 
> > java code, I assume that it has the same limitation.  The 
> object would 
> > be the key only superficially as the hashcode would be used 
> internally 
> > as well (based on the name of the object, WeakHashMap, anyway).
> > If this is incorrect, I apologize for the noise.  I did, however, 
> > notice that GetHashCode() is not overridden in the IndexReader and 
> > since the .NET framework does not guarantee a unique value 
> (according 
> > to the documentation for oject.GetHashCode()), this could 
> present the 
> > problem that you have identified.  Is there a need to guarantee 
> > uniqueness by overriding the method or does anyone know if 
> the value 
> > is always unique already?  When the Cache object was 
> implemented with 
> > Hashtable prior to this patch, the same problem would have been 
> > inherent with that implementation (of course with the 
> additional bonus 
> > of the memory leak ;)) and I don't think I ever saw collisions 
> > although quite honestly I was never really looking for them.
> > 
> > Also, you said that the call to WeakHashMap.Keys could return NULL 
> > values.  This may be the reason why the Java implementation 
> supports 
> > null keys.  It does seem to be a possibility unless the 
> call to Keys 
> > checks that WeakReference.Key.Target is not null at the time of 
> > iteration (this would eliminate this possibility since we would now 
> > have a strong reference to the object ie: object key = 
> > entries[i].Key.Target; if (object != null) keys.Add(object);).  The 
> > same problem would exist with the Values property.
> > 
> > Also, optimizing for few IndexReaders in the WeakHashMap is an 
> > assumption that specializes the class for IndexReaders.
> > There is no guarantee that this will be the only use case in the 
> > future.  The Java documentation says that the same initial 
> parameters 
> > (loadfactor and capacity) as HashMap are used which 
> suggests that it 
> > should be left in the .NET implementation with the same 
> parameters as 
> > the default Hashtable which is its underlying container.
> > 
> > Comments?
> > -tim
> > 
> > -----Original Message-----
> > From: Vanlerberghe, Luc [mailto:Luc.Vanlerberghe@bvdep.com]
> > Sent: Wednesday, December 31, 2008 4:48 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: Problems with patch for LUCENENET-106
> > 
> > Hi,
> > 
> > I reviewed the fix for LUCENENET-106 and I see a couple of issues:
> > 
> > 1. The implementation of WeakHashMap in SupportClass.cs has 
> a bug in 
> > that it assumes all keys will have a unique HashCode.
> > Although the chances of this occurring are *very* small, it 
> cannot be 
> > guaranteed. A collision would mean that the cached value for one 
> > indexReader could be returned as cached value for another 
> indexReader 
> > (using the way this class is used in Lucene as example)
> > 
> > 2. This class attempts to be a complete port of the 
> > java.util.WeakHashMap, but:
> > - A correct implementation is difficult to implement, and .Net does 
> > not have an equivalent of java.lang.ref.ReferenceQueue to 
> accelerate 
> > the cleanup process.
> > - Since the behaviour of the WeakHashMap is closely tied to the 
> > behaviour of the garbage collector, it is very difficult to test 
> > properly, if at all.
> > - Lucene only uses the get(Object key) and put(K key, V 
> value) methods
> > - In Lucene, the keys are IndexReader instances.  In normal 
> use cases 
> > there will be only 1 IndexReader live, and perhaps another one if 
> > 'warming up' is used before switching IndexSearchers after an index 
> > update.
> > - If this class is presented as a complete port of 
> > java.util.WeakHashMap, users might assume this is 
> production quality 
> > code and copy/paste its code in their own projects.
> > Any further bugs found might harm the credibility of the Lucene 
> > project, even if those sections are never used in Lucene. 
> (e.g.: The 
> > collection returned by Keys might contain null values.  There's no 
> > guarantee the GC won't intervene between the call to 
> Cleanup and the 
> > calls to keys.Add(w.Key.Target).
> > 
> > 3. Most support classes that are needed for the conversion 
> of java to 
> > .Net are in the anonymous namespace, but this one is in 
> > Lucene.Net.Util.  I would propose to keep the namespaces 
> corresponding 
> > to the original java packages clean and either put all 
> support classes 
> > in the anonymous namespace or in a Lucene.Net specific one 
> > (Lucene.Net.DotNetPort ?) (I see that George moved it already to 
> > SupportClass, thanks George!)
> > 
> > I have been using the java version of Lucene in a project 
> for a long 
> > time.
> > For a new project that is currently under development, we will be 
> > using the .Net version.
> > For now we are using the 2.1 version using a custom patch 
> to avoid the 
> > memory leak problem.
> > 
> > I'll post an up to date version of this patch as a 
> replacement for the 
> > current implementation of Lucene.Net.Util.WeakHashMap sometime next 
> > week...
> > - It doesn't try to be a complete implementation of
> > WeakHashMap: only the methods strictly necessary are 
> implemented, all 
> > other methods throw a NotImplementedException. (It should 
> probably be 
> > renamed to SimplifiedWeakHashMap or something)
> > - It's optimized for the 'low number of keys' case.
> > - It's simple code, but I want to test it thoroughly first.  
> > The original patch was put directly in the FieldCacheImpl 
> code, I want 
> > to make sure I didn't make some stupid mistake while converting it.
> > 
> > I have an account on Jira, but I don't have rights to reopen the 
> > issue...
> > 
> > Regards,
> > 
> > Luc
> > 
> > 
> > -----Original Message-----
> > From: Digy (JIRA) [mailto:jira@apache.org]
> > Sent: zaterdag 27 december 2008 19:47
> > To: lucene-net-dev@incubator.apache.org
> > Subject: [jira] Closed: () Lucene.NET (Revision: 603121) is leaking 
> > memory
> > 
> > 
> >      [
> > https://issues.apache.org/jira/browse/LUCENENET-106?page=com.a
> tlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > 
> > Digy closed LUCENENET-106.
> > --------------------------
> > 
> >     Resolution: Fixed
> >       Assignee: Digy
> > 
> > Patches are applied.
> > 
> > DIGY.
> > 
> > > Lucene.NET (Revision: 603121) is leaking memory
> > > -----------------------------------------------
> > >
> > >                 Key: LUCENENET-106
> > >                 URL: 
> > https://issues.apache.org/jira/browse/LUCENENET-106
> > >             Project: Lucene.Net
> > >          Issue Type: Bug
> > >         Environment: .NET 2.0
> > >            Reporter: Anton K.
> > >            Assignee: Digy
> > >            Priority: Critical
> > >         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar,
> > luceneSrc_memUsage.patch, Paches for v2.3.1.rar,
> > WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
> > >
> > >
> > > readerCache Hashtable field (see FieldCacheImpl.cs) never
> > releases some hash items that have closed IndexReader 
> object as a key. 
> > So a lot of Term instances are never released.
> > > Java version of Lucene uses WeakHashMap and therefore
> > doesn't have this problem.
> > > This bug can be reproduced only when Sort functionality
> > used during search. 
> > > See following link for additional information.
> > > http://www.gossamer-threads.com/lists/lucene/java-user/55681
> > > Steps to reproduce:
> > > 1)Create index
> > > 2) Modify index by IndexWiter; Close IndexWriter
> > > 3) Use IndexSearcher for searching with Sort; Close InexSearcher
> > > 4) Go to step 2
> > > You'll get OutOfMemoryException after some time of running
> > this algorithm.
> > 
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> > 
> > 
> > 
> 
>

RE: Problems with patch for LUCENENET-106

Posted by Eyal Post <ey...@epocalipse.com>.

I tend to agree with Luc. 
Hashtables don't rely on small chance for collisions to operate correctly.
They only rely on it to operate efficiently. 
This means you can store 2 elements with the same hashcode in a hashtable
and they won't override each other, but the hashtable lookups will be less
efficient with collisions.
The problem with code provided in the patch is that it uses the hashcode as
the key in the hashtable and altough the chance of collisions is small once
they happen the results are undefined.
I think the best way to solve that is to use WeakEntry as the key and
override both GetHashCode and Equals. GetHashCode should return the HashCode
field and Equals should compare the Keys

Eyal Post
 

> -----Original Message-----
> From: George Aroush [mailto:george@aroush.net] 
> Sent: Friday, January 02, 2009 0:52 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> Hi Luc,
> 
> I have to agree with Tim.
> 
> The .NET's implementation / port of WeakHashMap in Lucene.Net 
> is consistent with Java Lucene.  If there is a "very" small 
> chance of a hash key collusion, it is also possible in the 
> Java version of Lucene.  Here is why.
> 
> In Java, when WeakHashMap.put(Object key, Object value) is 
> called, this function calls key.hashCode() to get a hash 
> value (I had a look at Java's source code for WeakHashMap.)  
> The same logic is happening in .NET's implementation / port 
> of WeakHashMap.  In addition, at least for now, the "key" is 
> always of type IndexReader.  Looking in IndexReader code, 
> there is no override of Java::hashCode() / C#::GetHashCode(). 
>  So, unless if Java's
> hashCode() guaranties uniqueness (which it doesn't), then the 
> possibility of a hash value collusion also exists in the Java 
> version of Lucene.  If this is the case, then we have a valid 
> defect in Lucene and it should be addressed.
> 
> Erik, do you agree, can you comment?
> 
> Regarding #2, & #3, I agree that the .NET implementation of 
> WeakHashMap could have been confined to just get() / put().  
> Since you already have a patch for Lucene.Net 2.1, please 
> submit a new JIRA issue and attach your patch to it (when you 
> are ready.)  I look forward to your patch.
> 
> Regards, and Happy New Year everyone!
> 
> -- George
> 
> 
> > -----Original Message-----
> > From: Timothy Januario [mailto:Timothy.Januario@BDMetrics.com]
> > Sent: Wednesday, December 31, 2008 9:44 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: RE: Problems with patch for LUCENENET-106
> > 
> > Luc,
> > I'm not sure that I completely agree with your assessment of the 
> > WeakHashMap.
> > 
> > It does assume that the hash value is unique which is a 
> correct usage.  
> > It is the responsibility of
> > object.GetHashCode() to return a unique value.  This is a 
> problem in 
> > any Hashtable implementation and although I haven't seen 
> the internal 
> > java code, I assume that it has the same limitation.  The 
> object would 
> > be the key only superficially as the hashcode would be used 
> internally 
> > as well (based on the name of the object, WeakHashMap, anyway).
> > If this is incorrect, I apologize for the noise.  I did, however, 
> > notice that GetHashCode() is not overridden in the IndexReader and 
> > since the .NET framework does not guarantee a unique value 
> (according 
> > to the documentation for oject.GetHashCode()), this could 
> present the 
> > problem that you have identified.  Is there a need to guarantee 
> > uniqueness by overriding the method or does anyone know if 
> the value 
> > is always unique already?  When the Cache object was 
> implemented with 
> > Hashtable prior to this patch, the same problem would have been 
> > inherent with that implementation (of course with the 
> additional bonus 
> > of the memory leak ;)) and I don't think I ever saw collisions 
> > although quite honestly I was never really looking for them.
> > 
> > Also, you said that the call to WeakHashMap.Keys could return NULL 
> > values.  This may be the reason why the Java implementation 
> supports 
> > null keys.  It does seem to be a possibility unless the 
> call to Keys 
> > checks that WeakReference.Key.Target is not null at the time of 
> > iteration (this would eliminate this possibility since we would now 
> > have a strong reference to the object ie: object key = 
> > entries[i].Key.Target; if (object != null) keys.Add(object);).  The 
> > same problem would exist with the Values property.
> > 
> > Also, optimizing for few IndexReaders in the WeakHashMap is an 
> > assumption that specializes the class for IndexReaders.
> > There is no guarantee that this will be the only use case in the 
> > future.  The Java documentation says that the same initial 
> parameters 
> > (loadfactor and capacity) as HashMap are used which 
> suggests that it 
> > should be left in the .NET implementation with the same 
> parameters as 
> > the default Hashtable which is its underlying container.
> > 
> > Comments?
> > -tim
> > 
> > -----Original Message-----
> > From: Vanlerberghe, Luc [mailto:Luc.Vanlerberghe@bvdep.com]
> > Sent: Wednesday, December 31, 2008 4:48 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: Problems with patch for LUCENENET-106
> > 
> > Hi,
> > 
> > I reviewed the fix for LUCENENET-106 and I see a couple of issues:
> > 
> > 1. The implementation of WeakHashMap in SupportClass.cs has 
> a bug in 
> > that it assumes all keys will have a unique HashCode.
> > Although the chances of this occurring are *very* small, it 
> cannot be 
> > guaranteed. A collision would mean that the cached value for one 
> > indexReader could be returned as cached value for another 
> indexReader 
> > (using the way this class is used in Lucene as example)
> > 
> > 2. This class attempts to be a complete port of the 
> > java.util.WeakHashMap, but:
> > - A correct implementation is difficult to implement, and .Net does 
> > not have an equivalent of java.lang.ref.ReferenceQueue to 
> accelerate 
> > the cleanup process.
> > - Since the behaviour of the WeakHashMap is closely tied to the 
> > behaviour of the garbage collector, it is very difficult to test 
> > properly, if at all.
> > - Lucene only uses the get(Object key) and put(K key, V 
> value) methods
> > - In Lucene, the keys are IndexReader instances.  In normal 
> use cases 
> > there will be only 1 IndexReader live, and perhaps another one if 
> > 'warming up' is used before switching IndexSearchers after an index 
> > update.
> > - If this class is presented as a complete port of 
> > java.util.WeakHashMap, users might assume this is 
> production quality 
> > code and copy/paste its code in their own projects.
> > Any further bugs found might harm the credibility of the Lucene 
> > project, even if those sections are never used in Lucene. 
> (e.g.: The 
> > collection returned by Keys might contain null values.  There's no 
> > guarantee the GC won't intervene between the call to 
> Cleanup and the 
> > calls to keys.Add(w.Key.Target).
> > 
> > 3. Most support classes that are needed for the conversion 
> of java to 
> > .Net are in the anonymous namespace, but this one is in 
> > Lucene.Net.Util.  I would propose to keep the namespaces 
> corresponding 
> > to the original java packages clean and either put all 
> support classes 
> > in the anonymous namespace or in a Lucene.Net specific one 
> > (Lucene.Net.DotNetPort ?) (I see that George moved it already to 
> > SupportClass, thanks George!)
> > 
> > I have been using the java version of Lucene in a project 
> for a long 
> > time.
> > For a new project that is currently under development, we will be 
> > using the .Net version.
> > For now we are using the 2.1 version using a custom patch 
> to avoid the 
> > memory leak problem.
> > 
> > I'll post an up to date version of this patch as a 
> replacement for the 
> > current implementation of Lucene.Net.Util.WeakHashMap sometime next 
> > week...
> > - It doesn't try to be a complete implementation of
> > WeakHashMap: only the methods strictly necessary are 
> implemented, all 
> > other methods throw a NotImplementedException. (It should 
> probably be 
> > renamed to SimplifiedWeakHashMap or something)
> > - It's optimized for the 'low number of keys' case.
> > - It's simple code, but I want to test it thoroughly first.  
> > The original patch was put directly in the FieldCacheImpl 
> code, I want 
> > to make sure I didn't make some stupid mistake while converting it.
> > 
> > I have an account on Jira, but I don't have rights to reopen the 
> > issue...
> > 
> > Regards,
> > 
> > Luc
> > 
> > 
> > -----Original Message-----
> > From: Digy (JIRA) [mailto:jira@apache.org]
> > Sent: zaterdag 27 december 2008 19:47
> > To: lucene-net-dev@incubator.apache.org
> > Subject: [jira] Closed: () Lucene.NET (Revision: 603121) is leaking 
> > memory
> > 
> > 
> >      [
> > https://issues.apache.org/jira/browse/LUCENENET-106?page=com.a
> tlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > 
> > Digy closed LUCENENET-106.
> > --------------------------
> > 
> >     Resolution: Fixed
> >       Assignee: Digy
> > 
> > Patches are applied.
> > 
> > DIGY.
> > 
> > > Lucene.NET (Revision: 603121) is leaking memory
> > > -----------------------------------------------
> > >
> > >                 Key: LUCENENET-106
> > >                 URL: 
> > https://issues.apache.org/jira/browse/LUCENENET-106
> > >             Project: Lucene.Net
> > >          Issue Type: Bug
> > >         Environment: .NET 2.0
> > >            Reporter: Anton K.
> > >            Assignee: Digy
> > >            Priority: Critical
> > >         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar,
> > luceneSrc_memUsage.patch, Paches for v2.3.1.rar,
> > WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
> > >
> > >
> > > readerCache Hashtable field (see FieldCacheImpl.cs) never
> > releases some hash items that have closed IndexReader 
> object as a key. 
> > So a lot of Term instances are never released.
> > > Java version of Lucene uses WeakHashMap and therefore
> > doesn't have this problem.
> > > This bug can be reproduced only when Sort functionality
> > used during search. 
> > > See following link for additional information.
> > > http://www.gossamer-threads.com/lists/lucene/java-user/55681
> > > Steps to reproduce:
> > > 1)Create index
> > > 2) Modify index by IndexWiter; Close IndexWriter
> > > 3) Use IndexSearcher for searching with Sort; Close InexSearcher
> > > 4) Go to step 2
> > > You'll get OutOfMemoryException after some time of running
> > this algorithm.
> > 
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> > 
> > 
> > 
> 
>