Posted to dev@lucenenet.apache.org by Jeroen Lauwers <Je...@CTLO.NET> on 2008/09/09 10:12:26 UTC
Exception thrown in MultiPhraseQuery.ExtractTerms
Hi,
I think I may have found a bug in MultiPhraseQuery.ExtractTerms().
If the same word occurs twice, a "System.ArgumentException: Item has already been added." is thrown.
Original code:
public override void ExtractTerms(System.Collections.Hashtable terms)
{
    for (System.Collections.IEnumerator iter = termArrays.GetEnumerator(); iter.MoveNext(); )
    {
        Term[] arr = (Term[]) iter.Current;
        for (int i = 0; i < arr.Length; i++)
        {
            terms.Add(arr[i], arr[i]);
        }
    }
}
Possible patch:
public override void ExtractTerms(System.Collections.Hashtable terms)
{
    for (System.Collections.IEnumerator iter = termArrays.GetEnumerator(); iter.MoveNext(); )
    {
        Term[] arr = (Term[]) iter.Current;
        for (int i = 0; i < arr.Length; i++)
        {
            if (!terms.Contains(arr[i]))
                terms.Add(arr[i], arr[i]);
        }
    }
}
It looks like this is a bug in the Java version too. (Or is the behaviour of a Java Hashtable different?)
Perhaps we should notify them.
Jeroen
Re: Exception thrown in MultiPhraseQuery.ExtractTerms
Posted by Doug Sale <do...@gmail.com>.
Thanks, Jeroen.
This is indeed a bug in Lucene.Net. System.Collections.Hashtable.Add behaves
differently from java.util.HashSet.add: adding a duplicate to a HashSet leaves
the set unchanged and returns false instead of throwing. This, then, is not a
bug in Lucene Java. I will create a JIRA entry containing your patch.
-Doug
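For reference, the Java-side behaviour discussed above can be checked with a small standalone sketch. It is not taken from the Lucene sources; the class name DuplicateAddDemo, its helper methods, and the sample term "lucene" are made up for illustration. It only exercises java.util.HashSet.add and java.util.Hashtable.put with a duplicate key, neither of which throws.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Hypothetical demo class, not part of Lucene or Lucene.Net.
class DuplicateAddDemo {

    // Adds the same element twice and returns what the second add() reports.
    // HashSet.add returns false and leaves the set unchanged for a duplicate.
    static boolean secondAddToHashSet(String term) {
        Set<String> terms = new HashSet<>();
        terms.add(term);
        return terms.add(term);
    }

    // Puts the same key twice and returns the resulting size.
    // Hashtable.put overwrites the existing mapping instead of throwing.
    static int sizeAfterDuplicatePut(String term) {
        Hashtable<String, String> terms = new Hashtable<>();
        terms.put(term, term);
        terms.put(term, term);
        return terms.size();
    }

    public static void main(String[] args) {
        System.out.println("second HashSet.add returned: " + secondAddToHashSet("lucene"));
        System.out.println("Hashtable size after duplicate put: " + sizeAfterDuplicatePut("lucene"));
    }
}
```

So a direct port that calls Hashtable.Add in .NET picks up an exception the Java code never had, which is why a guard such as the Contains check in the patch is needed on the .NET side.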