Posted to dev@lucenenet.apache.org by Jeroen Lauwers <Je...@CTLO.NET> on 2008/09/09 10:12:26 UTC

Exception thrown in MultiPhraseQuery.ExtractTerms

Hi,

I think I may have found a bug in MultiPhraseQuery.ExtractTerms().

If the same word occurs twice, a System.ArgumentException ("Item has already been added.") is thrown.

Original code:
public override void  ExtractTerms(System.Collections.Hashtable terms)
{
      for (System.Collections.IEnumerator iter = termArrays.GetEnumerator(); iter.MoveNext(); )
      {
            Term[] arr = (Term[]) iter.Current;
            for (int i = 0; i < arr.Length; i++)
            {
                  terms.Add(arr[i], arr[i]);
            }
      }
}
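
The failure can be reproduced without Lucene.Net at all. The sketch below is only an illustration: the class and method names are hypothetical, and a plain string stands in for a Term. It shows that calling Hashtable.Add a second time with a key that is already present throws ArgumentException.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of Lucene.Net.
public static class DuplicateKeyRepro
{
    // Returns true if adding the same key twice via Hashtable.Add
    // raises ArgumentException, false if the second Add succeeds.
    public static bool AddTwiceThrows(object key)
    {
        Hashtable terms = new Hashtable();
        terms.Add(key, key);          // first insert succeeds
        try
        {
            terms.Add(key, key);      // second insert with an existing key
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Calling DuplicateKeyRepro.AddTwiceThrows("lucene") exercises the same code path as a query in which one term appears twice.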

Possible patch:
public override void  ExtractTerms(System.Collections.Hashtable terms)
{
      for (System.Collections.IEnumerator iter = termArrays.GetEnumerator(); iter.MoveNext(); )
      {
            Term[] arr = (Term[]) iter.Current;
            for (int i = 0; i < arr.Length; i++)
            {
                  if(!terms.Contains(arr[i]))
                      terms.Add(arr[i], arr[i]);
            }
      }
}
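
A possible alternative to the Contains check, sketched here under the same assumptions (hypothetical names, strings standing in for Term objects), is to assign through the Hashtable indexer. The indexer setter adds a missing key and overwrites the value of an existing one, so it does not throw on duplicates. Either approach relies on the key type implementing Equals and GetHashCode consistently.

```csharp
using System.Collections;

// Hypothetical helper, not part of Lucene.Net.
public static class TermCollector
{
    // Copies every element of arr into terms, using the element as both
    // key and value. Duplicates are tolerated because the indexer setter
    // overwrites the existing entry instead of throwing.
    public static void AddAll(Hashtable terms, object[] arr)
    {
        for (int i = 0; i < arr.Length; i++)
        {
            terms[arr[i]] = arr[i];
        }
    }
}
```

With this variant the loop does one hash lookup per term instead of two; the Contains-then-Add form in the patch above makes the intent more explicit.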


It looks like this is a bug in the Java version too (or does a Java Hashtable behave differently?).
Perhaps we should notify them.

Jeroen

Re: Exception thrown in MultiPhraseQuery.ExtractTerms

Posted by Doug Sale <do...@gmail.com>.
Thanks, Jeroen.

This is indeed a bug in Lucene.Net.  System.Collections.Hashtable.Add
diverges from java.util.HashSet.add when the element is already present:
Hashtable.Add throws an ArgumentException, whereas HashSet.add leaves the
set unchanged and returns false.  This, then, is not a bug in Lucene Java.
I will create a JIRA entry containing your patch.

-Doug
