Posted to dev@lucenenet.apache.org by Christopher Kolstad <ch...@ovitas.no> on 2007/10/25 11:02:01 UTC

PrefixQuery's rewrite method does not work as expected when used under .Net 2.0

Hi.

I'm currently using the latest version of Lucene.Net 2.1 from SVN. When
compiling it for .Net 2.0 I ran into an interesting "feature":
query expansion stops earlier than expected. After a lot of debugging, I found
the culprit to be PrefixQuery's Rewrite method. It uses
string.StartsWith(string).
In .Net 2.0 this can break on any box whose culture settings are not English,
because .Net 2.0 adds the overload StartsWith(string, StringComparison) and
the single-argument version defaults to StringComparison.CurrentCulture.

Norwegian culture, for instance, states that 'aa' is the same as the single
letter 'å'.
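
To make the difference between the two comparison modes concrete, here is a minimal sketch. The class and method names (PrefixCheck, StartsWithCulture, StartsWithOrdinal) are made up for illustration and are not part of Lucene.Net. The culture-sensitive helper goes through CompareInfo.IsPrefix, which is what the single-argument StartsWith uses for the current culture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, for illustration only.
public static class PrefixCheck
{
    // Culture-sensitive test: follows the collation rules of the given
    // culture, so letter pairs the culture treats as one unit can change
    // the result.
    public static bool StartsWithCulture(string term, string prefix, CultureInfo culture)
    {
        return culture.CompareInfo.IsPrefix(term, prefix, CompareOptions.None);
    }

    // Ordinal test: compares UTF-16 code units and ignores culture entirely.
    public static bool StartsWithOrdinal(string term, string prefix)
    {
        return term.StartsWith(prefix, StringComparison.Ordinal);
    }
}
```

Calling PrefixCheck.StartsWithCulture("paraaminosyre", "para", CultureInfo.GetCultureInfo("nn-NO")) corresponds to the failing case in this thread; what it returns depends on the collation data of the runtime in use. The ordinal helper gives the same answer on every machine.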

So when my index contains para-orange, paraaminosyre, para.... and I run the
PrefixQuery para*, the method fails on the
if(term.Text().StartsWith(prefixText)) test, specifically on
"paraaminosyre".StartsWith("para").
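
Why one failed test cuts the whole expansion short can be sketched without Lucene. The code below is a stand-in, not the real Rewrite method: it assumes the loop walks terms in sorted order and stops at the first term that fails the prefix test, which is consistent with the early stop described above. PrefixExpansion, Expand and the injected startsWith delegate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the term enumeration loop, for illustration only.
public static class PrefixExpansion
{
    // Collects terms that pass the prefix test, stopping at the first
    // term in the prefix range that fails it.
    public static List<string> Expand(
        IList<string> sortedTerms,
        string prefix,
        Func<string, string, bool> startsWith)
    {
        List<string> matches = new List<string>();
        foreach (string term in sortedTerms)
        {
            // Skip terms that sort before the prefix.
            if (string.CompareOrdinal(term, prefix) < 0) continue;
            // First mismatch ends the expansion, even if later terms match.
            if (!startsWith(term, prefix)) break;
            matches.Add(term);
        }
        return matches;
    }
}
```

With a sorted list such as para-orange, paraaminosyre and one more para term, a prefix test that wrongly rejects paraaminosyre ends the loop after para-orange, so every term after it is lost as well.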

The fix for me was to change the line in PrefixQuery.cs::Rewrite() from

    if (term != null && term.Text().StartsWith(prefixText) && term.Field() == prefixField)

to

    if (term != null && term.Text().StartsWith(prefixText, StringComparison.Ordinal) && term.Field() == prefixField)

This is a new "feature" in .Net 2.0 and did not affect my .Net 1.1 projects,
which still use the .Net 1.1 library.


Conditional compilation might be required.
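
One possible shape for that, as a sketch only: NET_1_1 is a hypothetical compilation symbol (the project may use a different one or none), and OrdinalPrefix is a made-up helper. The fallback uses String.CompareOrdinal with explicit offsets, which does not need the StringComparison enum.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class OrdinalPrefix
{
    // Ordinal prefix test that avoids the StringComparison enum.
    public static bool IsPrefixCompat(string text, string prefix)
    {
        return text.Length >= prefix.Length
            && String.CompareOrdinal(text, 0, prefix, 0, prefix.Length) == 0;
    }

    public static bool IsPrefix(string text, string prefix)
    {
#if NET_1_1
        // Assumed symbol for a .Net 1.1 build.
        return IsPrefixCompat(text, prefix);
#else
        return text.StartsWith(prefix, StringComparison.Ordinal);
#endif
    }
}
```

Since IsPrefixCompat only relies on String.CompareOrdinal, using it on both frameworks might remove the need for the #if altogether.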
-- 
Regards,
Christopher Kolstad
E-mail: chriswk@ifi.uio.no (University)
christopher.kolstad@gmail.com (Home)
chriswk@ovitas.no (Job)

RE: PrefixQuery's rewrite method does not work as expected when used under .Net 2.0

Posted by George Aroush <ge...@aroush.net>.
I have committed this fix; no patch is needed.

-- George 



RE: PrefixQuery's rewrite method does not work as expected when used under .Net 2.0

Posted by DIGY <di...@gmail.com>.
Hi Christopher,

Can you open an issue on JIRA and attach your patch to it?

The test case below confirms your finding.

        // Indexes one term, then prints the query that PrefixQuery.Rewrite
        // produces for the prefix under the current thread culture.
        void WriteQuery()
        {
            string sToIndex = "paraaminosyre";
            string sToSearch = "para";

            Lucene.Net.Store.RAMDirectory ramDir = new Lucene.Net.Store.RAMDirectory();

            // INDEX
            Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(
                ramDir, new Lucene.Net.Analysis.Standard.StandardAnalyzer());
            Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
            doc.Add(new Lucene.Net.Documents.Field(
                "Field1", sToIndex,
                Lucene.Net.Documents.Field.Store.YES,
                Lucene.Net.Documents.Field.Index.TOKENIZED));
            writer.AddDocument(doc);
            writer.Close();

            // SEARCH
            Lucene.Net.Index.IndexReader reader = Lucene.Net.Index.IndexReader.Open(ramDir);
            Lucene.Net.Search.PrefixQuery pq = new Lucene.Net.Search.PrefixQuery(
                new Lucene.Net.Index.Term("Field1", sToSearch));
            Lucene.Net.Search.Query q = pq.Rewrite(reader);
            System.Console.WriteLine("#" + q.ToString() + "#");
            reader.Close();
        }

        public void Test()
        {
            // English culture: the rewritten query contains the term.
            System.Threading.Thread.CurrentThread.CurrentCulture =
                System.Globalization.CultureInfo.GetCultureInfo("en-us");
            WriteQuery();

            // Norwegian (Nynorsk) culture: the rewritten query comes back empty.
            System.Threading.Thread.CurrentThread.CurrentCulture =
                System.Globalization.CultureInfo.GetCultureInfo("nn-no");
            WriteQuery();
        }

Output (en-us first, then nn-no):
#Field1:paraaminosyre#
##


Regards,

DIGY
