Posted to java-user@lucene.apache.org by java8964 java8964 <ja...@hotmail.com> on 2010/02/01 17:26:35 UTC

During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

I noticed a strange result from the following test case. My understanding is that Lucene does NOT apply any analyzer to the query string for a wildcard search. But as the simple code below shows, it looks like Lucene lowercases the search query in a wildcard search. Why? If it does not, why does the test case return one hit for the lowercase wildcard search but none for the uppercase data? My original data is NOT analyzed, so it should be kept in its original form in the index segment, right?

Lucene version: 2.9.0

JDK version: JDK 1.6.0_17


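import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class IndexTest1 {
    public static void main(String[] args) {
        try {
            // Index two documents; "title" is indexed as one un-analyzed term each.
            Directory directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_CURRENT), IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            doc.add(new Field("title", "BBB CCC", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            doc = new Document();
            doc.add(new Field("title", "ddd eee", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);

            writer.close();

            // Search with a query parser that uses KeywordAnalyzer for "title".
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_CURRENT));
            wrapper.addAnalyzer("title", new KeywordAnalyzer());
            // Uppercase prefix query
            Query query = new QueryParser("title",
                    wrapper).parse("title:BBB*");
            System.out.println("hits of title = " + searcher.search(query, 100).totalHits);
            // Lowercase prefix query
            query = new QueryParser("title",
                    wrapper).parse("title:ddd*");
            System.out.println("hits of title = " + searcher.search(query, 100).totalHits);
            searcher.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}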

The output:
hits of title = 0
hits of title = 1

 		 	   		  

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Just add the field a second time, in its original case, with Field.Store.YES and Field.Index.NO. For searching, add it using the tokenizer approach described before, i.e. as a TokenStream.

Internally this is handled exactly like this (if you enable both Field.Index.ANALYZED and Field.Store.YES).
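
As a rough illustration of the idea above (keep the original-case value for display, search against a lowercased copy), here is a toy model in plain Java. It deliberately does not use the Lucene API: the class StoredVsIndexedSketch and its methods add and prefixSearch are made-up names for this sketch, and the two lists only stand in for a stored field value and an indexed term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model (not the Lucene API): a stored value plus a separately indexed term per entry. */
public class StoredVsIndexedSketch {
    private final List<String> stored = new ArrayList<String>();
    private final List<String> indexed = new ArrayList<String>();

    /** Keep the original text for display; index the whole text as one lowercased term. */
    public void add(String original) {
        stored.add(original);
        indexed.add(original.toLowerCase(Locale.ROOT));
    }

    /** Lowercase the prefix, match it against the indexed terms, return the stored values. */
    public List<String> prefixSearch(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<String>();
        for (int i = 0; i < indexed.size(); i++) {
            if (indexed.get(i).startsWith(p)) {
                hits.add(stored.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch index = new StoredVsIndexedSketch();
        index.add("BBB CCC");
        index.add("ddd eee");
        System.out.println(index.prefixSearch("BBB")); // prints [BBB CCC]
        System.out.println(index.prefixSearch("ddd")); // prints [ddd eee]
    }
}
```

The point of the sketch is only that the value handed back to the user never passes through the lowercasing step, so it keeps its original case.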

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by java8964 java8964 <ja...@hotmail.com>.

Thanks for your help.

My concern now is that the field could be defined as stored. So when the user retrieves the field data, we still want to show the original data, in upper case in this case.

First, I don't think I can use queryParser.setLowercaseExpandedTerms(false), because that would remove case-insensitive wildcard search for tokenized fields.
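
This trade-off can be sketched with a small toy model in plain Java. It is not Lucene code: prefixHits and the boolean lowercasePrefix are hypothetical stand-ins for a prefix query and the parser's lowercasing switch, and the two term lists are assumed examples of what a NOT_ANALYZED field and a field analyzed with lowercasing might hold.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model (not the Lucene API) of lowercasing a prefix before matching indexed terms. */
public class LowercaseExpandedTermsSketch {
    /** Counts indexed terms that start with the prefix, optionally lowercasing the prefix first. */
    public static int prefixHits(List<String> indexedTerms, String prefix, boolean lowercasePrefix) {
        String p = lowercasePrefix ? prefix.toLowerCase(Locale.ROOT) : prefix;
        int hits = 0;
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> notAnalyzed = Arrays.asList("BBB CCC", "ddd eee");    // indexed verbatim
        List<String> analyzed = Arrays.asList("bbb", "ccc", "ddd", "eee"); // lowercased tokens

        // Lowercasing switched on: the un-analyzed uppercase term is missed.
        System.out.println(prefixHits(notAnalyzed, "BBB", true));  // prints 0
        System.out.println(prefixHits(analyzed, "BBB", true));     // prints 1
        // Lowercasing switched off: the analyzed (lowercased) terms are missed instead.
        System.out.println(prefixHits(notAnalyzed, "BBB", false)); // prints 1
        System.out.println(prefixHits(analyzed, "BBB", false));    // prints 0
    }
}
```

Whichever way the switch is set, one of the two kinds of field stops matching an uppercase prefix, which is why a single parser-wide setting does not solve the problem here.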

To handle this case, where the data is NOT tokenized but contains upper case letters, and still be able to run a wildcard search with uppercase letters, like 'BB*', I am thinking that I have to analyze the non-tokenized data, using a KeywordTokenizer plus lowercasing.

With your suggestion, will the data be changed to lower case inside Lucene, so that it comes back in lower case when it is retrieved?

Thanks


RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Uwe Schindler <uw...@thetaphi.de>.

For specific fields that need a special TokenStream chain, there is no need to write a separate analyzer. You can add fields to a document using a TokenStream as the parameter: new Field(name, TokenStream).

As the TokenStream, just create a chain from a Tokenizer and all the filters, like:

TokenStream ts = new KeywordTokenizer(new StringReader("your text to index"));
ts = new LowerCaseFilter(ts);
...
document.add(new Field("fieldname", ts));
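
To see what such a chain produces, here is a toy sketch in plain Java, not the Lucene classes themselves. The method names keywordTokens, whitespaceTokens and lowerCase are invented for illustration; they only mimic the idea of a tokenizer followed by a lowercasing filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model (not the Lucene API) of a tokenizer followed by a lowercasing filter. */
public class TokenChainSketch {
    /** Keyword-style tokenization: the entire input becomes a single token. */
    public static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    /** Whitespace tokenization, shown only for contrast. */
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Lowercasing filter applied to every token from the previous stage. */
    public static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<String>(tokens.size());
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lowerCase(keywordTokens("BBB CCC")));    // prints [bbb ccc]
        System.out.println(lowerCase(whitespaceTokens("BBB CCC"))); // prints [bbb, ccc]
    }
}
```

With the keyword-style chain the whole value stays one term, so a lowercased prefix such as "bbb c" can still match it; with whitespace tokenization it could not.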

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Wednesday, February 03, 2010 11:06 AM
> To: java-user@lucene.apache.org
> Subject: Re: During the wild card search, will lucene 2.9.0 to convert
> the search string to lower case?
> 
> I think you'll have to write your own.  Or just downcase the text
> yourself first.
> 
> 
> --
> Ian.
> 
> 
> On Tue, Feb 2, 2010 at 9:30 PM, java8964 java8964
> <ja...@hotmail.com> wrote:
> >
> > Is there an analyzer like the keyword analyzer that also lowercases
> > the data? Or do I have to write a custom analyzer myself?
> >
> > Thanks
> >
> >> From: java8964@hotmail.com
> >> To: java-user@lucene.apache.org
> >> Subject: RE: During the wild card search, will lucene 2.9.0 to
> >> convert the search string to lower case?
> >> Date: Mon, 1 Feb 2010 14:24:00 -0500
> >>
> >>
> >> This may be what I am looking for. We are using the default
> >> value, which is true.
> >>
> >> Let me examine this method more.
> >>
> >> Thanks for your help.
> >>
> >> > From: digydigy@gmail.com
> >> > To: java-user@lucene.apache.org
> >> > Subject: RE: During the wild card search, will lucene 2.9.0 to
> >> > convert the search string to lower case?
> >> > Date: Mon, 1 Feb 2010 20:36:29 +0200
> >> >
> >> > Did you try queryParser.setLowercaseExpandedTerms(false)?
> >> >
> >> > DIGY
> >> >
> >> > -----Original Message-----
> >> > From: java8964 java8964 [mailto:java8964@hotmail.com]
> >> > Sent: Monday, February 01, 2010 8:11 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: RE: During the wild card search, will lucene 2.9.0 to
> >> > convert the search string to lower case?
> >> >
> >> >
> >> > I would like to confirm your reply. You mean that the query parser
> >> > does the lowercasing. In fact, it looks like it only does this for
> >> > wildcard queries, right?
> >> >
> >> > For a term query, it does not, as you can see if you change the
> >> > line to:
> >> >
> >> >             Query query = new QueryParser("title",
> >> > wrapper).parse("title:\"BBB CCC\"");
> >> >
> >> > You will get 1 hit back. So in this case, the query parser class
> >> > behaves differently for term queries and wildcard queries.
> >> >
> >> > We have to use the query parser in this case, but we have our own
> >> > query parser class that extends the Lucene query parser class.
> >> > Is there anything we can do about it?
> >> >
> >> > Will Lucene's query parser class be fixed for the above
> >> > inconsistent implementation?
> >> >
> >> > Thanks
> >> >
> >> >
> >> > > From: uwe@thetaphi.de
> >> > > To: java-user@lucene.apache.org
> >> > > Subject: RE: During the wild card search, will lucene 2.9.0 to
> >> > > convert the search string to lower case?
> >> > > Date: Mon, 1 Feb 2010 17:41:08 +0100
> >> > >
> >> > > Only the query parser does the lowercasing. For such a special
> >> > > case, I would suggest using a PrefixQuery or WildcardQuery
> >> > > directly and not using the query parser.
> >> > >
> >> > > -----
> >> > > Uwe Schindler
> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> >> > > http://www.thetaphi.de
> >> > > eMail: uwe@thetaphi.de
> >> > >

Re: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Ian Lea <ia...@gmail.com>.

I think you'll have to write your own.  Or just downcase the text
yourself first.


--
Ian.
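
Note: a minimal sketch of the "downcase the text yourself first" idea, in plain Java rather than Lucene code. The class name LowercasedKeywordIndex and its methods are made up for illustration; the sorted set only stands in for the term dictionary of a NOT_ANALYZED field. The point it tries to show is that if each whole value is lowercased before indexing, a prefix lowercased by the query parser can match again.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy stand-in (hypothetical, not Lucene) for a keyword-style field that is lowercased before indexing. */
final class LowercasedKeywordIndex {
    // One term per distinct value; the whole value is kept as a single term.
    private final NavigableSet<String> terms = new TreeSet<>();

    /** Lowercase the whole value without splitting it into tokens. */
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    void add(String value) {
        terms.add(normalize(value));
    }

    /** Count distinct terms that start with the prefix; the prefix is normalized the same way. */
    int countPrefix(String prefix) {
        String p = normalize(prefix);
        int count = 0;
        // In a sorted set, all terms sharing a prefix are contiguous starting at the first term >= prefix.
        for (String term : terms.tailSet(p, true)) {
            if (!term.startsWith(p)) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        LowercasedKeywordIndex index = new LowercasedKeywordIndex();
        index.add("BBB CCC");
        index.add("ddd eee");
        System.out.println("hits of title = " + index.countPrefix("BBB"));
        System.out.println("hits of title = " + index.countPrefix("ddd"));
    }
}
```

With both values normalized, each of the two prefix lookups finds one term. The trade-off is that the indexed terms no longer preserve case; the stored field value can still keep the original text.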


On Tue, Feb 2, 2010 at 9:30 PM, java8964 java8964 <ja...@hotmail.com> wrote:
>
> Is there an analyzer like KeywordAnalyzer that also lowercases the data? Or do I have to write a custom analyzer myself?
>
> Thanks
>


RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by java8964 java8964 <ja...@hotmail.com>.

Is there an analyzer like KeywordAnalyzer that also lowercases the data? Or do I have to write a custom analyzer myself?

Thanks

> From: java8964@hotmail.com
> To: java-user@lucene.apache.org
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> Date: Mon, 1 Feb 2010 14:24:00 -0500
> 
> 
> This may be what I am looking for. We are using the default value, which is true.
> 
> Let me examine this method more.
> 
> Thanks for your help.
> 
 		 	   		  

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by java8964 java8964 <ja...@hotmail.com>.

This may be what I am looking for. We are using the default value, which is true.

Let me examine this method more.

Thanks for your help.

> From: digydigy@gmail.com
> To: java-user@lucene.apache.org
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> Date: Mon, 1 Feb 2010 20:36:29 +0200
> 
> Did you try queryParser.setLowercaseExpandedTerms(false)?
> 
> DIGY
> 
 		 	   		  

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Digy <di...@gmail.com>.

Did you try queryParser.setLowercaseExpandedTerms(false)?

DIGY
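
Note: a small model of what this switch changes, written in plain Java and not taken from the Lucene sources. In the Java QueryParser the setter is spelled setLowercaseExpandedTerms(boolean); everything else below (the class ExpandedTermModel and its methods) is hypothetical and only mimics the test case from this thread, where the two title values are indexed without analysis.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model (not Lucene code) of the parser's lowercase-expanded-terms switch. */
final class ExpandedTermModel {
    // Exact, un-analyzed field values, as in the test case from this thread.
    static final List<String> INDEXED_TITLES = Arrays.asList("BBB CCC", "ddd eee");

    /** Text of the prefix term after the parser has optionally lowercased it. */
    static String expandedTerm(String prefix, boolean lowercaseExpandedTerms) {
        return lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
    }

    /** Number of indexed values that start with the resulting prefix term. */
    static int prefixHits(String prefix, boolean lowercaseExpandedTerms) {
        String term = expandedTerm(prefix, lowercaseExpandedTerms);
        int hits = 0;
        for (String title : INDEXED_TITLES) {
            if (title.startsWith(term)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {true, false}) {
            System.out.println("lowercaseExpandedTerms=" + flag
                    + " BBB* -> " + prefixHits("BBB", flag)
                    + ", ddd* -> " + prefixHits("ddd", flag));
        }
    }
}
```

In this model the default (true) gives 0 hits for BBB* and 1 for ddd*, which is the output reported in the original post; with the switch off, both prefixes find one value. Turning it off also means a user typing bbb* will no longer match upper-case data, so it only helps when queries are expected in the same case as the indexed values.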

-----Original Message-----
From: java8964 java8964 [mailto:java8964@hotmail.com] 
Sent: Monday, February 01, 2010 8:11 PM
To: java-user@lucene.apache.org
Subject: RE: During the wild card search, will lucene 2.9.0 to convert the
search string to lower case?


I would like to confirm your reply. You mean that the query parser does the
lowercasing. In fact, it looks like it only does this for wildcard queries,
right?

For a term query, it does not. You can see this by changing the line to:

            Query query = new QueryParser("title", wrapper).parse("title:\"BBB CCC\"");

You will get 1 hit back. So in this case, the query parser class behaves
differently for term queries and wildcard queries.

We have to use the query parser in this case, but we have our own query
parser class that extends the Lucene query parser class. Is there anything we
can do about it?

Will Lucene's query parser class be fixed for the above inconsistent
implementation?

Thanks


 		 	   		  

Re: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Erik Hatcher <er...@gmail.com>.

QueryParser has a special capability to lowercase wildcard and prefix
queries, simply because they are not passed to an analyzer.  Term
queries, phrase queries (like your example), etc. are passed on to the
analyzer.  You are using the KeywordAnalyzer for the title field, and
thus the phrase is not lowercased.  Choose a different analyzer that
lowercases and it will be.

	Erik
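
Note: the routing described above can be sketched as a toy model in plain Java. This is not the QueryParser implementation; the class QueryTextRouting, the two analyzer stand-ins and the "ends with *" check are simplifying assumptions made only to show which text reaches the analyzer and which text the parser lowercases on its own.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Toy model (not Lucene code) of which query text reaches the analyzer. */
final class QueryTextRouting {
    // Stand-ins for analyzers: the keyword-style one keeps text as is, the other lowercases it.
    static final UnaryOperator<String> KEYWORD_LIKE = text -> text;
    static final UnaryOperator<String> LOWERCASING = text -> text.toLowerCase(Locale.ROOT);

    /**
     * Returns the term text that would be looked up in the index.
     * Text ending in '*' is treated as a prefix query: the analyzer is skipped
     * and the prefix is lowercased by the parser itself (the default behaviour
     * discussed in this thread). Everything else is handed to the analyzer.
     */
    static String termText(String queryText, UnaryOperator<String> analyzer) {
        if (queryText.endsWith("*")) {
            String prefix = queryText.substring(0, queryText.length() - 1);
            return prefix.toLowerCase(Locale.ROOT);
        }
        return analyzer.apply(queryText);
    }

    public static void main(String[] args) {
        System.out.println(termText("BBB CCC", KEYWORD_LIKE)); // prints BBB CCC
        System.out.println(termText("BBB*", KEYWORD_LIKE));    // prints bbb
        System.out.println(termText("BBB CCC", LOWERCASING));  // prints bbb ccc
    }
}
```

The first two lines show the inconsistency raised in this thread: with a keyword-style analyzer the phrase keeps its case while the prefix does not. The third line shows that a lowercasing analyzer brings the two paths back in line, provided the field was indexed with lowercased terms as well.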

On Feb 1, 2010, at 1:10 PM, java8964 java8964 wrote:

>
> I would like to confirm your reply. You mean that the query parser
> does the lowercasing. In fact, it looks like it only does this for
> wildcard queries, right?
>
> For a term query, it does not. You can see this by changing the line to:
>
>            Query query = new QueryParser("title", wrapper).parse("title:\"BBB CCC\"");
>
> You will get 1 hit back. So in this case, the query parser class
> behaves differently for term queries and wildcard queries.
>
> We have to use the query parser in this case, but we have our own
> query parser class that extends the Lucene query parser class.
> Is there anything we can do about it?
>
> Will Lucene's query parser class be fixed for the above
> inconsistent implementation?
>
> Thanks
>
>



RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by java8964 java8964 <ja...@hotmail.com>.

I would like to confirm your reply. You mean that the query parser does the lowercasing. In fact, it looks like it only does this for wildcard queries, right?

For a term query, it does not. You can see this by changing the line to:

            Query query = new QueryParser("title", wrapper).parse("title:\"BBB CCC\"");

You will get 1 hit back. So in this case, the query parser class behaves differently for term queries and wildcard queries.

We have to use the query parser in this case, but we have our own query parser class that extends the Lucene query parser class. Is there anything we can do about it?

Will Lucene's query parser class be fixed for the above inconsistent implementation?

Thanks


> From: uwe@thetaphi.de
> To: java-user@lucene.apache.org
> Subject: RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?
> Date: Mon, 1 Feb 2010 17:41:08 +0100
> 
> Only the query parser does the lowercasing. For such a special case, I would suggest using a PrefixQuery or WildcardQuery directly instead of the query parser.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Only the query parser does the lowercasing. For such a special case, I would suggest using a PrefixQuery or WildcardQuery directly instead of the query parser.
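
A minimal sketch of that suggestion, assuming Lucene 2.9 on the classpath and the same two NOT_ANALYZED documents as in the original test case. The class name DirectPrefixQueryDemo and its helper methods are hypothetical. Because the queries are built from a Term directly, no parser touches the text and the case is preserved.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DirectPrefixQueryDemo {

    // Builds the same in-memory index as the original test case.
    static Directory buildIndex() throws Exception {
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (String title : new String[] {"BBB CCC", "ddd eee"}) {
            Document doc = new Document();
            // NOT_ANALYZED: the whole value is indexed as one term, case intact.
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
        return directory;
    }

    // Runs a query against the index and returns the total hit count.
    static int countHits(Query query) throws Exception {
        IndexSearcher searcher = new IndexSearcher(buildIndex(), true);
        try {
            return searcher.search(query, 100).totalHits;
        } finally {
            searcher.close();
        }
    }

    // PrefixQuery built by hand: the prefix is used exactly as given.
    static int prefixHits(String prefix) throws Exception {
        return countHits(new PrefixQuery(new Term("title", prefix)));
    }

    // WildcardQuery built by hand: the pattern is used exactly as given.
    static int wildcardHits(String pattern) throws Exception {
        return countHits(new WildcardQuery(new Term("title", pattern)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("prefix BBB    = " + prefixHits("BBB"));
        System.out.println("wildcard BBB* = " + wildcardHits("BBB*"));
        System.out.println("prefix bbb    = " + prefixHits("bbb"));
    }
}
```

The trade-off is that the application has to decide itself when user input is a prefix or wildcard expression, since the query syntax is no longer parsed.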

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


