
Posted to java-user@lucene.apache.org by Karl Heinz Marbaise <kh...@gmx.de> on 2009/01/27 20:29:55 UTC

Lucene 2.4 - Searching

Hi there,

I'm trying to do a, from my point of view, simple thing.

I would like to do a search ignoring the case of the stored information 
in the index...with the following code:

reader = IndexReader.open(indexDirectory);

Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();

// Custom query parser that handles ranges like field:[1 TO 6]
QueryParser parser = new CustomQueryParser(FieldNames.CONTENTS, analyzer);
parser.setAllowLeadingWildcard(true);
parser.setLowercaseExpandedTerms(false);
Query query = parser.parse(queryLine);

TopDocs tmp = searcher.search(query, null, 20, sort);

To be more precise...

I have a field which is called filename and contains a filename which 
can of course be lowercase or uppercase or a mixture...

I would like to do the following:

+filename:/*scm*.doc

That should result in getting things like

/...SCMtest.doc
/...scmtest.doc
/...scm.doc
etc.

Maybe someone can give me a hint on how to solve this...

kind regards
Karl Heinz Marbaise
-- 
SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
Hauptstrasse 177                         USt.IdNr: DE191347579
52146 Würselen                           http://www.soebes.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene 2.4 - Searching

Posted by Antony Bowesman <ad...@teamware.com>.

Karl Heinz Marbaise wrote:
> 
> I have a field which is called filename and contains a filename which 
> can of course be lowercase or uppercase or a mixture...
> 
> I would like to do the following:
> 
> +filename:/*scm*.doc
> 
> That should result in getting things like
> 
> /...SCMtest.doc
> /...scmtest.doc
> /...scm.doc
> etc.
> 
> Maybe someone can give me a hint on how to solve this...

It's all down to the analyzer you use when you index that field and how you 
choose to tokenize it.  If you always want to search case-insensitively, then 
you should lowercase the filename when indexing.

It also depends on how you implemented your query parser.  If it supports 
wildcard queries and behaves anything like the standard QP, it will not put the 
wildcard query string through the analyzer, so a search for

+filename:/*SCm*.doc

would then not find anything in a lowercased index.  You need to make sure you 
lowercase all the filename field searches at some point.
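
To illustrate the point about lowercasing on both sides, here is a minimal 
sketch in plain Java.  It deliberately uses no Lucene classes: the class name, 
the method names, and the /tmp sample paths are all made up for illustration, 
and the wildcard matching is a stand-in for what a wildcard query does against 
the indexed terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustration only: plain Java, no Lucene. All names and paths are hypothetical.
class FilenameCaseDemo {

    // The same normalization is applied to filenames at index time
    // and (optionally) to the query term at search time.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Translate a wildcard term ('*' = any run of chars, '?' = one char) into a regex.
    static Pattern wildcardToPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Match the wildcard against already-indexed terms.
    // lowerCaseQuery switches the query-side normalization on or off.
    static List<String> search(List<String> indexedTerms, String wildcard, boolean lowerCaseQuery) {
        Pattern p = wildcardToPattern(lowerCaseQuery ? normalize(wildcard) : wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Terms as they would look after lowercasing at index time.
        List<String> index = Arrays.asList(
                normalize("/tmp/SCMtest.doc"),
                normalize("/tmp/scm.doc"),
                normalize("/tmp/readme.txt"));
        System.out.println(search(index, "/*SCm*.doc", false)); // mixed-case query, not normalized
        System.out.println(search(index, "/*SCm*.doc", true));  // query normalized like the index
    }
}
```

The first search comes back empty because the mixed-case term is compared 
as-is against lowercased terms; the second finds both .doc files.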

I use a custom analyzer for filenames, which lowercases and tokenizes by letter 
or digit or any custom chars, and my query parser supports custom analyzers for 
getFieldQuery().
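
As a rough idea of what such tokenizing could look like, here is a plain-Java 
sketch.  It is not the actual analyzer described above and not a Lucene 
Analyzer; it assumes that a token is a run of letters, digits, or caller-chosen 
extra characters, and the names tokenize and extraTokenChars are invented for 
this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no Lucene. Names are hypothetical.
class FilenameTokenizerSketch {

    // Split a filename into lowercase tokens. A token is a run of letters,
    // digits, or any character listed in extraTokenChars (an assumption
    // about what "custom chars" means); everything else is a separator.
    static List<String> tokenize(String filename, String extraTokenChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < filename.length(); i++) {
            char c = filename.charAt(i);
            if (Character.isLetterOrDigit(c) || extraTokenChars.indexOf(c) >= 0) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // '_' is kept inside tokens here; '/', '-' and '.' act as separators.
        System.out.println(tokenize("/tmp/SCM_Test-01.doc", "_"));
    }
}
```

Whether characters such as '_' or '.' should stay inside a token depends on how 
you want to search the filenames, which is why they are passed in as a 
parameter in this sketch.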

If you want to keep the original filename, then just store the field as well as 
index it; you can then get the original back from the Document.
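
The store-plus-index idea can be pictured with the following plain-Java sketch. 
It does not use Lucene's Document or Field classes; Entry, indexedTerm, 
storedValue, and findContaining are hypothetical names, and the substring match 
only stands in for a real query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain Java, no Lucene. Names are hypothetical.
class StoredOriginalSketch {

    static final class Entry {
        final String indexedTerm; // lowercased copy, used only for matching
        final String storedValue; // original filename, returned to the caller

        Entry(String original) {
            this.indexedTerm = original.toLowerCase(Locale.ROOT);
            this.storedValue = original;
        }
    }

    // Match against the lowercased copy, but hand back the original spelling.
    static List<String> findContaining(List<Entry> entries, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.indexedTerm.contains(needle)) {
                hits.add(e.storedValue);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("/tmp/SCMtest.doc"));
        entries.add(new Entry("/tmp/notes.txt"));
        System.out.println(findContaining(entries, "ScM"));
    }
}
```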

Antony


Re: Lucene 2.4 - Searching

Posted by Ian Lea <ia...@gmail.com>.

Hi


Sounds like a job for RegexQuery.  If you can't figure out how to use
it, Google will throw up some examples.  You can downcase everything
yourself, use an analyzer that does it, or maybe use a
case-insensitive regexp.

Depending on your file names you might want to avoid StandardAnalyzer.
It is likely to split them.  KeywordAnalyzer might be what you want.
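
For the case-insensitive regexp option, here is a small sketch using only
java.util.regex.  It is not RegexQuery itself, and how RegexQuery is
configured is not shown; the class name and the /tmp sample paths are
made up.  It only shows what a case-insensitive pattern equivalent to
/*scm*.doc looks like.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex, no Lucene. Names and paths are hypothetical.
class CaseInsensitiveRegexSketch {

    // Regex equivalent of the wildcard /*scm*.doc.
    // CASE_INSENSITIVE folds ASCII letters only unless UNICODE_CASE is also set.
    static final Pattern SCM_DOC =
            Pattern.compile("/.*scm.*\\.doc", Pattern.CASE_INSENSITIVE);

    // matches() requires the whole filename to fit the pattern.
    static boolean matches(String filename) {
        return SCM_DOC.matcher(filename).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/tmp/SCMtest.doc"));
        System.out.println(matches("/tmp/readme.doc"));
    }
}
```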


--
Ian.


