Posted to java-user@lucene.apache.org by Krishnendra Nandi <kr...@hewitt.com> on 2006/11/20 13:54:15 UTC

Fw: Urgent : Specific search problem with whitespace analyzer

Hi,

I am running a "field:text" kind of search using my own analyzer, which
behaves like WhitespaceAnalyzer. The code for my custom analyzer and
tokenizer follows.


// WhitespaceAnalyzerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

/** An Analyzer that uses WhitespaceTokenizer. */

public final class WhitespaceAnalyzerMaestro extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new WhitespaceTokenizerMaestro(reader);
  }
} 



// WhitespaceTokenizerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.WhitespaceTokenizer;

/** A WhitespaceTokenizerMaestro is a tokenizer that divides text at
 * whitespace.
 * Adjacent sequences of non-Whitespace characters form tokens. */

public class WhitespaceTokenizerMaestro extends WhitespaceTokenizer {
  /** Construct a new WhitespaceTokenizerMaestro. */
  public WhitespaceTokenizerMaestro(Reader in) {
    super(in);
  }

  /** Collects only characters which do not satisfy
   * {@link Character#isWhitespace(char)}
   * and lowercases that character before returning. */
  protected boolean isTokenChar(char c) {
    c = Character.toLowerCase(c);
    return !Character.isWhitespace(c);
  }
}



I have modified the tokenizer class by making it return characters in 
lower case.

My search query is ISSUE_TITLE:test, where ISSUE_TITLE is the field in
which "test" is to be searched.

The following snippet runs the search:

BooleanQuery masterQuery = new BooleanQuery();

masterQuery.add(MultiFieldQueryParser.parse(searchQuery,
                                            fields,
                                            analyzer),
                REQUIRED,
                PROHIBITED);

Here searchQuery is ISSUE_TITLE:test, fields is the array of fields (of
which ISSUE_TITLE is one), and analyzer is a WhitespaceAnalyzerMaestro
(shown above).

When I run the search, the masterQuery I get after running the above code 
snippet has the following value: 
+(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test*)

which I think is not correct. Does MultiFieldQueryParser not support
WhitespaceAnalyzer?

Please help.

Regards
Krishnendra Nandi

 



Re: Fw: Urgent : Specific search problem with whitespace analyzer

Posted by Chris Hostetter <ho...@fucit.org>.
: I have modified the tokenizer class by making it return characters in
: lower case.

there is really no reason to do this ... have your analyzer use the
WhitespaceTokenizer, wrapped in a LowerCaseFilter ... that will eliminate
some of your custom code, and perhaps some of your problems as well.
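To illustrate the point about splitting and lowercasing being separate steps, here is a minimal plain-Java sketch. It does not use Lucene, and the names TokenPipelineSketch, tokenize and lowerCase are made up for this example. The isTokenChar method mirrors the one in the original post: because Java passes char by value, lowercasing the parameter inside the predicate changes only a local copy, so the emitted tokens keep their original case. A separate lowercasing stage applied to the finished tokens, analogous in spirit to wrapping a tokenizer in a filter, is what actually changes them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; none of these names are Lucene classes. */
final class TokenPipelineSketch {

    /** Same shape as the predicate in the original post. */
    static boolean isTokenChar(char c) {
        c = Character.toLowerCase(c); // changes only the local copy of c
        return !Character.isWhitespace(c);
    }

    /** Stage 1: split the text at whitespace using the predicate above. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c); // the caller's c is still in its original case
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Stage 2: a separate step that lowercases each finished token. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }
}
```

Calling TokenPipelineSketch.tokenize("Test  ISSUE Title") yields the tokens in their original case; passing that list through lowerCase is what produces lowercase tokens.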

regarding the rest of your code...

:  masterQuery.add(MultiFieldQueryParser.parse(
:                                                         searchQuery,
:                                                         fields,
:                                                         analyzer),
:                             REQUIRED,
:                             PROHIBITED);
:
: Here the searchquery is   ISSUE_TITLE:test , fields is the array of fields
: in which ISSUE_TITLE is one of the fields and analyzer is
: WhitespaceAnalyzerMaestro() (already mentioned above).

...there is a lot going on here, some of which you haven't included so we
can't be sure what exactly it is...
  1) have you tested your analyzer in isolation to ensure that it's
     working properly outside of a QueryParser?
  2) have you tried it with a plain QueryParser instead of a MultiFieldQueryParser?
  3) have you verified that fields really contains what you think it does?
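On check 3, the output quoted in the original post repeats the same clause 30 times, so it may be worth comparing that count with the length of fields. The toy model below is a guess at one mechanism that would produce such output; it is not Lucene code, and nothing in this thread confirms that MultiFieldQueryParser works this way. It assumes the query is parsed once per entry in fields, with that entry as the default field, and that an explicit "field:" prefix in the query overrides the default. The names MultiFieldExpansionSketch, parseClause and expand, and the field names ISSUE_DESC and STATUS, are hypothetical. The model does not account for the trailing * in the reported output.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; not Lucene code, and the parsing rule is an assumption. */
final class MultiFieldExpansionSketch {

    /** An explicit "field:term" is kept as-is; a bare term gets the default field. */
    static String parseClause(String query, String defaultField) {
        return query.indexOf(':') >= 0 ? query : defaultField + ":" + query;
    }

    /** Assumed behavior: parse the same query once per field and keep every clause. */
    static List<String> expand(String query, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(parseClause(query, field));
        }
        return clauses;
    }
}
```

Under these assumptions, expand("ISSUE_TITLE:test", new String[] {"ISSUE_TITLE", "ISSUE_DESC", "STATUS"}) returns three identical ISSUE_TITLE:test clauses, one per entry in the array, while the bare query "test" returns one distinct clause per field.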



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Fw: Urgent : Specific search problem with whitespace analyzer

Posted by Daniel Naber <lu...@danielnaber.de>.
On Monday 20 November 2006 13:54, Krishnendra Nandi wrote:

> When I run the search, the masterQuery I get after running the above
> code snippet has the following value:
> +(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
> ISSUE_TITLE:test*

Could you make a small self-contained test case that demonstrates this?
This would help in analyzing the problem. Also, are you using Lucene 1.4?
Have you tried to update?

Regards
 Daniel

-- 
http://www.danielnaber.de
