Posted to dev@lucene.apache.org by none none <ko...@lycos.com> on 2003/07/17 18:52:56 UTC

Re: interesting phrase query issue

I believe that a search for "access manager" should return no hits if the document contains "access, the manager", because the text is different. I know there is a stop word in between, so my opinion is that "the" and all other stop words should be skipped at search level rather than at index level (Google does that), but still indexed.
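A minimal sketch of that idea, in plain Java rather than Lucene code (the class name, method names and the small stop list are all hypothetical): every token, stop words included, is indexed at its own position, and stop words are dropped only when the query phrase is analyzed. Because "the" still occupies position 1, "access" and "manager" are not adjacent and the phrase does not match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch only: a toy positional index for one document, not Lucene's API.
final class SearchTimeStopWords {
    // Hypothetical stop list; a real analyzer would use its own.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Lowercase and split on anything that is not a letter or digit, so the
    // comma disappears; stop words are NOT removed here.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: every token, stop words included, is recorded at its own position.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Search time: stop words are skipped when the query phrase is analyzed.
    static List<String> analyzeQuery(String phrase) {
        List<String> terms = new ArrayList<>();
        for (String t : tokenize(phrase)) {
            if (!STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A phrase matches if its terms occupy consecutive positions in the document.
    static boolean phraseMatches(Map<String, List<Integer>> postings, List<String> terms) {
        if (terms.isEmpty() || !postings.containsKey(terms.get(0))) {
            return false;
        }
        for (int start : postings.get(terms.get(0))) {
            boolean ok = true;
            for (int i = 1; i < terms.size() && ok; i++) {
                List<Integer> positions = postings.get(terms.get(i));
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc = index("access, the manager");
        // "the" sits at position 1, so "access" (0) and "manager" (2) are not adjacent.
        System.out.println(phraseMatches(doc, analyzeQuery("access manager"))); // false
    }
}
```

As written, a query such as "access the manager" is reduced to the same two terms and would also fail to match this document; handling that would require the query to remember the gap where "the" was.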

korfut

--

--------- Original Message ---------

DATE: Thu, 17 Jul 2003 07:53:06
From: Tatu Saloranta <ta...@hypermall.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Cc: 

>On Thursday 17 July 2003 07:20, greg wrote:
>> I have several document sections that are being indexed via the
>> StandardAnalyzer.  One of these documents has the line "access, the
>> manager".  When searching for the phrase "access manager", this document is
>> being returned.  I understand why (at least I think I do): "the" is a stop
>> word and the "," is being removed by the tokenizer.  My question is: is
>> there any way I can avoid having this document returned in the results?  My
>> thoughts were to create a new analyzer that indexes the word "the" (blech,
>> too many of those), or to index the "," in some way (also not good).  Any
>> suggestions?
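To make the cause concrete, here is a minimal plain-Java sketch (not Lucene code; the class name, method names and stop list are hypothetical) of an analyzer that, as described above, drops the comma and the stop word and then numbers the surviving tokens consecutively. Under that assumption "access" and "manager" end up at adjacent positions, so an exact phrase check succeeds:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch only: models an analyzer whose surviving tokens get consecutive positions.
final class CollapsedPositions {
    // Hypothetical stop list for illustration.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Drops punctuation and stop words; a token's position is simply its index
    // in the returned list, so removed words leave no gap behind.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Exact phrase match: the analyzed query must appear as a contiguous run.
    static boolean phraseMatches(String document, String phrase) {
        List<String> query = analyze(phrase);
        return !query.isEmpty() && Collections.indexOfSubList(analyze(document), query) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(analyze("access, the manager"));                         // [access, manager]
        System.out.println(phraseMatches("access, the manager", "access manager")); // true
    }
}
```

A non-stop word in between (for example "access to a shared manager") still keeps the phrase from matching, which is why only stop words and punctuation cause this.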
>
>You can also replace all stop words with a "dummy" token ("" might be an OK
>candidate?). That would be similar to indexing "the" (which is probably a
>better idea than indexing ",").
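A minimal sketch of the dummy-token suggestion, again in plain Java rather than Lucene's API (class and method names and the stop list are hypothetical; "" is the placeholder proposed above). The same analysis is applied to documents and queries, so each stop word still occupies a position:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the suggestion: each stop word becomes a placeholder token
// instead of disappearing, so it still occupies a position.
final class DummyStopToken {
    // Hypothetical stop list; "" is the placeholder suggested in the thread.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");
    static final String DUMMY = "";

    // Applied identically to documents and to query phrases.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) {
                continue; // leftover from leading punctuation, not a real token
            }
            tokens.add(STOP_WORDS.contains(t) ? DUMMY : t);
        }
        return tokens;
    }

    static boolean phraseMatches(String document, String phrase) {
        List<String> query = analyze(phrase);
        return !query.isEmpty() && Collections.indexOfSubList(analyze(document), query) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("access, the manager", "access manager"));     // false
        System.out.println(phraseMatches("access, the manager", "access the manager")); // true
    }
}
```

One consequence of this design: because every stop word maps to the same placeholder, "access a manager" would also match "access, the manager". Whether an empty string is an acceptable term for a particular Lucene version is not settled here.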
>
>I'm planning to do something similar for paragraph breaks (a double linefeed
>in plain text, <p> etc. in HTML) to prevent similar problems.
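A rough sketch of the paragraph-break idea in plain Java (not Lucene code; the class, the method names and the marker string are hypothetical): a blank line or an opening <p> tag starts a new paragraph, and a marker token that no analyzed query term can equal is inserted between paragraphs, so a phrase cannot span the boundary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: insert a marker token between paragraphs so no phrase can span them.
final class ParagraphBreaks {
    // Hypothetical marker; it contains a character the word splitter below
    // treats as a separator, so no analyzed query term can equal it.
    static final String PARAGRAPH_BREAK = "\u0000p";

    // A blank line (plain text) or an opening <p> tag (HTML) starts a new paragraph.
    private static final String PARAGRAPH_SPLIT = "(?i)\\n\\s*\\n|<p\\b[^>]*>";

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        String noTags = text.replaceAll("<[^>]*>", " "); // drop any remaining markup
        for (String t : noTags.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String paragraph : text.split(PARAGRAPH_SPLIT)) {
            List<String> w = words(paragraph);
            if (w.isEmpty()) {
                continue; // e.g. the empty piece before a leading <p>
            }
            if (!tokens.isEmpty()) {
                tokens.add(PARAGRAPH_BREAK);
            }
            tokens.addAll(w);
        }
        return tokens;
    }

    static boolean phraseMatches(String document, String phrase) {
        List<String> query = words(phrase);
        return !query.isEmpty() && Collections.indexOfSubList(analyze(document), query) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("access\n\nmanager", "access manager"));            // false
        System.out.println(phraseMatches("<p>access</p><p>manager</p>", "access manager")); // false
        System.out.println(phraseMatches("the access manager\n\nnext", "access manager"));  // true
    }
}
```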
>
>-+ Tatu +-
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org