Posted to dev@lucene.apache.org by none none <ko...@lycos.com> on 2003/07/17 18:52:56 UTC
Re: interesting phrase query issue
I believe that searching for "access manager" should return no hits if the document contains "access, the manager", because the text is different. I know there is a stop word in between, so my opinion is that "the" and all the other stop words should be skipped at search level rather than at index level (Google does that), but indexed anyway.
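
To make the difference concrete, here is a minimal sketch in plain Python (not Lucene code, and not how StandardAnalyzer is actually implemented). The stop list, the function names and the tiny positional index are all assumptions made up for illustration. It compares dropping stop words at index time, which closes the gap between "access" and "manager", with keeping them in the index so that they still occupy a position.

```python
import re

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text and split it into terms, discarding punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(text, keep_stop_words):
    """Return a dict mapping each term to the positions where it occurs.

    With keep_stop_words=False a stop word is dropped and consumes no
    position, so its neighbours end up adjacent (the behaviour greg
    described). With keep_stop_words=True every term keeps its position.
    """
    positions = {}
    pos = 0
    for term in tokenize(text):
        if term in STOP_WORDS and not keep_stop_words:
            continue  # dropped: the next term takes this position
        positions.setdefault(term, []).append(pos)
        pos += 1
    return positions


def phrase_match(index, phrase):
    """True if the phrase terms occur at consecutive positions in the index."""
    terms = tokenize(phrase)
    if not terms:
        return False
    return any(
        all(start + i in index.get(term, []) for i, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


doc = "access, the manager"
dropped = build_index(doc, keep_stop_words=False)
kept = build_index(doc, keep_stop_words=True)

print(phrase_match(dropped, "access manager"))  # stop word dropped at index time
print(phrase_match(kept, "access manager"))     # stop word kept in the index
```

Under these assumptions the first call reports a match and the second does not, because "the" still sits between the two terms when it is indexed. Replacing each stop word with a placeholder term would have the same effect on positions as keeping it.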
korfut
--
--------- Original Message ---------
DATE: Thu, 17 Jul 2003 07:53:06
From: Tatu Saloranta <ta...@hypermall.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Cc:
>On Thursday 17 July 2003 07:20, greg wrote:
>> I have several document sections that are being indexed via the
>> StandardAnalyzer. One of these documents has the line "access, the
>> manager". When searching for the phrase "access manager", this document is
>> being returned. I understand why (at least I think I do): "the" is a stop
>> word and the "," is being removed by the tokenizer. My question is:
>> is there any way I can avoid having this returned in the results? My
>> thoughts were to create a new analyzer that indexes the word "the" (blick,
>> too many of those), or index the "," in some way (also not good). Any
>> suggestions?
>
>You can also replace all stop words with a "dummy" token ("" might be an ok
>candidate?). That would be similar to indexing "the" (which is probably a
>better idea than indexing ",").
>
>I'm planning to do something similar for paragraph breaks (in case of plain
>text, double linefeed, for HTML <p> etc), to prevent similar problems.
>
>-+ Tatu +-
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>