Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2006/11/30 23:44:22 UTC

[jira] Resolved: (LUCENE-733) problems with some non word ascii characters in searchs

     [ http://issues.apache.org/jira/browse/LUCENE-733?page=all ]

Hoss Man resolved LUCENE-733.
-----------------------------

    Resolution: Invalid

The behavior described most likely depends on the Analyzers used when indexing the source text and when parsing the query. Without specific code demonstrating exactly which analyzers were used, there isn't really any evidence of a "bug".
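
As a rough illustration of why the analyzer matters, here is a minimal sketch in plain Java. It does not use Lucene's API; the class and the methods whitespaceTokens, alphanumericTokens and wildcardMatches are hypothetical stand-ins for two tokenization styles (one that keeps punctuation attached to a token, one that strips it) and for wildcard matching against whole indexed terms. Which analyzers the reporter actually used is unknown, so treat this as an assumption-laden demonstration rather than a diagnosis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AnalyzerMismatchSketch {

    // Hypothetical stand-in for a whitespace-style analyzer:
    // split on whitespace, lowercase, keep punctuation attached.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase());
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for a punctuation-stripping analyzer:
    // keep only runs of letters and digits, lowercased.
    static List<String> alphanumericTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase());
            }
        }
        return tokens;
    }

    // Wildcard match against one whole term; '*' matches any run of characters.
    static boolean wildcardMatches(String pattern, String term) {
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        // The same text yields different terms depending on the tokenizer.
        System.out.println(whitespaceTokens("Smith, Bob"));    // [smith,, bob]
        System.out.println(alphanumericTokens("Smith, Bob"));  // [smith, bob]

        // A query term "smith," cannot equal an indexed term "smith".
        System.out.println(alphanumericTokens("Smith, Bob").contains("smith,")); // false

        // A wildcard pattern is compared with whole terms, so a term that
        // begins with "(" is not matched by a pattern that begins with "212".
        System.out.println(whitespaceTokens("(212)422-5100"));                    // [(212)422-5100]
        System.out.println(wildcardMatches("212*422*5100", "(212)422-5100"));     // false
        System.out.println(wildcardMatches("212*422*5100", "212-422-5100"));      // true
    }
}
```

If the index and the query are tokenized differently, as in this sketch, a term such as "smith," in the query has nothing to match in the index, and a pattern such as 212*422*5100 cannot match a term that starts with "(". Showing the actual analyzer configuration for both sides is what would make a report like this reproducible.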

When getting unexpected results back from a Lucene search, please consult the user mailing list before submitting a bug. The number of people reading and replying to the user list who can provide assistance in understanding the results you are getting is much larger than the number of people watching the Jira issue queue.

> problems with some non word ascii characters in searchs
> -------------------------------------------------------
>
>                 Key: LUCENE-733
>                 URL: http://issues.apache.org/jira/browse/LUCENE-733
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: QueryParser, Search
>            Reporter: Neil Despain
>
> Here are a number of examples of searches that are not acting as I would expect.
> 1.
> ---------
> I have a document with the text:
> Smith, Bob
> 1.a
> If I do a search:
> Smith,~0.9 Bob~0.9
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:smith,~0.9 content:bob~0.9
> But it only gets a hit on: Bob
> 1.b
> If I do this search:
> "Smith,~0.9 Bob~0.9"~1
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"bob"~1
> and it also only returns a hit for: Bob
> In both cases words that end with a comma are not found. (Other characters have the same effect as commas.)
> =========
> 2.
> ---------
> For a document with phone numbers:
> 2124225100
> 212 422 5100
> 212-422-5100
> (212) 422-5100
> (212)4225100
> (212)422-5100
> (212) 422.5100
> (212) 422 5100
> 212.422.5100
> 212.422-5100
> 2.a
> If I do a search:
> 212*422*5100~0.9
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"(212.422-5100 212-422-5100 2124225100 212.422.5100)"
> I do not get a match on 212)422-5100 -- Doesn't find anything that starts with (212)...
> 2.b
> Search term:
> 212*422*5100
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:212*422*5100
> and does not match 212)422-5100 -- Doesn't find anything that starts with (212)...
> 2.c
> If I try to work around that by searching with proximity for:
> "212 422*5100"~1
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"(422-5100 422.5100 4225100)"~1
> and again does not find anything with (212)... like (212) 422-5100 or (212)422-5100
> =========
