Posted to dev@lucene.apache.org by bu...@apache.org on 2004/08/21 17:06:19 UTC
DO NOT REPLY [Bug 30785] New: -
German Analyzer does not handle search terms with asterisks
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=30785>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=30785
Summary: German Analyzer does not handle search terms with
asterisks
Product: Lucene
Version: 1.4
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: Major
Priority: Other
Component: Search
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: kenneth_aitken@freenet.de
I created a test set of UTF-8 text files containing special German characters and
indexed them with GermanAnalyzer, using an adapted version of the Lucene demo
program IndexFiles.java. However, whenever I use the wildcard asterisk (*), the
QueryParser in the demo SearchFiles.java returns the search term with the
original German umlauts or sz (ß) letters still in it. It does not replace the
umlauts and sz letters before performing the search, as I would expect.
Examples: Using the GermanAnalyzer, QueryParser returns the following words in
lower case, but with the German umlaut and sz letters unchanged in the parsed
queries:
Bürger*, Schlüssel*, Währ*, Straß*, herkömm, städt*, Ä*, Ö*, Ü*.
(In case the above words appear with broken letters, here they are in ASCII
format:
Buerger*, Schluessel*, Waehr*, Strasz*, herkoemm, staedt*, AE*, OE*, UE*).
In our system, which is built on Lucene, this leads to exceptions such as
BooleanQuery.TooManyClauses.
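
As a possible workaround until this is resolved, a wildcard term could be folded
by hand before it is handed to the QueryParser. The following is only a minimal
sketch: WildcardTermFolder and fold are hypothetical names and not part of
Lucene. The mapping used (a-umlaut to a, o-umlaut to o, u-umlaut to u, sz to ss)
is an assumption and would need to be checked against the terms the analyzer
actually writes into the index. Stemming is not reproduced here, so a folded
prefix may still fail to match a stemmed index term.

```java
// Hypothetical helper, not a Lucene API: folds German special letters in a
// raw query term so that a wildcard term resembles an analyzed one.
class WildcardTermFolder {

    // Lower-cases the term and replaces umlauts and sz.
    // The wildcard characters (* and ?) pass through unchanged.
    // ASSUMPTION: the target mapping below is illustrative only.
    static String fold(String term) {
        String lower = term.toLowerCase(java.util.Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sz
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two of the example terms from this report, written with escapes
        // so the source file stays ASCII-only.
        System.out.println(fold("B\u00fcrger*"));  // prints "burger*"
        System.out.println(fold("Stra\u00df*"));   // prints "strass*"
    }
}
```

The folded string would then be passed to the parser in place of the raw user
input. This only narrows the mismatch between query and index terms; it does not
by itself limit how many terms a short prefix such as a* can expand to.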