Posted to dev@lucene.apache.org by bu...@apache.org on 2004/08/21 17:06:19 UTC

DO NOT REPLY [Bug 30785] New: - German Analyzer does not handle search terms with asterisks

http://issues.apache.org/bugzilla/show_bug.cgi?id=30785

           Summary: German Analyzer does not handle search terms with
                    asterisks
           Product: Lucene
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: kenneth_aitken@freenet.de


I created a test set of UTF-8 text files containing special German characters
and indexed them with GermanAnalyzer, using an adapted version of the Lucene
demo program IndexFiles.java. However, whenever a search term contains the
wildcard asterisk (*), the QueryParser in the demo SearchFiles.java returns the
term with its original German umlauts or sz (ß) letters. It does not replace
the umlauts and sz letters before performing the search, as I would expect it
to.

Examples: using GermanAnalyzer, QueryParser returns these words in lower case,
but with the German umlaut letters unchanged in the parsed queries:
Bürger*, Schlüssel*, Währ*, Straß*, herkömm, städt*, Ä*, Ö*, Ü*.
(In case the above words appear with broken letters, here they are in ASCII
form: Buerger*, Schluessel*, Waehr*, Strasz*, herkoemm, staedt*, AE*, OE*, UE*).
This leads to Lucene exceptions such as BooleanQuery.TooManyClauses in our
system, which uses Lucene.
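
A possible client-side workaround, sketched here only as an illustration, is to
fold the special letters in a wildcard term before handing it to QueryParser.
The class name WildcardTermNormalizer and the mapping used (ä to a, ö to o,
ü to u, ß to ss) are assumptions made for this example; the mapping has to
mirror whatever the index-time analyzer actually produces, and stemming at
index time may still prevent some prefixes from matching.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Lucene: folds German letters in a wildcard term. */
public class WildcardTermNormalizer {

    /**
     * Lower-cases the term and replaces umlauts and sharp s.
     * Wildcard characters such as '*' and '?' pass through unchanged.
     * The replacement table is an assumption and must match the index.
     */
    public static String normalize(String term) {
        String lower = term.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sharp s (sz)
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Terms taken from the report, written with Unicode escapes.
        System.out.println(normalize("B\u00fcrger*"));
        System.out.println(normalize("Stra\u00df*"));
    }
}
```

The normalized string would then be passed to the query parser in place of the
raw user input.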
