Posted to dev@lucene.apache.org by bu...@apache.org on 2004/02/07 19:57:51 UTC
DO NOT REPLY [Bug 26763] New: - [PATCH] Language guesser contribution
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26763
Summary: [PATCH] Language guesser contribution
Product: Lucene
Version: unspecified
Platform: Other
OS/Version: Other
Status: NEW
Severity: Enhancement
Priority: Other
Component: Other
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: halleux.jf@skynet.be
Hello,
I'd like to contribute this language guesser to Lucene.
It contains language-guessing interfaces and classes, as well as
trigram-specific classes and some language reference files that I generated
myself using the trigram file generation utility included in the patch. I
included a unit test as well.
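As a rough illustration of the trigram approach, here is a minimal sketch. It
is not the code in the attached patch: the class name TrigramGuesserSketch, its
methods, and the overlap scoring (a simple histogram intersection) are all
assumptions made for this example, and the reference texts are passed in as
strings rather than read from reference files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of trigram-based language guessing; not the patch's classes. */
class TrigramGuesserSketch {
    // language code -> (trigram -> relative frequency in the reference text)
    private final Map<String, Map<String, Double>> profiles = new HashMap<>();

    /** Counts overlapping 3-character sequences and normalizes them to relative frequencies. */
    static Map<String, Double> profile(String text) {
        String s = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        Map<String, Double> freq = new HashMap<>();
        int total = 0;
        for (int i = 0; i + 3 <= s.length(); i++) {
            freq.merge(s.substring(i, i + 3), 1.0, Double::sum);
            total++;
        }
        final double n = total;
        if (total > 0) {
            freq.replaceAll((trigram, count) -> count / n);
        }
        return freq;
    }

    /** Registers a reference profile for a language (assumed to come from sample text). */
    void addReference(String language, String referenceText) {
        profiles.put(language, profile(referenceText));
    }

    /** Returns the language whose profile overlaps most with the input, or null if none is registered. */
    String guess(String text) {
        Map<String, Double> input = profile(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> lang : profiles.entrySet()) {
            double score = 0.0;
            // Histogram intersection: sum the shared frequency mass per trigram.
            for (Map.Entry<String, Double> t : input.entrySet()) {
                score += Math.min(t.getValue(), lang.getValue().getOrDefault(t.getKey(), 0.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = lang.getKey();
            }
        }
        return best;
    }
}
```

In this sketch a caller would register one reference text per language with
addReference and then call guess on the text to classify. Real reference
profiles would need far more sample text than a sentence or two to be reliable.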
I haven't done any extensive testing of guessing quality or performance, but I
would expect both to be acceptable for a first pass.
I thought about writing a custom Analyzer for this, but realized that this
wouldn't be the way to go: the language decision should probably be left to
the developer, especially when the Analyzer is used to tokenize a query.
Have fun,
Jean-François Halleux
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org