Posted to java-user@lucene.apache.org by Grant Ingersoll <gs...@syr.edu> on 2004/07/20 21:23:33 UTC
Tokenizers and java.text.BreakIterator
Hi,
I was wondering whether anyone uses java.text.BreakIterator#getWordInstance(Locale) as a tokenizer for various languages. Does it do a good job? It seems to, at least for languages where words are separated by spaces or punctuation, but I have only run simple tests.
Does anyone have thoughts on this? What am I missing? Does this seem like a valid approach?
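For anyone who wants to try this, here is a minimal sketch of the kind of simple test described above. It uses only the JDK and is not wired into Lucene's Tokenizer API. The class name BreakIteratorWords and the helper words() are hypothetical, and treating a segment as a word only if it contains a letter or digit is an assumption, not anything BreakIterator prescribes.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorWords {

    // Hypothetical helper: split text at the word boundaries reported by
    // BreakIterator for the given locale and keep only word-like segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // The iterator also returns whitespace and punctuation as segments.
            // Assumption: a "word" is any segment containing a letter or digit.
            if (containsLetterOrDigit(segment)) {
                out.add(segment);
            }
        }
        return out;
    }

    private static boolean containsLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Print the tokens for a sample sentence in two locales.
        System.out.println(words("one two, three", Locale.ENGLISH));
        System.out.println(words("eins zwei, drei", Locale.GERMAN));
    }
}
```

Swapping the Locale and the sample text is enough to check other languages. The filter step matters because getWordInstance marks every boundary, so the spans between words come back as segments too.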
Thanks,
Grant