Posted to java-user@lucene.apache.org by Jim Downing <ji...@pobox.com> on 2004/07/09 17:36:23 UTC

Underscore tokenization

Hi,

I'm trying to put together an Analyzer that doesn't separate tokens on
the underscore character. What's the best / easiest way to achieve this?

I've tried removing the references to char code 95 (the underscore) in
StandardTokenizerTokenManager, but tokens still seem to be split on
underscores. Should I instead be modifying StandardTokenizer.jj and having
JavaCC generate my own tokenizer classes?
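
To make the behaviour I'm after concrete, here is a minimal sketch in plain
Java with no Lucene on the classpath. The class UnderscoreTokenizerSketch and
its tokenize method are hypothetical names made up for illustration, not
Lucene API. The idea is a per-character predicate that counts '_' as part of
a token, which as far as I can tell is the same role isTokenChar(char) plays
in Lucene's CharTokenizer. The lowercasing is also just an assumption about
what the analyzer should do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits text into lowercase tokens,
// treating letters, digits and '_' as token characters.
class UnderscoreTokenizerSketch {

    // Decides whether a character belongs inside a token.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_'; // '_' is char code 95
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                // Still inside a token: accumulate the lowercased character.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Hit a separator: emit the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit a trailing token if the text did not end on a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo_bar, baz, qux, foo_1]
        System.out.println(tokenize("foo_bar baz-qux FOO_1"));
    }
}
```

A tokenizer this simple would presumably lose the special handling that
StandardTokenizer has for things like e-mail addresses and host names, so it
may only be a workable trade-off if those don't matter for the index.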

thanks,
jim

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org