
Posted to java-user@lucene.apache.org by lukes <ma...@gmail.com> on 2016/11/18 22:25:40 UTC

Exclusion List for standard tokenizer

Hi,

  Is there an exclusion list of characters that can be defined for
StandardTokenizer? In my case, I want to use StandardTokenizer (as it solves
many problems of where to tokenize across languages), but I don't want it to
split the stream on certain characters, for example '@'. Is there a way I
can provide that input to StandardTokenizer? I tried to look into the
source code, but seem to have gotten lost. Any pointer is really appreciated.
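To see exactly where StandardTokenizer breaks an input containing '@', one can run a string through it and collect the emitted terms. This is a minimal sketch, assuming a Lucene 6.x-era classpath (the module that ships StandardTokenizer differs between releases); the class StandardTokenizerDemo and its tokenize helper are hypothetical names for illustration. StandardTokenizer implements the Unicode UAX #29 word-break rules, under which '@' is not a word character.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class: shows where StandardTokenizer splits a given input.
public class StandardTokenizerDemo {

    // Runs text through a fresh StandardTokenizer and collects the emitted terms.
    public static List<String> tokenize(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Tokenizer tokenizer = new StandardTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                 // mandatory before the first incrementToken()
            while (tokenizer.incrementToken()) {
                terms.add(term.toString());
            }
            tokenizer.end();                   // end-of-stream bookkeeping; close() runs via try-with-resources
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // '@' is not a word character under the UAX #29 rules StandardTokenizer
        // follows, so the input is split around it and the '@' itself is dropped.
        System.out.println(tokenize("Excite@Home"));
    }
}
```

The tokenizer does no lowercasing on its own; in StandardAnalyzer that is done by a separate filter, so the terms keep their original case here.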

Regards.



--
View this message in context: http://lucene.472066.n3.nabble.com/Exclusion-List-for-standard-tokenizer-tp4306511.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Exclusion List for standard tokenizer

Posted by lukes <ma...@gmail.com>.

Actually, ClassicTokenizer seems to do the job. Are there any side effects of
using ClassicTokenizer rather than StandardTokenizer?
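One way to probe what switching tokenizers changes is to feed the same input to both and compare the terms. This is a minimal sketch, assuming a Lucene 6.x-era classpath where ClassicTokenizer lives in org.apache.lucene.analysis.standard (later releases may relocate it); the class ClassicVsStandard and its helpers are hypothetical. Per its Javadoc, ClassicTokenizer was named StandardTokenizer before Lucene 3.1 and is aimed at most European-language documents, and its grammar has dedicated rules for e-mail addresses and for names such as "Excite@Home".

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.ClassicTokenizer;    // Lucene 6.x-era package (assumption)
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical comparison class: feeds the same text to both tokenizers.
public class ClassicVsStandard {

    // Consumes (and then closes) the given tokenizer over text, returning its terms.
    static List<String> terms(Tokenizer tokenizer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Tokenizer t = tokenizer) {
            t.setReader(new StringReader(text));
            CharTermAttribute term = t.addAttribute(CharTermAttribute.class);
            t.reset();
            while (t.incrementToken()) {
                out.add(term.toString());
            }
            t.end();
        }
        return out;
    }

    public static List<String> classic(String text) throws IOException {
        return terms(new ClassicTokenizer(), text);
    }

    public static List<String> standard(String text) throws IOException {
        return terms(new StandardTokenizer(), text);
    }

    public static void main(String[] args) throws IOException {
        // ClassicTokenizer's grammar keeps "word@word" together as one token;
        // StandardTokenizer splits at the '@'.
        for (String text : new String[] {"Excite@Home", "user@example.com"}) {
            System.out.println("classic:  " + classic(text));
            System.out.println("standard: " + standard(text));
        }
    }
}
```

Since the two grammars also differ away from '@' (hyphens, numbers, host names, non-Latin scripts), running both over a sample of real documents is a cheap way to surface other side effects before switching.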

Regards.



