Posted to java-user@lucene.apache.org by Xiaolong Zheng <zh...@gmail.com> on 2016/06/15 19:32:20 UTC

How to prevent WordDelimiterFilter from splitting strings with underscores?

Hi,

How can I prevent WordDelimiterFilter from splitting a string that contains
underscores, e.g. word_with_underscore?

I am using WordDelimiterFilter to build my own camel-case analyzer, with the
following configuration flags:

flags |= GENERATE_WORD_PARTS;
flags |= SPLIT_ON_CASE_CHANGE;
flags |= PRESERVE_ORIGINAL;


But I realized that one side effect of using SPLIT_ON_CASE_CHANGE is that
it also splits strings on underscores.

How can I prevent it from splitting strings that contain underscores?




Sincerely,

--Xiaolong

Re: How to prevent WordDelimiterFilter from splitting strings with underscores?

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,

You can supply custom character types.
Please see WordDelimiterFilterFactory and wdfftypes.txt for an example.

ahmet
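
The custom-types suggestion above can be sketched as follows. This is a
minimal, Lucene-free illustration of building a character-type table in which
'_' is classified as a letter instead of a sub-word delimiter. The class name
UnderscoreTypeTable is hypothetical, and the constant values are assumed to
mirror WordDelimiterFilter.LOWER, UPPER, DIGIT, SUBWORD_DELIM and ALPHA; check
them against the Lucene version in use.

```java
/**
 * Hypothetical helper (not part of Lucene): builds a 256-entry character-type
 * table in the style WordDelimiterFilter uses, with '_' treated as a letter.
 */
final class UnderscoreTypeTable {
    // Assumed to match the constants of the same names in WordDelimiterFilter.
    static final byte LOWER = 0x01;
    static final byte UPPER = 0x02;
    static final byte DIGIT = 0x04;
    static final byte SUBWORD_DELIM = 0x08;
    static final byte ALPHA = LOWER | UPPER;

    private UnderscoreTypeTable() {}

    static byte[] build() {
        byte[] table = new byte[256];
        for (int c = 0; c < table.length; c++) {
            if (Character.isLowerCase(c)) {
                table[c] = LOWER;
            } else if (Character.isUpperCase(c)) {
                table[c] = UPPER;
            } else if (Character.isDigit(c)) {
                table[c] = DIGIT;
            } else {
                // Everything else is a delimiter unless overridden below.
                table[c] = SUBWORD_DELIM;
            }
        }
        // The override: '_' becomes part of a word, so it no longer splits it.
        table['_'] = ALPHA;
        return table;
    }
}
```

Assuming a Lucene version whose WordDelimiterFilter offers a constructor that
accepts a byte[] type table alongside the flags, the array returned by
UnderscoreTypeTable.build() would be passed there. When going through
WordDelimiterFilterFactory instead, a types file containing the line
"_ => ALPHA" (the format used in wdfftypes.txt) should have the same effect.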


