Posted to java-user@lucene.apache.org by Xiaolong Zheng <zh...@gmail.com> on 2016/06/15 19:32:20 UTC
How to prevent WordDelimiterFilter from tokenizing strings with underscores?
Hi,
How can I prevent WordDelimiterFilter from splitting strings that contain
underscores, e.g. word_with_underscore?
I am using WordDelimiterFilter to build my own camel-case analyzer, with the
following configuration flags:
flags |= GENERATE_WORD_PARTS;
flags |= SPLIT_ON_CASE_CHANGE;
flags |= PRESERVE_ORIGINAL;
But I realized that one side effect of using SPLIT_ON_CASE_CHANGE is that
strings with underscores are also split.
How can I prevent the filter from splitting on underscores?
Sincerely,
--Xiaolong
Re: How to prevent WordDelimiterFilter from tokenizing strings with underscores?
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,
You can supply custom character types.
Please see WordDelimiterFilterFactory and wdfftypes.txt for an example.
ahmet
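As a rough illustration of the suggestion above (not taken from the original
reply): a types file in the style of wdfftypes.txt could map the underscore to
a letter type, so the filter no longer treats it as a delimiter. The file name
my-wdf-types.txt is hypothetical, and the mapping is a sketch that assumes you
want the underscore handled like any other letter.

# my-wdf-types.txt (hypothetical file name)
# Allowed types: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
# Treat the underscore as a letter instead of a subword delimiter.
_ => ALPHA

The file would then be referenced through the factory's "types" argument.
If you construct the filter directly in Java rather than through the factory,
the WordDelimiterFilter constructor that accepts a character type table
should allow the same override.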