Posted to dev@lucene.apache.org by Bernhard Kraft <kr...@webconsulting.at> on 2014/01/14 22:56:02 UTC

RFC: Handling tokens with keyword attribute

Hello,

I am new to this list.

For some time now we have been using Solr to provide search on the 
websites we implement. While implementing some custom token filters, I 
noticed that "KeywordAttribute" is not handled properly in some token 
filters.

The SynonymFilter or LowerCaseFilter, for example, simply ignore the 
"KeywordAttribute" set on a token. Is this on purpose, or is it a bug?


I already saw a bugfix for one of those issues [1] and proposed an 
alternative solution there. My solution would wrap every token filter 
(TokenStream implementation) in a wrapper object (proxy pattern). This 
would alter the chain of responsibility implemented by the filter chain 
so that individual filters can be skipped, depending on the wrapper.

This would allow things like "KeywordAttribute" to be handled in the 
same way by every filter, except for filters which intentionally act on 
tokens with the attribute set (I would call them 
"KeywordAwareTokenFilter").
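
To make the idea more concrete, here is a minimal sketch of the wrapper 
approach. It deliberately does not use the real Lucene API: SketchToken, 
SimpleFilter, KeywordSkippingWrapper, FilterChain and LowerCaseSketch are 
hypothetical stand-ins invented for this illustration, and modelling 
"KeywordAwareTokenFilter" as a marker interface is an assumption as 
well. A real implementation would have to work on TokenStream and its 
attributes instead of immutable token objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token with a keyword flag (not Lucene's API).
final class SketchToken {
    final String term;
    final boolean keyword;

    SketchToken(String term, boolean keyword) {
        this.term = term;
        this.keyword = keyword;
    }
}

// Hypothetical, simplified filter: maps one token to one token.
interface SimpleFilter {
    SketchToken apply(SketchToken token);
}

// Assumed marker interface: filters that want to see keyword tokens.
interface KeywordAwareTokenFilter extends SimpleFilter {
}

// Proxy: passes keyword tokens through untouched unless the wrapped
// filter has declared itself keyword aware.
final class KeywordSkippingWrapper implements SimpleFilter {
    private final SimpleFilter delegate;

    KeywordSkippingWrapper(SimpleFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    public SketchToken apply(SketchToken token) {
        if (token.keyword && !(delegate instanceof KeywordAwareTokenFilter)) {
            return token; // skip this filter for keyword tokens
        }
        return delegate.apply(token);
    }
}

// Filter chain that wraps every filter it is given.
final class FilterChain {
    private final List<SimpleFilter> filters = new ArrayList<>();

    FilterChain add(SimpleFilter filter) {
        filters.add(new KeywordSkippingWrapper(filter));
        return this;
    }

    List<SketchToken> run(List<SketchToken> input) {
        List<SketchToken> output = new ArrayList<>();
        for (SketchToken token : input) {
            SketchToken current = token;
            for (SimpleFilter filter : filters) {
                current = filter.apply(current);
            }
            output.add(current);
        }
        return output;
    }
}

// Example filter that knows nothing about the keyword flag.
final class LowerCaseSketch implements SimpleFilter {
    @Override
    public SketchToken apply(SketchToken token) {
        return new SketchToken(token.term.toLowerCase(Locale.ROOT), token.keyword);
    }
}

final class KeywordWrapperDemo {
    public static void main(String[] args) {
        FilterChain chain = new FilterChain().add(new LowerCaseSketch());
        List<SketchToken> result = chain.run(Arrays.asList(
                new SketchToken("Apache", false),
                new SketchToken("Lucene", true)));
        for (SketchToken token : result) {
            System.out.println(token.term); // prints "apache", then "Lucene"
        }
    }
}
```

The point of the design is that LowerCaseSketch contains no keyword 
check at all. The decision lives in the wrapper, and a filter opts out 
of being skipped only by implementing the marker interface.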



[1] https://issues.apache.org/jira/browse/LUCENE-3236


greetings,
Bernhard
-- 
Ing. HTL Bernhard Kraft - Entwickler
webconsulting business services gmbh
W: https://webconsulting.at
