Posted to dev@lucene.apache.org by Michael Sokolov <ms...@gmail.com> on 2018/04/04 13:15:33 UTC

WordDelimiterFilter javadocs are off base

The javadocs for both WDF and WDGF include a pretty detailed discussion
about the proper use of the "combinations" parameter, but no such parameter
exists. I don't know the history here, but it sounds as if the docs might
be referring to some previous incarnation of this filter, perhaps in the
context of some (now-defunct) Solr configuration.

I think there is some sound wisdom underlying the advice in the docs that
is worth preserving, but it needs to be updated to match the current state
of the code. I can take a stab at rewriting, but I want to make sure I
understand the intent of the comment there.

Essentially what it is saying is that a typical usage of WD(G)F is an
asymmetric setup where splitting and subsequent token generation are done
when indexing, but something less aggressive (at least no generation, maybe
also no splitting) is done when querying. I would probably recommend simply
omitting this filter from query-side analysis. Is there a consensus on the
best way to use this filter today?
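
To make the asymmetric setup concrete, here is a toy sketch in plain Java.
It does not use Lucene and is not how WD(G)F is implemented; the class and
method names (AsymmetricAnalysisSketch, indexTokens, queryTokens, matches)
are made up for illustration. The index side splits on delimiters and also
generates a catenated token, while the query side only splits on whitespace
and lowercases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for the asymmetric setup; NOT Lucene's API or implementation.
class AsymmetricAnalysisSketch {

    // Index side: split each word on non-alphanumeric delimiters and,
    // when a word has more than one part, also emit the catenated form.
    static List<String> indexTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            String[] parts = word.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            StringBuilder joined = new StringBuilder();
            int count = 0;
            for (String part : parts) {
                if (part.isEmpty()) continue;
                out.add(part);
                joined.append(part);
                count++;
            }
            if (count > 1) out.add(joined.toString());
        }
        return out;
    }

    // Query side: no delimiter splitting and no token generation,
    // only whitespace tokenization and lowercasing.
    static List<String> queryTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(word.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // A document matches when every query token was indexed for it.
    static boolean matches(String docText, String queryText) {
        return indexTokens(docText).containsAll(queryTokens(queryText));
    }
}
```

In this sketch, a query for "wifi" or "fi" finds a document containing
"Wi-Fi" because the extra tokens were generated at index time. It says
nothing about which configuration is best in practice.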

-Mike