Posted to solr-user@lucene.apache.org by Lox <lo...@gmail.com> on 2011/07/04 09:25:22 UTC
Payload doesn't apply to WordDelimiterFilterFactory-generated tokens
Hi, I have a problem with the WordDelimiterFilterFactory and the
DelimitedPayloadTokenFilterFactory.
It seems that the payloads are applied only to the original word that I
index and the WordDelimiterFilter doesn't apply the payloads to the tokens
it generates.
For example, imagine I index the string JavaProject|1.7;
at the end of my analyzer pipeline it will be transformed like this:
JavaProject|1.7 -----> javaproject|1.7 java project
Instead, what I would like is a result like this:
JavaProject|1.7 -----> javaproject|1.7 java|1.7 project|1.7
This way the payload would be applied to the document even in case of
partial matches on the original word.
I have used the pipe notation here, but imagine those payloads are already
stored in Solr internally.
How can I do this?
If it is needed, my analyzer looks like this:
<fieldType name="text_C" class="solr.TextField" positionIncrementGap="100"
stored="false" indexed="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="^[a-z]{2,5}[0-9]{1,4}?([.]|[a-z])?(.*)"
replacement="" replace="all" />
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
generateNumberParts="1"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.LengthFilterFactory" min="1" max="30" />
<filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
</analyzer>
.
.
.
Thank you.
Re: Payload doesn't apply to WordDelimiterFilterFactory-generated tokens
Posted by Chris Hostetter <ho...@fucit.org>.
: It seems that the payloads are applied only to the original word that I
: index and the WordDelimiterFilter doesn't apply the payloads to the tokens
: it generates.
I believe you are correct. I think the general rule for most TokenFilters
that you will find in Lucene/Solr is that they don't typically "clone"
attributes (like payloads) when generating new Tokens -- it may be what
you want in your use case, but there's no hard & fast rule that it would
always make sense to do so.
If you'd like to open a Jira issue (or submit a patch), I suspect a new
"clonePayload" attribute could be added to the WDF Factory to drive this
kind of behavior, so people with use cases where it made sense could enable
it -- but I haven't looked at that code (or the current TokenStream API)
enough to have any idea how hard it would be.
-Hoss
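
To make the behavior discussed in this thread concrete, here is a minimal, self-contained Java sketch of what a "clonePayload" switch might do. It does not use the Lucene/Solr API and is not how WordDelimiterFilter is implemented: the class PayloadCloneSketch, its Token type, and the methods parse, splitOnCaseChange, and lowerCase are all hypothetical names invented for illustration. It only models one splitting rule (lower-to-upper case changes) and assumes the original token is always preserved, as with preserveOriginal="1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the pipeline in this thread; NOT the Lucene/Solr API.
public class PayloadCloneSketch {

    /** A term plus an optional float payload (null means "no payload"). */
    static final class Token {
        final String term;
        final Float payload;

        Token(String term, Float payload) {
            this.term = term;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return payload == null ? term : term + "|" + payload;
        }
    }

    /** Parses "term|1.7" into a Token, in the spirit of a delimited-payload filter. */
    static Token parse(String raw) {
        int bar = raw.lastIndexOf('|');
        if (bar < 0) {
            return new Token(raw, null);
        }
        return new Token(raw.substring(0, bar), Float.parseFloat(raw.substring(bar + 1)));
    }

    /**
     * Emits the original token, then the sub-words found at lower-to-upper
     * case changes. Sub-words inherit the payload only if clonePayload is true.
     */
    static List<Token> splitOnCaseChange(Token in, boolean clonePayload) {
        List<Token> out = new ArrayList<>();
        out.add(in); // always keep the original token (assumption of this sketch)

        List<String> parts = new ArrayList<>();
        String t = in.term;
        int start = 0;
        for (int i = 1; i < t.length(); i++) {
            if (Character.isLowerCase(t.charAt(i - 1)) && Character.isUpperCase(t.charAt(i))) {
                parts.add(t.substring(start, i));
                start = i;
            }
        }
        parts.add(t.substring(start));

        if (parts.size() > 1) { // only emit sub-words when a split actually happened
            for (String p : parts) {
                out.add(new Token(p, clonePayload ? in.payload : null));
            }
        }
        return out;
    }

    /** Lower-cases every term and leaves payloads untouched. */
    static List<Token> lowerCase(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token tok : in) {
            out.add(new Token(tok.term.toLowerCase(Locale.ROOT), tok.payload));
        }
        return out;
    }

    public static void main(String[] args) {
        Token original = parse("JavaProject|1.7");
        // Behavior reported in the thread: generated tokens lose the payload.
        System.out.println(lowerCase(splitOnCaseChange(original, false)));
        // Behavior requested in the thread: generated tokens inherit the payload.
        System.out.println(lowerCase(splitOnCaseChange(original, true)));
    }
}
```

With clonePayload set to false the sketch yields javaproject|1.7, java, project (the reported behavior); with true it yields javaproject|1.7, java|1.7, project|1.7 (the requested behavior). A real implementation would have to copy the payload attribute inside the filter itself, and how hard that is depends on the TokenStream API, as noted above.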