Posted to dev@lucene.apache.org by Dmitry Kan <dm...@gmail.com> on 2015/06/04 13:48:38 UTC

TokenOrderingFilter

Hi guys,

Sorry for sending questions to the dev list rather than the user list;
somehow I seem to have more luck here.

We have found the class o.a.solr.highlight.TokenOrderingFilter
with the following comment:


    /**
     * Orders Tokens in a window first by their startOffset ascending.
     * endOffset is currently ignored.
     * This is meant to work around fickleness in the highlighter only.  It
     * can mess up token positions and should not be used for indexing or querying.
     */
    final class TokenOrderingFilter extends TokenFilter {

In fact, removing this class didn't change the behaviour of the highlighter.

Could anybody shed light on its necessity?

Thanks,

Dmitry Kan
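The javadoc quoted above says the filter orders tokens "in a window" by ascending startOffset and ignores endOffset. The Java sketch below illustrates that idea without any Lucene dependency. It is not the Solr source. The Token record, the reorder method and the windowSize parameter are hypothetical names chosen for this example, and the real filter operates on a Lucene TokenStream rather than a List. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical, Lucene-free sketch of "order tokens in a window by startOffset
 * ascending". This is NOT the Solr TokenOrderingFilter implementation.
 */
public class WindowedOffsetOrdering {

    /** Hypothetical stand-in for a token: its text and character offsets. */
    public record Token(String term, int startOffset, int endOffset) {}

    /**
     * Holds back at most windowSize tokens. Each time the buffer overflows,
     * emits the buffered token with the smallest startOffset. endOffset is
     * ignored, matching the javadoc. Ties keep their arrival order.
     */
    public static List<Token> reorder(List<Token> input, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        // Sequence numbers make ties stable, because PriorityQueue itself is not stable.
        record Entry(Token token, long seq) {}
        PriorityQueue<Entry> window = new PriorityQueue<>(
                Comparator.comparingInt((Entry e) -> e.token().startOffset())
                          .thenComparingLong(Entry::seq));
        List<Token> out = new ArrayList<>();
        long seq = 0;
        for (Token t : input) {
            window.add(new Entry(t, seq++));
            if (window.size() > windowSize) {
                out.add(window.poll().token());
            }
        }
        // Flush whatever is still buffered, smallest startOffset first.
        while (!window.isEmpty()) {
            out.add(window.poll().token());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input whose startOffsets go backwards at "wifi".
        List<Token> in = List.of(
                new Token("wi", 0, 2),
                new Token("fi", 3, 5),
                new Token("wifi", 0, 5),
                new Token("router", 6, 12));
        for (Token t : reorder(in, 2)) {
            System.out.println(t.term() + " " + t.startOffset());
        }
    }
}
```

With a window of 2, main prints wi 0, wifi 0, fi 3, router 6. Because "wifi" moves ahead of "fi", the order in which tokens are emitted changes. That appears to be the kind of position disturbance the javadoc warns about.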

Re: TokenOrderingFilter

Posted by Dmitry Kan <dm...@gmail.com>.
Hi David,

Thanks for your quick reply.

In fact, we do use WDF (WordDelimiterFilter) in Solr 4.10.2. It looks very
much as you explain: the offsets are preserved in monotonically increasing
order. Here is the list of filters we use on the indexing side:

solr.MappingCharFilterFactory
solr.StandardTokenizerFactory
solr.StandardFilterFactory
solr.WordDelimiterFilterFactory
solr.LowerCaseFilterFactory
custom filters that do not alter the order of the offsets.

On 4 June 2015 at 18:35, david.w.smiley@gmail.com <da...@gmail.com>
wrote:

> Hi Dmitry,
>
> Ideally, the token stream produces tokens that have a startOffset >= the
> startOffset of the previous token from the stream.  Sometime in the past
> year or so, this was enforced at the indexing layer, I think.  There used
> to be TokenFilters that violated this contract; I think earlier versions of
> WordDelimiterFilter could.  If my assumption that this is asserted at the
> indexing layer is correct, then I think TokenOrderingFilter is obsolete.
>
> ~ David

Re: TokenOrderingFilter

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
Hi Dmitry,

Ideally, the token stream produces tokens that have a startOffset >= the
startOffset of the previous token from the stream.  Sometime in the past
year or so, this was enforced at the indexing layer, I think.  There used
to be TokenFilters that violated this contract; I think earlier versions of
WordDelimiterFilter could.  If my assumption that this is asserted at the
indexing layer is correct, then I think TokenOrderingFilter is obsolete.

~ David
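The contract David describes can be stated as a simple check: each token's startOffset must be greater than or equal to the previous token's startOffset. The sketch below reports the first token that breaks this rule. It is a hypothetical helper, not a Lucene API, and the names Offsets and firstViolation were invented for this example. It does not reproduce whatever check the indexing layer actually performs. Records require Java 16 or later.

```java
import java.util.List;

/**
 * Hypothetical helper (not a Lucene API) that checks the contract described
 * above: every token's startOffset must be >= the previous token's startOffset.
 */
public class OffsetContractCheck {

    /** Hypothetical stand-in for a token's character offsets. */
    public record Offsets(int startOffset, int endOffset) {}

    /**
     * Returns the index of the first token whose startOffset is smaller than
     * the previous token's startOffset, or -1 if the sequence never goes
     * backwards. endOffset is not examined.
     */
    public static int firstViolation(List<Offsets> tokens) {
        // The first token has no predecessor, so it can never violate the rule.
        int lastStart = Integer.MIN_VALUE;
        for (int i = 0; i < tokens.size(); i++) {
            int start = tokens.get(i).startOffset();
            if (start < lastStart) {
                return i;
            }
            lastStart = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Offsets> ok = List.of(new Offsets(0, 2), new Offsets(0, 5), new Offsets(3, 5));
        List<Offsets> bad = List.of(new Offsets(0, 2), new Offsets(3, 5), new Offsets(0, 5));
        System.out.println(firstViolation(ok));
        System.out.println(firstViolation(bad));
    }
}
```

The first list never goes backwards, so firstViolation returns -1. In the second list, the third token starts at 0 after a token that started at 3, so firstViolation returns 2.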
