
Posted to java-user@lucene.apache.org by Alan Woodward <al...@romseysoftware.co.uk> on 2012/03/12 17:47:10 UTC

Preserving TokenFilters

Hello,

I have a number of operations that I want to apply to a TokenStream, supplementing the original tokens with modified forms.  For example, I want to reverse tokens, to allow prefix wildcard queries, and I want to index both lowercased and original terms.

I initially tried to wrap ReverseStringFilter and LowerCaseFilter with a generic 'preserve original token' filter, but this doesn't work, as TokenFilter chaining works by pulling tokens from parents, and I somehow need to push them into children.  So I tried subclassing the filters instead, but of course they're both final…

Is there already some way of doing this that I'm missing?  Or will I just have to copy'n'paste ReverseStringFilter and LowerCaseFilter into my own package, and add the preserving logic myself?

(I'm aware that there's a Solr filter, ReversedWildcardFilter, that will do part of this for me, but I was hoping to use only Lucene classes.)

Thanks,

Alan Woodward
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Preserving TokenFilters

Posted by Brandon Mintern <mi...@easyesi.com>.

Everything we've read suggests that heavy Lucene users inevitably end
up writing their own token filters. We did this ourselves a month or
two ago, and it really wasn't too bad. Just make sure that you work
from the latest Lucene release when you're writing your own filter.
There's a splitting filter that could serve as a good reference if you
need to emit multiple tokens at the same position.
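
For what it's worth, the shape of such a filter can be sketched without
touching Lucene at all. The following is a simplified model in plain
Java, not Lucene's API: Tok, PreservingFilter and PreservingDemo are
made-up names, and an Iterator stands in for a TokenStream. Each pull
returns the original token first and holds the transformed copy back
for the next pull, with a position increment of 0 so that it stacks on
the original. In a real TokenFilter the same idea would presumably live
in incrementToken(), using captureState()/restoreState() and
PositionIncrementAttribute, but check that against the source of the
release you are using.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical demo: chains a "preserve + lowercase" stage and a
// "preserve + reverse" stage over a list of words.
final class PreservingDemo {
    static List<String> analyze(String... words) {
        List<Tok> source = new ArrayList<>();
        for (String w : words) {
            source.add(new Tok(w, 1)); // each source word starts a new position
        }
        Iterator<Tok> chain = new PreservingFilter(source.iterator(),
                s -> s.toLowerCase(Locale.ROOT));
        chain = new PreservingFilter(chain,
                s -> new StringBuilder(s).reverse().toString());
        List<String> out = new ArrayList<>();
        while (chain.hasNext()) {
            out.add(chain.next().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Foo", "bar"));
    }
}

// Simplified stand-in for a token: a term plus a position increment.
// This is NOT a Lucene class.
final class Tok {
    final String term;
    final int posInc; // 1 = next position, 0 = same position as the previous token

    Tok(String term, int posInc) {
        this.term = term;
        this.posInc = posInc;
    }

    @Override
    public String toString() {
        return term + "(+" + posInc + ")";
    }
}

// Pull-based filter: emits each upstream token unchanged, then a
// transformed copy stacked at the same position.
final class PreservingFilter implements Iterator<Tok> {
    private final Iterator<Tok> input;
    private final UnaryOperator<String> transform;
    private Tok pending; // variant held back until the next pull

    PreservingFilter(Iterator<Tok> input, UnaryOperator<String> transform) {
        this.input = input;
        this.transform = transform;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public Tok next() {
        if (pending != null) {
            Tok variant = pending;
            pending = null;
            return variant;
        }
        Tok original = input.next();
        String changed = transform.apply(original.term);
        // Skip variants identical to the original; they would only duplicate it.
        if (!changed.equals(original.term)) {
            pending = new Tok(changed, 0);
        }
        return original;
    }
}
```

Because the filter applies the transform itself rather than wrapping
another filter, it never needs to push tokens downstream. Chaining two
of them, as analyze() does, also reverses the lowercased variants,
which may or may not be what you want in the index.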

We referred to "Lucene in Action" (second edition) when writing ours.
It was helpful but a bit out of date, so whatever reference you use
(source code or a how-to), make sure it is up to date.

