Posted to solr-user@lucene.apache.org by Batzenmann <ax...@freiheit.com> on 2008/10/01 10:16:01 UTC

Re: Howto concatenate tokens at index time (without spaces)


Otis Gospodnetic wrote:
> 
> I haven't used the German analyzer (either Snowball or the one we have in
> Lucene's contrib), but have you checked if that does the trick of keeping
> words together?
> 
I'm not sure how that could work for words that are separated by spaces,
especially since we use a WhitespaceTokenizer first in the filter chain.

I solved the problem for now by applying the following filter:

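import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Passes every token through unchanged and, once the input is exhausted,
 * emits the concatenation of each pair of adjacent tokens.
 */
public class ConcatFilter extends TokenFilter {
    // Previous token seen, used to build the next concatenation.
    private Token _last;
    // Concatenated tokens waiting to be emitted after the input ends.
    private final Queue<Token> _concatVersions = new LinkedList<Token>();

    public ConcatFilter(TokenStream input) {
        super(input);
    }

    @Override
    public Token next() throws IOException {
        final Token next = input.next();
        if ( next != null ) {
            if ( _last != null ) {
                final String concatStr = _last.termText() + next.termText();
                // Offsets span both source tokens in the original text.
                _concatVersions.add(new Token(concatStr, _last.startOffset(), next.endOffset()));
            }
            _last = next;
            return next;
        }
        // Input exhausted: drain the queue; poll() returns null once it is empty.
        return _concatVersions.poll();
    }
}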
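
To make the emission order concrete, here is a minimal standalone sketch of the same logic on plain strings, with no Lucene dependency. The class name ConcatSketch and the method concatAdjacent are hypothetical and only serve as illustration; they are not part of Lucene or Solr. The sketch assumes the input has already been split on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class ConcatSketch {
    // Hypothetical helper: mirrors the filter's behaviour on plain strings.
    // Original tokens come first, then the concatenation of each adjacent pair.
    static List<String> concatAdjacent(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        Queue<String> pending = new LinkedList<String>();
        String last = null;
        for (String tok : tokens) {
            if (last != null) {
                // Queue the pair; it is only emitted after all originals.
                pending.add(last + tok);
            }
            last = tok;
            out.add(tok);
        }
        out.addAll(pending);
        return out;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz, foobar, barbaz]
        System.out.println(concatAdjacent(Arrays.asList("foo", "bar", "baz")));
    }
}
```

Note that the concatenated terms are appended after all original tokens rather than interleaved with them, so they end up at the end of the token stream.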
-- 
View this message in context: http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19756337.html