Posted to solr-user@lucene.apache.org by Batzenmann <ax...@freiheit.com> on 2008/10/01 10:16:01 UTC
Re: Howto concatenate tokens at index time (without spaces)
Otis Gospodnetic wrote:
>
> I haven't used the German analyzer (either Snowball or the one we have in
> Lucene's contrib), but have you checked if that does the trick of keeping
> words together?
>
I'm not sure how that would work for words that are separated by spaces,
especially since we use a WhitespaceTokenizer first in the filter chain.
I solved the problem for now by applying the following filter:
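import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class ConcatFilter extends TokenFilter {

    // Previous token seen on the input stream.
    private Token _last;
    // Concatenations of adjacent tokens, emitted once the input is exhausted.
    private Queue<Token> _concatVersions = new LinkedList<Token>();

    public ConcatFilter(TokenStream input) {
        super(input);
    }

    @Override
    public Token next() throws IOException {
        final Token next = input.next();
        if (next != null) {
            if (_last != null) {
                // Queue "previous + current" without a space in between.
                final String concatStr = _last.termText() + next.termText();
                // Offsets here span the concatenated string, not the source text.
                _concatVersions.add(new Token(concatStr, 0, concatStr.length()));
            }
            _last = next;
            return next;
        } else if (!_concatVersions.isEmpty()) {
            return _concatVersions.poll();
        }
        return null;
    }
}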
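To make the emission order of the filter above easier to see, here is a small sketch that mimics the same logic on plain strings, with no Lucene dependency. The class name ConcatOrderDemo and the method concatStream are made up for this illustration and are not part of Lucene or Solr; the sketch only models term text and ignores offsets and position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for ConcatFilter that works on plain strings.
class ConcatOrderDemo {

    // Returns the terms in the order the filter logic would emit them:
    // all original terms first, then the adjacent-pair concatenations.
    static List<String> concatStream(List<String> terms) {
        List<String> emitted = new ArrayList<String>();
        Queue<String> concatVersions = new LinkedList<String>();
        String last = null;
        for (String term : terms) {
            if (last != null) {
                // Same step as in the filter: previous + current, no space.
                concatVersions.add(last + term);
            }
            last = term;
            emitted.add(term);
        }
        // The input is exhausted, so drain the queued concatenations.
        while (!concatVersions.isEmpty()) {
            emitted.add(concatVersions.poll());
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(concatStream(Arrays.asList("foo", "bar", "baz")));
    }
}
```

For the input terms foo, bar, baz this yields foo, bar, baz, foobar, barbaz. Note that the concatenated terms only appear after the whole input has been consumed, so their positions in the index will not line up with the words they were built from.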