Posted to solr-user@lucene.apache.org by da...@correo.aeat.es on 2013/09/17 08:16:38 UTC

Problem with SynonymFilter and StopFilterFactory

Hi, 

I have encountered a problem applying StopFilterFactory and
SynonymFilterFactory. The problem is that SynonymFilter removes the gaps
that StopFilterFactory previously left. I'm applying these filters at
query time because users need to change the synonym lists frequently.

This is my analyzer chain, and an example of the issue:


String: "documentación para agentes"

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
position        1               2       3
term text       documentación   para    agentes
startOffset     0               14      19
endOffset       13              18      26

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
position        1               2       3
term text       documentación   para    agentes
startOffset     0               14      19
endOffset       13              18      26

org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt, ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
position        1               3
term text       documentación   agentes
startOffset     0               19
endOffset       13              26

org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position        1               2
term text       documentación   agente
                archivo         agentes
type            SYNONYM         SYNONYM
                SYNONYM         SYNONYM
startOffset     0               19
                0               19
endOffset       13              26
                13              26
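The position numbers in this output are the running sum of per-token position increments (Lucene tracks these with PositionIncrementAttribute). Below is a simplified, hypothetical Java model, not the Lucene API: the class PositionGapDemo and its methods are invented for illustration, and the synonym map only mirrors the two groups visible in the output above. It shows how a stop filter with enablePositionIncrements=true leaves "agentes" with an increment of 2. It then shows how a synonym stage that emits every new position with increment 1 reproduces the reported positions 1 and 2, whereas keeping the incoming increment yields 1 and 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Simplified, hypothetical model of a token stream (NOT the Lucene API):
 * each token carries a position increment, and its absolute position is
 * the running sum of the increments seen so far.
 */
public class PositionGapDemo {

    /** A token: its text plus the increment from the previous token's position. */
    record Token(String term, int posInc) {}

    /** Splits on whitespace; every token gets increment 1. */
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            out.add(new Token(t, 1));
        }
        return out;
    }

    /** Drops stopwords; like enablePositionIncrements=true, their increments carry over to the next kept token. */
    static List<Token> stop(List<Token> in, Set<String> stopwords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in) {
            if (stopwords.contains(t.term())) {
                skipped += t.posInc();
            } else {
                out.add(new Token(t.term(), t.posInc() + skipped));
                skipped = 0;
            }
        }
        return out;
    }

    /**
     * Expands single-token synonyms; extra synonyms are stacked with increment 0.
     * preserveGaps=true keeps the incoming increment on the first token of a group;
     * preserveGaps=false always uses 1, which collapses the stopword gap.
     */
    static List<Token> synonyms(List<Token> in, Map<String, List<String>> syn, boolean preserveGaps) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            List<String> terms = syn.getOrDefault(t.term(), List.of(t.term()));
            for (int i = 0; i < terms.size(); i++) {
                int inc = (i > 0) ? 0 : (preserveGaps ? t.posInc() : 1);
                out.add(new Token(terms.get(i), inc));
            }
        }
        return out;
    }

    /** Absolute 1-based positions: running sum of the increments. */
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        for (Token t : tokens) {
            pos += t.posInc();
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> afterStop = stop(tokenize("documentación para agentes"), Set.of("para"));
        Map<String, List<String>> syn = Map.of(
            "documentación", List.of("documentación", "archivo"),
            "agentes", List.of("agente", "agentes"));
        System.out.println(positions(afterStop));                       // [1, 3]
        System.out.println(positions(synonyms(afterStop, syn, true)));  // [1, 1, 3, 3]
        System.out.println(positions(synonyms(afterStop, syn, false))); // [1, 1, 2, 2]
    }
}
```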


As you can see, the positions should be 1 and 3, but SynonymFilter removes
the gap and moves the tokens from position 3 to position 2.
I've got the same problem with Solr 3.5 and 4.0.
I don't know whether it's a bug or an error in my configuration. In other
schemas I have worked with, I always put the SynonymFilter before the
StopFilter, but here I preferred this order because the synonym list is
very large (i.e. I don't want to generate a lot of synonyms for a word that
I actually want to remove).
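The trade-off between the two filter orders can be sketched with another simplified, hypothetical model that looks only at terms and ignores positions and multi-word synonyms. The class StageOrderDemo is invented, and the group "para, hacia" is a made-up synonym entry, not taken from sinonimos_intranet.txt. With the stop filter first, "para" is removed before it can be expanded; with the synonym filter first, the expansion "hacia" survives, because the stop filter only removes the exact stopword.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not the Lucene API): compares stop->synonym with synonym->stop, terms only. */
public class StageOrderDemo {

    /** Removes every term that is in the stopword set. */
    static List<String> stop(List<String> terms, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (!stopwords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Replaces each term by its synonym group (expand=true style), or keeps it if it has none. */
    static List<String> expand(List<String> terms, Map<String, List<String>> syn) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            out.addAll(syn.getOrDefault(t, List.of(t)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("documentación", "para", "agentes");
        Set<String> stopwords = Set.of("para");
        // Hypothetical synonym groups; "para, hacia" is invented for illustration.
        Map<String, List<String>> syn = Map.of(
            "documentación", List.of("documentación", "archivo"),
            "para", List.of("para", "hacia"),
            "agentes", List.of("agente", "agentes"));

        // Stop first: "para" never reaches the synonym stage.
        System.out.println(expand(stop(input, stopwords), syn));
        // [documentación, archivo, agente, agentes]

        // Synonyms first: "para" is removed afterwards, but "hacia" remains.
        System.out.println(stop(expand(input, syn), stopwords));
        // [documentación, archivo, hacia, agente, agentes]
    }
}
```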

Thanks,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria