Posted to solr-user@lucene.apache.org by solrnovice <ma...@yahoo.com> on 2015/05/12 05:02:39 UTC

Solr Multiword Synonym Problem

Hi all, 

I am trying to solve the Solr multiword synonym issue at our installation. I
am currently using Solr 4.9.x. I used
"com.lucidworks.analysis.AutoPhrasingTokenFilterFactory" from the Lucidworks git
repo in my schema.xml, and their
"com.lucidworks.analysis.AutoPhrasingQParserPlugin" in solrconfig.xml.
To make testing easier for the Solr community, I used the autophrases.txt
below:
big apple
new york city
city of new york
new york new york
new york ny
ny city
ny ny
new york

When I run a query for "big+apple", my parsedQuery converts perfectly:

"parsedquery":"(+DisjunctionMaxQuery((searchField:big_apple)))/no_coord",
    "parsedquery_toString":"+(searchField:big_apple)",
......................................

But when I search for new+york+city, it converts to:

    "parsedquery":"(+(DisjunctionMaxQuery((searchField:new_york_city)) DisjunctionMaxQuery((searchField:city))))/no_coord",
    "parsedquery_toString":"+((searchField:new_york_city) (searchField:city))",
    "explain":{},

Why is it trying to parse the word "city" separately? I thought that when it
finds an exact match for "new york city" in autophrases.txt, it should just
replace the whitespace with an underscore (which is what I chose in my
solrconfig.xml).
But if I comment out the following line in my autophrases.txt:

#city of new york

it works fine: it doesn't perform a DisjunctionMaxQuery on "city".

The same happens with "new york ny": since there is an entry in autophrases.txt
beginning with "ny", it searches for "ny" separately as well.

It looks like the overlap between phrases is causing this problem.
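
The behaviour expected above can be sketched in a few lines of Python. This is only an illustration of greedy longest-match phrase replacement, not the Lucidworks filter's actual code; the function name `autophrase` and the matching rule are assumptions made for this example.

```python
# Hypothetical sketch of greedy longest-match auto-phrasing.
# NOT the Lucidworks implementation; it only shows the expected output.

# Same entries as the autophrases.txt listed earlier in this post.
PHRASES = [
    "big apple",
    "new york city",
    "city of new york",
    "new york new york",
    "new york ny",
    "ny city",
    "ny ny",
    "new york",
]


def autophrase(query, phrases=PHRASES, sep="_"):
    """Replace the longest phrase starting at each position with one joined token."""
    phrase_tokens = {tuple(p.split()) for p in phrases}
    max_len = max(len(p) for p in phrase_tokens)
    tokens = query.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "new york city" beats "new york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_tokens:
                out.append(sep.join(tokens[i:i + n]))
                i += n  # consume every token of the matched phrase
                break
        else:
            # No multiword phrase starts here; keep the single token unchanged.
            out.append(tokens[i])
            i += 1
    return out
```

Under this rule, `autophrase("new york city")` yields the single token `new_york_city` with no leftover `city`, even though "city of new york" is also in the list, because the tokens of a matched phrase are consumed before the next lookup starts.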

Has anybody faced this problem? If so, could you please shed some light on
how you solved it? I used the Lucidworks branch from git, which was
10 months old.

Any help is highly appreciated.

This is my solrconfig.xml:

----------------------------------------------------------------------------------------------------------
<queryParser name="autophrasingParser"
             class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">_</str>
  <str name="defType">edismax</str>
</queryParser>
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="defType">autophrasingParser</str>
  </lst>
</requestHandler>
----------------------------------------------------------------------------------------------------------
These are my settings from schema.xml:
----------------------------------------------------------------------------------------------------------

<fieldType name="text_autophrase" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
            phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
----------------------------------------------------------------------------------------------------------

thanks
SolrUser

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Multiword-Synonym-Problem-tp4204979.html
Sent from the Solr - User mailing list archive at Nabble.com.