Posted to dev@lucene.apache.org by "Robert Muir (Moved) (JIRA)" <ji...@apache.org> on 2012/02/23 23:43:49 UTC
[jira] [Moved] (LUCENE-3821) search slop problem introduced somewhere between Solr 1.4 and Solr 3.5
[ https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir moved SOLR-3158 to LUCENE-3821:
-------------------------------------------
Component/s: (was: search)
Lucene Fields: New
Affects Version/s: 4.0, 3.5 (was: 3.5)
Key: LUCENE-3821 (was: SOLR-3158)
Project: Lucene - Java (was: Solr)
> search slop problem introduced somewhere between Solr 1.4 and Solr 3.5
> ----------------------------------------------------------------------
>
> Key: LUCENE-3821
> URL: https://issues.apache.org/jira/browse/LUCENE-3821
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 3.5, 4.0
> Reporter: Naomi Dushay
> Attachments: schema.xml, solrconfig-test.xml
>
>
> In upgrading from Solr 1.4 to Solr 3.5, the following phrase searches stopped working in dismax:
> "The Beatles as musicians : Revolver through the Anthology"
> "Color-blindness [print/digital]; its dangers and its detection"
> Both of these queries contain a repeated word and have many terms. The cause is neither the number of terms nor the colon surrounded by spaces, because the following phrase search works in Solr 3.5 (and Solr 1.4):
> "International encyclopedia of revolution and protest : 1500 to the present"
> With Robert Muir's help, we have narrowed the problem down to slop (proximity in lucene QueryParser, query slop in dismax). I have included debugQuery details for the Beatles search; I confirmed the same behavior with the color-blindness search.
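A quick way to check the "repeated word" observation against the three phrases quoted above. This is only a sketch: the lowercase-and-split tokenization below is a crude stand-in for the real analyzer chain (an assumption), and the function name is made up for illustration.

```python
import re
from collections import Counter

# The three phrases quoted in the report above.
PHRASES = {
    "beatles": "The Beatles as musicians : Revolver through the Anthology",
    "color": "Color-blindness [print/digital]; its dangers and its detection",
    "encyclopedia": "International encyclopedia of revolution and protest : 1500 to the present",
}


def repeated_words(phrase):
    """Return the words occurring more than once (crude tokenization, not the Solr analyzer)."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n > 1)


for name, phrase in PHRASES.items():
    print(name, repeated_words(phrase))
```

Under this rough tokenization, the two failing phrases each contain a repeated word ("the" and "its") and the working phrase contains none, which is consistent with the description above.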
> Solr 3.5: it fails when (query) slop setting isn't 0.
> ----
> lucene QueryParser with proximity set to 1 (or anything > 0) : no match
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
> final query: all_search:"the beatl as musician revolv through the antholog"~1
> lucene QueryParser with proximity set to 0: result!
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
> final query: all_search:"the beatl as musician revolv through the antholog"
> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
> <snip>
> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
> <snip>
> dismax QueryParser with qs=1: no match
> ps=0
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=0
> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
> ps=1
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=1
> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> dismax QueryParser with qs=0: result!
> ps=0
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
> final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
> ps=1
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
> final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> 8.564867 = (MATCH) sum of:
> 4.2824335 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
> <snip>
> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
> <snip>
> Solr 1.4: it works regardless of slop settings
> ----
> lucene QueryParser with any proximity value: result!
> ~0
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
> final query: all_search:"the beatl as musician revolv through the antholog"
> ~1
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
> final query: all_search:"the beatl as musician revolv through the antholog"~1
> 5.2672544 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
> <snip>
> 48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
> <snip>
> dismax QueryParser with any qs: result!
> qs=0, ps=0
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
> final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
> qs=0, ps=1
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
> final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> dismax QueryParser with qs=1: result!
> qs=1, ps=0
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=0
> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
> qs=1, ps=1
> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=1
> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> 7.4490223 = (MATCH) sum of:
> 3.7245111 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
> <snip>
> 48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
> <snip>
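For anyone wanting to reproduce the qs/ps matrix above, the four dismax parameter strings can be generated as in the sketch below. The parameter names (qf, pf, q, qs, ps) and the field name come from the report; the host, handler path, and any parameter needed to select dismax are not given there, so they are deliberately left out.

```python
from itertools import product
from urllib.parse import urlencode

# Phrase query from the report, including the surrounding double quotes.
PHRASE = '"The Beatles as musicians : Revolver through the Anthology"'


def dismax_params(qs, ps, field="all_search"):
    """Build the URL-encoded parameter string for one qs/ps combination."""
    return urlencode({"qf": field, "pf": field, "q": PHRASE, "qs": qs, "ps": ps})


# All four combinations of qs and ps in {0, 1}.
matrix = {(qs, ps): dismax_params(qs, ps) for qs, ps in product((0, 1), repeat=2)}

for (qs, ps), query_string in matrix.items():
    print(f"qs={qs} ps={ps}: {query_string}")
```

Each string would be appended to whatever select URL the local Solr instance exposes (that URL is an assumption about your setup, not something taken from the report).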
> More information:
> schema.xml:
> <field name="all_search" type="text" indexed="true" stored="false" />
> solr 3.5:
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
>             splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
>             catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldtype>
> solr1.4:
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
>             splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
>             catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldtype>
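To make the difference between the two analyzer chains easier to see, here is a small sketch that lists the filter classes found in only one of the two fieldtype definitions. The XML is trimmed to the class attributes (the other attributes from the report are omitted for brevity), and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fieldtype definitions from the report, reduced to the class attributes.
SOLR_35 = """
<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldtype>
"""

SOLR_14 = """
<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="schema.UnicodeNormalizationFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldtype>
"""


def filter_classes(fieldtype_xml):
    """Return the filter class names of a fieldtype, in chain order."""
    root = ET.fromstring(fieldtype_xml.strip())
    return [f.get("class") for f in root.iter("filter")]


chain_35 = filter_classes(SOLR_35)
chain_14 = filter_classes(SOLR_14)
only_35 = [c for c in chain_35 if c not in chain_14]
only_14 = [c for c in chain_14 if c not in chain_35]

print("only in Solr 3.5:", only_35)
print("only in Solr 1.4:", only_14)
```

Besides the filter differences this prints, the 3.5 fieldtype in the report also sets autoGeneratePhraseQueries="true", which the 1.4 definition does not have.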
> And the analysis page shows the same results for Solr 3.5 and 1.4
> Solr 3.5:
> position     1      2      3      4         5       6        7      8
> term text    the    beatl  as     musician  revolv  through  the    antholog
> keyword      false  false  false  false     false   false    false  false
> startOffset  0      4      12     15        27      36       44     48
> endOffset    3      11     14     24        35      43       47     57
> type         word   word   word   word      word    word     word   word
> Solr 1.4:
> term position     1     2      3      4         5       6        7      8
> term text         the   beatl  as     musician  revolv  through  the    antholog
> term type         word  word   word   word      word    word     word   word
> source start,end  0,3   4,11   12,14  15,24     27,35   36,43    44,47  48,57
> For debug purposes, we can consider the Solr document as:
> <doc>
> <str name="all_search">The Beatles as musicians : Revolver through the Anthology</str>
> </doc>
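As a sanity check on what one would expect here, the sketch below is a brute-force sloppy phrase matcher run against the analyzed token stream shown above. It is a simplified model and not Lucene's scorer: it assumes a candidate matches when the spread of (document position minus query offset) across the terms is at most the slop, and that a repeated query term must bind to distinct document positions. The function name is made up for illustration.

```python
from itertools import product


def sloppy_phrase_matches(doc_tokens, query_tokens, slop):
    """Brute-force check under a simplified slop model (an assumption, not Lucene's code)."""
    if not query_tokens:
        return False
    # Map each document term to the positions where it occurs.
    positions = {}
    for pos, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(pos)
    candidates = [positions.get(tok, []) for tok in query_tokens]
    # Try every assignment of one document position per query term.
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # repeated query terms must use distinct positions
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


# Analyzed tokens from the analysis page output above.
tokens = "the beatl as musician revolv through the antholog".split()

for slop in (0, 1):
    print(f"slop={slop}:", sloppy_phrase_matches(tokens, tokens, slop))
```

Under this model the document matches its own phrase at slop 0, so it also matches at every larger slop. That is the behavior seen in Solr 1.4 and the one missing in Solr 3.5 once slop is nonzero.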
> I can't attach the full SolrDoc because all_search is indexed but not stored, and I use SolrJ to write to the index from Java objects ... plus our objects have a zillion fields (I work in a library with very rich metadata and very exacting Solr fields). I have attached the Solr 3.5 schema and solrconfig, but they are big and ugly for the same reasons.
> For more details, see the erroneously titled email thread "result present in Solr 1.4 but missing in Solr 3.5, dismax only" started on 2012-02-22 on solr-user@lucene.apache.org.
> - Naomi