Posted to solr-user@lucene.apache.org by Naomi Dushay <nd...@stanford.edu> on 2012/02/22 20:55:04 UTC

result present in Solr 1.4, but missing in Solr 3.5, dismax only

I am upgrading Solr from 1.4 to 3.5 and have hit a problem. I have a test that checks for a particular search result; it passes in Solr 1.4 but fails in Solr 3.5. Dismax is the QueryParser I want to use; I included output from the lucene QueryParser only to show that the document exists and can be found.

I am completely stumped.


Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser:     

URL:   q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
  1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.02063975 = queryNorm
  6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
    1.0 = tf(phraseFreq=1.0)
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.125 = fieldNorm(field=all_search, doc=1064395)

dismax QueryParser:   
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
final query:   +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01

(no matches)


***Solr 1.4***

lucene QueryParser:   

URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 = fieldNorm(field=all_search, doc=3469163)

dismax QueryParser:   
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
final query:  +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01

score:

7.449651 = (MATCH) sum of:
  3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
    0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~1), product of:
      48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
      0.014681898 = queryNorm
    5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
      1.0 = tf(phraseFreq=1.0)
      48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
      0.109375 = fieldNorm(field=all_search, doc=3469163)
  3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~3 in 3469163), product of:
    0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~3), product of:
      48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
      0.014681898 = queryNorm
    5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
      1.0 = tf(phraseFreq=1.0)
      48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
      0.109375 = fieldNorm(field=all_search, doc=3469163)
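
For readers who find the explain output hard to follow, here is a small sketch that re-derives the Solr 1.4 dismax score from the numbers printed above. It only checks the arithmetic (queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, clause score = queryWeight * fieldWeight); the variable names and the tolerance are my own choices, not anything from Solr.

```python
import math

# Values copied from the Solr 1.4 dismax explain output above.
idf = 48.16181
query_norm = 0.014681898
tf = 1.0
field_norm = 0.109375

query_weight = idf * query_norm             # explain shows 0.7071068
field_weight = tf * idf * field_norm        # explain shows 5.2676983
clause_score = query_weight * field_weight  # explain shows 3.7248254
total = 2 * clause_score                    # the ~1 clause plus the ~3 clause; explain shows 7.449651


def close(a, b):
    # Solr prints single-precision floats, so compare with a loose tolerance.
    return math.isclose(a, b, rel_tol=1e-5)


print(close(query_weight, 0.7071068), close(field_weight, 5.2676983),
      close(clause_score, 3.7248254), close(total, 7.449651))
```

Both clauses contribute the same score here because they differ only in slop, and the document contains the phrase exactly.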




Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
I forgot to include the field definition information:

schema.xml:
  <field name="all_search" type="text" indexed="true" stored="false" />

Solr 3.5:
    <fieldtype name="text" class="solr.TextField"
      positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ICUFoldingFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
          splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
          catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldtype>

Solr 1.4:
    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="schema.UnicodeNormalizationFilterFactory"
          version="icu4j" composed="false" remove_diacritics="true"
          remove_modifiers="true" fold="true" />
        <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
          splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
          catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldtype>


The analysis page shows the same results for Solr 3.5 and 1.4:


Solr 3.5:

position 	1	2	3	4	5	6	7	8
term text 	the	beatl	as	musician	revolv	through	the	antholog
keyword 	false	false	false	false	false	false	false	false
startOffset 	0	4	12	15	27	36	44	48
endOffset 	3	11	14	24	35	43	47	57
type 	word	word	word	word	word	word	word	word

Solr 1.4:

term position 	1	2	3	4	5	6	7	8
term text 	the	beatl	as	musician	revolv	through	the	antholog
term type 	word	word	word	word	word	word	word	word
source start,end 	0,3	4,11	12,14	15,24	27,35	36,43	44,47	48,57
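
A quick way to double-check that the two analysis outputs really are identical is to transcribe them and compare. The sketch below does that with the rows shown above and also lists any term that occurs at more than one position. The dictionary layout and names are just for illustration.

```python
from collections import defaultdict

# Token rows transcribed from the two analysis-page outputs above.
solr35 = {
    "terms": "the beatl as musician revolv through the antholog".split(),
    "start": [0, 4, 12, 15, 27, 36, 44, 48],
    "end": [3, 11, 14, 24, 35, 43, 47, 57],
}
solr14 = {
    "terms": "the beatl as musician revolv through the antholog".split(),
    "start": [0, 4, 12, 15, 27, 36, 44, 48],
    "end": [3, 11, 14, 24, 35, 43, 47, 57],
}

# True when terms and offsets agree position by position.
same_analysis = solr35 == solr14

# Map each term to the 1-based positions at which it occurs.
positions = defaultdict(list)
for pos, term in enumerate(solr35["terms"], start=1):
    positions[term].append(pos)

# Keep only terms that appear more than once in the phrase.
repeated = {term: where for term, where in positions.items() if len(where) > 1}

print(same_analysis, repeated)
```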

- Naomi


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Jonathan,

I have the same problem without the colon; I tested that but didn't mention it.

mm can't be the issue either: in Solr 3.5, if I remove one of the occurrences of "the" (it doesn't matter which), I get results. Removing any other word does NOT get results. And if the query isn't a phrase query, it gets results.

And no, it can't be related to what you refer to as the "dismax stopwords problem", since I can demonstrate the problem with a single field.
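
The drop-one-word experiment described above can be scripted. This is only a sketch: it builds the request parameters for each variant but does not send them anywhere. The `defType=dismax` and `debugQuery=true` parameters are assumptions on my part (the URLs in this thread show only `qf`, `pf` and `q`), and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# The failing phrase, without the colon (the problem occurs either way).
PHRASE = "The Beatles as musicians Revolver through the Anthology"


def drop_one_word_variants(phrase):
    """Yield (index, removed_word, remaining_phrase) for every word position."""
    words = phrase.split()
    for i, word in enumerate(words):
        yield i, word, " ".join(words[:i] + words[i + 1:])


def dismax_params(phrase):
    """Query string for a single-field dismax phrase search on all_search."""
    return urlencode({
        "defType": "dismax",   # assumption: dismax may instead be set in the request handler
        "qf": "all_search",
        "pf": "all_search",
        "q": f'"{phrase}"',    # quoted so it stays a phrase query
        "debugQuery": "true",  # assumption: used here to get the explain output
    })


for index, removed, variant in drop_one_word_variants(PHRASE):
    print(index, removed, dismax_params(variant))
```

Appending each printed query string to the select URL of the two Solr versions would show which removals bring the document back.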


I have run into problems in the past with a non-alpha character surrounded by spaces tanking my search results for dismax … but I fixed that with this fieldType:

    <!-- single token with punctuation terms removed so dismax doesn't look for punctuation terms in these fields -->
    <!-- On client side, Lucene query parser breaks things up by whitespace *before* field analysis for dismax -->
    <!-- so punctuation terms (& : ;) are stopwords to allow results from other fields when these chars are surrounded by spaces in query -->
    <!--  do not lowercase -->
    <fieldType name="string_punct_stop" class="solr.TextField" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose" />
        <!-- removing punctuation for Lucene query parser issues -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_punctuation.txt" enablePositionIncrements="true" />
      </analyzer>
    </fieldType>

My stopwords_punctuation.txt file is:

#Punctuation characters we want to ignore in queries
:
;
&
/

I used this type instead of string for the fields in my dismax qf. Thus, the punctuation "terms" in the query are not present for the fields that were formerly string fields.
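
To illustrate the effect, here is a very rough model of what happens to an unquoted query against a string_punct_stop field: the query is split on whitespace first, then each chunk is kept as a single token unless it is one of the punctuation stopwords. This is a simplification for illustration only; it ignores the ICU normalization step and is not how Solr is implemented.

```python
# Contents of stopwords_punctuation.txt as listed above.
PUNCTUATION_STOPWORDS = {":", ";", "&", "/"}


def query_chunks(user_query):
    """Split on whitespace, which happens before field analysis for dismax."""
    return user_query.split()


def analyze_string_punct_stop(chunk):
    """Keep the chunk as one token unless it is a punctuation stopword."""
    return [] if chunk in PUNCTUATION_STOPWORDS else [chunk]


query = "The Beatles as musicians : Revolver through the Anthology"
surviving = [
    token
    for chunk in query_chunks(query)
    for token in analyze_string_punct_stop(chunk)
]
print(surviving)
```

The lone ":" produces no token for such a field, so it no longer counts as a term that the field has to match.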

- Naomi

On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be different on 1.4 and 3.x. And I see you aren't even using a multi-field dismax in your sample query, so it couldn't possibly be what I ran into... I don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both used in a dismax, one that ends up keeping ":" as a token, and one that doesn't.  Which ends up having the same effect as the famous 'dismax stopwords problem'.
> 
> Maybe your schema somehow changed in a way that produces this problem in 3.x but not in 1.4? Although, again, the fact that you are only using a single field in your demo dismax query kind of suggests it's not this problem. If you try the query without the ":" and the problem goes away, that might be a hint. Or maybe someone more skilled at reading those Solr debug statements than I am (it's all Greek to me) will be able to take this hint and confirm or rule out that it has something to do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries, that could be relevant if it's got anything to do with anything similar to the issue I'm talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens for 'mm' in such a way that the 'varying field analysis dismax gotcha' can manifest with only one field, if the way dismax counts tokens for 'mm' differs from number of tokens the single field's analysis produces?
> 
> Jonathan


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Ticket created:

https://issues.apache.org/jira/browse/SOLR-3158

(perhaps it's a lucene problem, not a Solr one -- feel free to move it or whatever.)

- Naomi


On Feb 23, 2012, at 11:55 AM, Robert Muir [via Lucene] wrote:

> Please make a new one if you don't mind!


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Robert Muir <rc...@gmail.com>.
Please make a new one if you don't mind!

On Thu, Feb 23, 2012 at 2:45 PM, Naomi Dushay <nd...@stanford.edu> wrote:
> Robert -
>
> Did you mean for me to attach my docs to an existing ticket (which one?), or did you just want to make sure I attach the docs to the new issue?
>
> - Naomi



-- 
lucidimagination.com

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Robert -

Did you mean for me to attach my docs to an existing ticket (which one?), or did you just want to make sure I attach the docs to the new issue?

- Naomi

On Feb 23, 2012, at 11:39 AM, Robert Muir [via Lucene] wrote:

> Please attach your docs if you don't mind.
>
> I worked up tests for this (in general for ANY phrase query,
> increasing the slop should never remove results, only potentially
> enlarge them).
>
> It fails already... but it's good to have your test case too...


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Robert Muir <rc...@gmail.com>.
Please attach your docs if you don't mind.

I worked up tests for this (in general for ANY phrase query,
increasing the slop should never remove results, only potentially
enlarge them).

It fails already... but it's good to have your test case too...
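
The invariant stated above (raising the slop must never lose hits) can be expressed as a small check. This is a sketch, not the actual Lucene test: `count_hits` is a hypothetical callable standing in for whatever runs the phrase query at a given slop, and the two fake functions only mimic the behaviour reported in this thread.

```python
def slop_regressions(count_hits, phrase, max_slop=3):
    """Return (smaller_slop, larger_slop) pairs where raising the slop lost hits."""
    counts = [count_hits(phrase, slop) for slop in range(max_slop + 1)]
    return [(s, s + 1) for s in range(max_slop) if counts[s + 1] < counts[s]]


def fake_solr35(phrase, slop):
    # Hypothetical stand-in: as reported for Solr 3.5, only slop 0 finds the document.
    return 1 if slop == 0 else 0


def fake_solr14(phrase, slop):
    # Hypothetical stand-in: Solr 1.4 finds the document at every slop tried.
    return 1


phrase = "The Beatles as musicians : Revolver through the Anthology"
print(slop_regressions(fake_solr35, phrase))
print(slop_regressions(fake_solr14, phrase))
```

An empty list means the invariant holds for the slops tried; any pair in the list marks a step where a larger slop returned fewer hits.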

On Thu, Feb 23, 2012 at 2:20 PM, Naomi Dushay <nd...@stanford.edu> wrote:
> Robert,
>
> I will create a jira issue with the documentation.  FYI, I tried ps values of 3, 2, 1 and 0 and none of them worked with dismax;   For lucene QueryParser, only the value of 0 got results.
>
> - Naomi



-- 
lucidimagination.com

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Robert,

I will create a JIRA issue with the documentation. FYI, I tried ps values of 3, 2, 1 and 0, and none of them worked with dismax. With the lucene QueryParser, only a slop of 0 got results.

- Naomi


On Feb 23, 2012, at 11:12 AM, Robert Muir [via Lucene] wrote:

> Is it possible to also provide your document? 
> If you could attach the document and the analysis config and queries 
> to a JIRA issue, that would be most ideal. 


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Robert Muir <rc...@gmail.com>.
Is it possible to also provide your document?
If you could attach the document, the analysis config and the queries
to a JIRA issue, that would be ideal.

On Thu, Feb 23, 2012 at 2:05 PM, Naomi Dushay <nd...@stanford.edu> wrote:
> Robert,
>
> You found it!   it is the phrase slop.  What do I do now?   I am using Solr from trunk from December, and all those JIRA tixes are marked fixed …
>
> - Naomi
>
>
> Solr 1.4:
>
> luceneQueryParser:
>
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
> final query:  all_search:"the beatl as musician revolv through the antholog"~3
>
> got result
>
>
> Solr 3.5
>
> luceneQueryParser:
>
> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
> final query:  all_search:"the beatl as musician revolv through the antholog"~3
>
> NO result
>
>
>
>> lucene QueryParser:
>>
>> URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"



-- 
lucidimagination.com

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Robert,

You found it! It is the phrase slop. What do I do now? I am using Solr from trunk from December, and all those JIRA tickets are marked fixed …

- Naomi


Solr 1.4:

luceneQueryParser:

URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
final query:  all_search:"the beatl as musician revolv through the antholog"~3

got result


Solr 3.5

luceneQueryParser:

URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3
final query:  all_search:"the beatl as musician revolv through the antholog"~3

NO result



> lucene QueryParser:
> 
> URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
> final query:  all_search:"the beatl as musician revolv through the antholog"




On Feb 22, 2012, at 7:34 PM, Robert Muir [via Lucene] wrote:

> On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay <[hidden email]> wrote: 
> > Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated: 
> > 
> >  "The Beatles as musicians : Revolver through the Anthology" 
> >  "Color-blindness [print/digital]; its dangers and its detection" 
> > 
> > but this is a PHRASE search. 
> > 
> 
> Can you take your same phrase queries, and simply add some slop to 
> them (e.g. ~3) and ensure they still match with the lucene 
> queryparser? SloppyPhraseQuery has a bit of a history with repeats 
> since Lucene 2.9 that you were using. 
> 
> https://issues.apache.org/jira/browse/LUCENE-3068
> https://issues.apache.org/jira/browse/LUCENE-3215
> https://issues.apache.org/jira/browse/LUCENE-3412

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay <nd...@stanford.edu> wrote:
> Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:
>
>  "The Beatles as musicians : Revolver through the Anthology"
>  "Color-blindness [print/digital]; its dangers and its detection"
>
> but this is a PHRASE search.
>

Can you take the same phrase queries, simply add some slop to
them (e.g. ~3), and check whether they still match with the lucene
queryparser? Sloppy phrase queries have a bit of a history with
repeated terms since Lucene 2.9, which you were using.

https://issues.apache.org/jira/browse/LUCENE-3068
https://issues.apache.org/jira/browse/LUCENE-3215
https://issues.apache.org/jira/browse/LUCENE-3412
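
A small sketch of how the requests for this check could be generated. The base URL is an assumption (adjust host, port and handler to your install), and the helper names are hypothetical. The parameters q, defType, qf, pf, qs, ps and debugQuery are the ones already used in this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a local Solr with the default select handler.
BASE_URL = "http://localhost:8983/solr/select"


def lucene_phrase_url(field, phrase, slop):
    """Phrase query for the lucene parser; slop 0 means no ~N suffix."""
    q = f'{field}:"{phrase}"'
    if slop > 0:
        q += f"~{slop}"
    return BASE_URL + "?" + urlencode({"q": q, "debugQuery": "true"})


def dismax_phrase_url(field, phrase, qs, ps):
    """Same phrase through dismax, with explicit query and phrase slop."""
    params = {
        "defType": "dismax",
        "q": f'"{phrase}"',
        "qf": field,
        "pf": field,
        "qs": qs,
        "ps": ps,
        "debugQuery": "true",
    }
    return BASE_URL + "?" + urlencode(params)


PHRASE = "The Beatles as musicians : Revolver through the Anthology"
lucene_urls = [lucene_phrase_url("all_search", PHRASE, s) for s in range(4)]
dismax_url = dismax_phrase_url("all_search", PHRASE, qs=1, ps=3)
```

Fetching each URL against both the 1.4 and the 3.5 index and comparing numFound would show at which slop the document disappears.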

-- 
lucidimagination.com

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Naomi Dushay <nd...@stanford.edu>.
Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:

 "The Beatles as musicians : Revolver through the Anthology"
 "Color-blindness [print/digital]; its dangers and its detection"

but this is a PHRASE search.  

In case it's relevant, both Solr 1.4 and Solr 3.5:
 do NOT use stopwords in the fieldtype;  
 mm is  6<-1 6<90%  for dismax
 qs is 1
 ps is 3

And both use this filter last

<filter class="solr.RemoveDuplicatesTokenFilterFactory" />

… but I believe that filter is only used for consecutive tokens.
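
A note on that filter: as documented, RemoveDuplicatesTokenFilter drops a token only when an earlier token has the same text at the same position (position increment 0), which is a stricter condition than being consecutive. The sketch below is a simplified model of that rule, not the Lucene source; tokens are represented as hypothetical (text, position_increment) pairs.

```python
def remove_duplicates(tokens):
    """Model of the filter: drop same-text tokens stacked at one position."""
    seen = set()  # term texts already emitted at the current position
    out = []
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # Moved on to a new position, so forget the previous terms.
            seen.clear()
        if text in seen:
            continue  # same text, same position: a duplicate
        seen.add(text)
        out.append((text, pos_inc))
    return out


# The analyzed phrase: every token advances the position by 1.
phrase = [(t, 1) for t in
          "the beatl as musician revolv through the antholog".split()]
kept = remove_duplicates(phrase)

# Two tokens with the same text stacked at one position, as can happen
# when a stemmed form and the original collapse to the same term.
stacked = remove_duplicates([("beatl", 1), ("beatl", 0), ("as", 1)])
```

Under this model both occurrences of "the" survive, because they sit at different positions, so the filter should not be what distinguishes the repeated-term queries.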

Lastly, this variant works:

 "Color-blindness [print/digital]; its and its detection"

("dangers" is removed, rather than one of the repeated "its".)

- Naomi



On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be different on 1.4 and 3.x. And I see you aren't even using a multi-field dismax in your sample query, so it couldn't possibly be what I ran into... I don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both used in a dismax, one that ends up keeping ":" as a token, and one that doesn't.  Which ends up having the same effect as the famous 'dismax stopwords problem'.
> 
> Maybe somehow your schema changed such to produce this problem in 3.x but not in 1.4? Although again I realize the fact that you are only using a single field in your demo dismax query kind of suggests it's not this problem. Wonder if you try the query without the ":", if the problem goes away, that might be a hint. Or, maybe someone more skilled at understanding what's in those Solr debug statements than I am (it's kind of all greek to me) will be able to take this hint and rule out or confirm that it may have something to do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries, that could be relevant if it's got anything to do with anything similar to the issue I'm talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens for 'mm' in such a way that the 'varying field analysis dismax gotcha' can manifest with only one field, if the way dismax counts tokens for 'mm' differs from number of tokens the single field's analysis produces?
> 
> Jonathan


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

Posted by Jonathan Rochkind <ro...@jhu.edu>.
So I don't really know what I'm talking about, and I'm not really sure 
if it's related or not, but your particular query:

"The Beatles as musicians : Revolver through the Anthology"

With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
problem I ran into. Now, I ran into it on 1.4.  I don't know why it 
would be different on 1.4 and 3.x. And I see you aren't even using a 
multi-field dismax in your sample query, so it couldn't possibly be what 
I ran into... I don't think. But I'll write this anyway in case it gives 
someone some ideas.

The problem I ran into is caused by different analysis in two fields 
both used in a dismax, one that ends up keeping ":" as a token, and one 
that doesn't.  Which ends up having the same effect as the famous 
'dismax stopwords problem'.

Maybe your schema somehow changed in a way that produces this problem in 
3.x but not in 1.4? Although again, I realize the fact that you are only 
using a single field in your demo dismax query kind of suggests it's not 
this problem. If you try the query without the ":" and the problem goes 
away, that might be a hint. Or maybe someone more skilled at 
understanding what's in those Solr debug statements than I am (it's 
kind of all Greek to me) will be able to take this hint and rule out or 
confirm that it may have something to do with your problem.

Here I write up the issue I ran into (which may or may not have anything 
to do with what you ran into)

http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/


Also, you don't say what your 'mm' is in your dismax queries. That could 
be relevant if your problem is anything like the issue I'm talking 
about.

Hmm, I wonder if Solr 3.x changes the way dismax calculates the number 
of tokens for 'mm' in such a way that the 'varying field analysis dismax 
gotcha' can manifest with only one field, if the way dismax counts 
tokens for 'mm' differs from the number of tokens the single field's 
analysis produces?

Jonathan

On 2/22/2012 2:55 PM, Naomi Dushay wrote:
> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   I have a test checking for a search result in Solr, and the test passes in Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I just included output from lucene QueryParser to prove the document exists and is found
>
> I am completely stumped.
>
>
> Here are the debugQuery details:
>
> ***Solr 3.5***
>
> lucene QueryParser:
>
> URL:   q=all_search:"The Beatles as musicians : Revolver through the Anthology"
> final query:  all_search:"the beatl as musician revolv through the antholog"
>
> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>    1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
>      48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>      0.02063975 = queryNorm
>    6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>      1.0 = tf(phraseFreq=1.0)
>      48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>      0.125 = fieldNorm(field=all_search, doc=1064395)
>
> dismax QueryParser:
> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
> final query:   +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01
>
> (no matches)
>
>
> ***Solr 1.4***
>
> lucene QueryParser:
>
> URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
> final query:  all_search:"the beatl as musician revolv through the antholog"
>
> 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>    1.0 = tf(phraseFreq=1.0)
>    48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>    0.109375 = fieldNorm(field=all_search, doc=3469163)
>
> dismax QueryParser:
> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
> final query:  +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01
>
> score:
>
> 7.449651 = (MATCH) sum of:
>    3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
>      0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~1), product of:
>        48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>        0.014681898 = queryNorm
>      5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>        1.0 = tf(phraseFreq=1.0)
>        48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>        0.109375 = fieldNorm(field=all_search, doc=3469163)
>    3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~3 in 3469163), product of:
>      0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~3), product of:
>        48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>        0.014681898 = queryNorm
>      5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>        1.0 = tf(phraseFreq=1.0)
>        48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>        0.109375 = fieldNorm(field=all_search, doc=3469163)
>
>
>
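
For readers who, like Jonathan, find the debug output hard to read: each explain tree quoted above is a product of the numbers printed beneath it. The sketch below only redoes that arithmetic with the values from the thread; the function and variable names are made up for illustration.

```python
import math


def field_weight(tf, idf, field_norm):
    """fieldWeight line in the explain output: tf * idf * fieldNorm."""
    return tf * idf * field_norm


def query_weight(idf, query_norm):
    """queryWeight line in the explain output: idf * queryNorm."""
    return idf * query_norm


# Solr 3.5, lucene QueryParser: queryWeight is 1.0, so score = fieldWeight.
solr35_score = field_weight(1.0, 48.450203, 0.125)

# Solr 1.4, dismax: two identical clauses (slop ~1 and slop ~3) are summed.
solr14_field = field_weight(1.0, 48.16181, 0.109375)
solr14_query = query_weight(48.16181, 0.014681898)
solr14_clause = solr14_query * solr14_field
solr14_score = 2 * solr14_clause

# Compare with the figures printed in the debug output.
checks = [
    math.isclose(solr35_score, 6.0562754, rel_tol=1e-5),
    math.isclose(solr14_field, 5.2676983, rel_tol=1e-5),
    math.isclose(solr14_query, 0.7071068, rel_tol=1e-5),
    math.isclose(solr14_clause, 3.7248254, rel_tol=1e-5),
    math.isclose(solr14_score, 7.449651, rel_tol=1e-5),
]
```

Both 1.4 clauses report phraseFreq=1.0, so in 1.4 the sloppy queries found the same single occurrence that the exact phrase query finds. The scoring arithmetic itself is consistent across the two versions; what differs is whether the sloppy phrase matches at all.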